Comparison and Evaluation of Support Vector Machine and Gene Programming in River Suspended Sediment Estimation (Case Study: Kashkan River)

Simulation and evaluation of sediment are important issues in water resources management. Common methods for measuring sediment concentration are generally time-consuming and costly, and sometimes lack sufficient accuracy. In this research, we estimated sediment amounts for the Kashkan River, Iran, using the Support Vector Machine (SVM) and compared the results with the commonly used Gene Expression Programming (GEP). Flow discharge at different time lags was used as the input and sediment as the output over the period 1998-2018. The correlation coefficient, root mean square error, mean absolute error and Nash-Sutcliffe coefficient were used to evaluate and compare the performance of the models. The results showed that both models estimate sediment discharge with acceptable accuracy, but the support vector machine model had the highest correlation coefficient (0.994), the lowest root mean square error (0.001 ton/day) and mean absolute error (0.001 ton/day), and a Nash-Sutcliffe coefficient of 0.988, and was therefore ranked first in the verification stage. Finally, the results showed that the support vector machine has great capability in estimating minimum and maximum sediment discharge values.
Keywords: Suspended Sediment, Kashkan, Support Vector Machine, Gene Expression Programming.


INTRODUCTION
Historically, there have been a number of attempts to estimate sediment yield using models that can be divided into different groups (White, 2005). The deterministic models can be grouped as either empirical or conceptual. These models generally need long data records and take into account the hydrodynamics of each mode of transport. The deterministic and stochastic models are based on the physical processes of sediment yield, and several such models for sediment discharge estimation are available in the literature (Singh et al., 1998; Yang, 1996; Cohn et al., 1992; Forman et al., 2000). Applying physics-based software requires detailed spatial and temporal environmental data that are often unavailable. In practice, the most commonly used model is the rating curve, which is based on the relationship between the flow Q and the sediment S. The sediment yield of a river is measured as the sediment load (S), which depends on the sediment concentration and the river discharge (Q). Accurate estimation of the sediment yield is rather difficult because both the sediment concentration and the river discharge vary over time.
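The rating curve mentioned above is commonly written as a power law, S = a·Q^b, and fitted by least squares on log-transformed data. The following is a minimal sketch under that assumption; the power-law form, the function names and the synthetic values are illustrative and are not taken from the Kashkan data.

```python
import math


def fit_rating_curve(discharge, sediment):
    """Fit S = a * Q**b by ordinary least squares on log-transformed data."""
    if len(discharge) != len(sediment) or len(discharge) < 2:
        raise ValueError("need at least two paired observations")
    log_q = [math.log(q) for q in discharge]
    log_s = [math.log(s) for s in sediment]
    n = len(log_q)
    mean_q = sum(log_q) / n
    mean_s = sum(log_s) / n
    sxx = sum((x - mean_q) ** 2 for x in log_q)
    sxy = sum((x - mean_q) * (y - mean_s) for x, y in zip(log_q, log_s))
    b = sxy / sxx                        # slope in log space = exponent
    a = math.exp(mean_s - b * mean_q)    # intercept in log space = log(a)
    return a, b


def predict_sediment(a, b, q):
    """Sediment load predicted by the fitted rating curve."""
    return a * q ** b


# Synthetic illustration: values generated exactly from S = 0.5 * Q**1.8.
discharge = [2.0, 5.0, 10.0, 20.0, 40.0]
sediment = [0.5 * q ** 1.8 for q in discharge]
a, b = fit_rating_curve(discharge, sediment)
```

Because the fit is done in log space, back-transforming the predictions is known to introduce a bias, which is one reason data-driven alternatives to the rating curve are of interest.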
Generally, time-series techniques assume linear relationships among variables. However, these techniques are difficult to apply to real hydrologic data because of their temporal variations. In contrast, the support vector machine (SVM) is a nonlinear model and can be used to identify these relations. Neural networks are increasingly being used in diverse engineering applications because of their ability to solve nonlinear regression problems successfully. This feature is a highly important aspect of neural computing because it allows a function to be modelled when one has little information about it or an incomplete understanding of it. Thus, the SVM approach is extensively used in the water resources literature for prediction and forecasting.
In recent years, support vector machines have been widely used in various fields, and runoff and sediment yield estimation can use SVM as well (Misra et al., 2009). SVM is a powerful nonlinear pattern recognition technique (Vapnik, 1998; Kecman, 2000). Chiang and Tsai (2011) estimated suspended sediment load using a linear regression model, a power regression model, an artificial neural network (ANN) and a support vector machine, with records of river discharge and suspended sediment load in the Kaoping river basin as the case study; their results show that SVM outperforms the ANN and the two regression models. Ghani and Azamathulla (2011) presented gene expression programming (GEP), an extension of genetic programming (GP), as an alternative approach for modelling the functional relationships of sediment transport in sewer pipe systems. They developed a functional relation using GEP that can be applied to different boundaries with partial flow, and the proposed GEP approach gave satisfactory results compared with the existing predictor. Chiang et al. (2014) investigated records of river discharge and suspended sediment load in the Goodwin Creek Experimental Watershed in the United States and concluded that the proposed SVM model has high potential for predicting suspended sediment load. Jajarmizadeh et al. (2015) compared the Soil and Water Assessment Tool (SWAT) with a support vector machine for predicting the monthly streamflow of the Roodan watershed, an arid region in the southern part of Iran. Their results indicate that the SVM gives a closer value for the average flow than the SWAT model, whereas the SWAT model performs better for total runoff volume, with a lower error in the validation period. Ghorbani et al. (2016) investigated discharge time series using SVM and ANN predictive models and compared their performance with two conventional models, the rating curve (RC) and multiple linear regression (MLR). The different performance measures used in the evaluation indicate that SVM and ANN have an edge over the conventional RC and MLR models. Notably, the peak values predicted by SVM and ANN are more reliable than those predicted by RC and MLR, although the performance of the conventional models is acceptable for a range of practical problems.
In summary, given the research reviewed above and the fact that the Kashkan River is the main source of water supply for different sectors and adjacent areas, estimating its suspended sediment, and taking management measures to improve the operation of reservoirs, is more essential than ever. The purpose of this research is therefore to estimate the suspended sediment of the Kashkan River using the support vector machine and to compare the results with those of gene expression programming.

Case study and data used
The Kashkan River is the most flooded river in Lorestan province. The Kashkan catchment is located in the southwestern part of Iran and has a surface area of 1.5 km2. This area forms an important part of the mountainous tributaries of the Karkhe River and covers about one-third of the land area of Lorestan. In the hydrological division of Iran, the Kashkan River watershed is part of the Persian Gulf catchment. The river is located between longitudes 47° 31′ 34″ and 48° 12′ 48″ E and latitudes 33° 5′ 45″ and 33° 44′ 41″ N in Lorestan province. The location of the study area is shown in Figure 1.

Gene Expression Programming
Gene Expression Programming (GEP) was introduced by Ferreira in 1999 (Ferreira, 2001). The method combines genetic algorithms (GA) and genetic programming (GP): individuals are encoded as simple linear chromosomes of fixed length, as in GA, and are expressed as branched structures of different sizes and shapes, similar to the parse trees of GP. Because all branched structures of different shapes and sizes are encoded in fixed-length linear chromosomes, the genotype and the phenotype are separated from each other, and the system can benefit from all the evolutionary advantages that this separation brings. Although the phenotype in GEP consists of the same kind of branched structures used in GP, the branched structures evolved by GEP (also called expression trees) are the expression of a fully autonomous genome. In short, modifications take place in the linear structure, which is then expressed as a tree structure; as a result, only the modified genome is passed on to the next generation, and there is no need to replicate and mutate cumbersome tree structures (Ferreira, 2001).
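The separation between a linear genotype and a tree-shaped phenotype can be illustrated with a small decoder. The sketch below assumes the breadth-first (Karva) reading of a gene described by Ferreira (2001); the symbol set, the function names and the sample gene are chosen only for illustration and are not the configuration used in this study.

```python
import math
from collections import deque

# Arity of each function symbol; any other symbol is treated as a terminal.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}  # Q stands for square root

OPERATIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "Q": lambda a: math.sqrt(a),
}


def decode(gene):
    """Build an expression tree from a linear gene, filling it level by level."""
    root = [gene[0], []]              # node = [symbol, list of child nodes]
    pending = deque([root])
    position = 1
    while pending:
        node = pending.popleft()
        for _ in range(ARITY.get(node[0], 0)):
            child = [gene[position], []]
            position += 1
            node[1].append(child)
            pending.append(child)
    return root                       # symbols after `position` stay unexpressed


def evaluate(node, values):
    """Evaluate an expression tree for the given terminal values."""
    symbol, children = node
    if symbol in OPERATIONS:
        return OPERATIONS[symbol](*(evaluate(c, values) for c in children))
    return values[symbol]


values = {"a": 1.0, "b": 3.0, "c": 5.0, "d": 1.0}
# "Q*+-abcd" decodes to sqrt((a + b) * (c - d)).
result = evaluate(decode("Q*+-abcd"), values)
# The same gene padded to a longer fixed length: the extra symbols are not expressed.
padded_result = evaluate(decode("Q*+-abcdaaba"), values)
```

Both genes express the same tree, which shows how a chromosome of fixed length can encode trees of different sizes: mutation acts on the string, and only the coding part is translated into the expression tree.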

Support Vector Machine
The Support Vector Machine is an efficient learning system based on optimization theory that applies the inductive principle of structural risk minimization and yields a globally optimal solution (Vapnik, 1998). In an SVM regression model, a function is estimated that relates the dependent variable Y to several independent variables X (Xu et al., 2007). As in other regression problems, the relationship between the dependent and independent variables is assumed to be given by an algebraic function f(x) plus some allowable error (ε).
If W is the vector of coefficients, b is the constant of the regression function and ∅ is the kernel function, the goal is to find a functional form for f(x). This is achieved by training the SVM model on a set of samples (the training set).
To calculate W and b, the error function of the ε-SVM must be optimized subject to the constraints given in Equation 4 (Shin et al., 2005).
In the above equations, C is a positive constant that determines the penalty incurred when a training error occurs, ∅ is the kernel function, N is the number of samples, and ε_i and ε_i* are slack variables. Finally, the SVM function can be rewritten as follows (Shin et al., 2005). In this expression, ᾱ_i are the average Lagrange multipliers and ∅(x) is evaluated in the feature space, a calculation that may be very complex. To overcome this problem, the usual practice in SVM modelling is to choose a kernel function, as in the following relation.
Different kernel functions can be used to create different types of ε-SVM. The kernel functions used in SVM regression models include the polynomial kernel with three parameters, the radial basis function (RBF) kernel with one parameter, and the linear kernel, which are calculated as in the following relations (Vapnik, 1998).
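For reference, the textbook ε-insensitive support vector regression formulation (Vapnik, 1998) can be written as below. This is offered as a sketch of the standard form and is not necessarily identical, in notation or numbering, to the equations of the original paper. The slack variables are denoted ξ_i and ξ_i* here to distinguish them from the tube width ε, the multiplier written ᾱ_i in the text is assumed to correspond to α_i − α_i*, and the kernel parameter names γ, r, d and σ are conventional choices assumed for illustration.

```latex
% Regression function
f(x) = W^{T}\phi(x) + b

% Primal problem of the epsilon-SVM
\min_{W,\,b,\,\xi,\,\xi^{*}} \;\; \frac{1}{2}\,W^{T}W + C\sum_{i=1}^{N}\left(\xi_i + \xi_i^{*}\right)
\quad \text{subject to} \quad
\begin{cases}
y_i - W^{T}\phi(x_i) - b \le \varepsilon + \xi_i, \\
W^{T}\phi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\
\xi_i,\ \xi_i^{*} \ge 0, \qquad i = 1,\dots,N
\end{cases}

% Function rewritten with Lagrange multipliers and a kernel
f(x) = \sum_{i=1}^{N}\left(\alpha_i - \alpha_i^{*}\right) K(x_i, x) + b,
\qquad K(x_i, x_j) = \phi(x_i)^{T}\phi(x_j)

% Common kernels
K_{\text{poly}}(x_i, x_j) = \left(\gamma\, x_i^{T}x_j + r\right)^{d}, \qquad
K_{\text{RBF}}(x_i, x_j) = \exp\!\left(-\frac{\lVert x_i - x_j\rVert^{2}}{2\sigma^{2}}\right), \qquad
K_{\text{lin}}(x_i, x_j) = x_i^{T}x_j
```

Written this way, the polynomial kernel has three parameters (γ, r, d), the RBF kernel has one (σ) and the linear kernel has none, and the kernel replaces the explicit computation of φ(x) in the feature space.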

Evaluation Criteria
In this research, the accuracy and efficiency of the models were evaluated using the correlation coefficient (CC), root mean square error (RMSE), Nash-Sutcliffe coefficient (NS) and bias, according to the following relations. The best values for these four criteria are 1, 0, 1 and 0, respectively.
In the above relations, x_i and y_i are the observed and calculated values at time step i, N is the number of time steps, and x̄ and ȳ are the means of the observed and calculated values, respectively.
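The four criteria can be computed as in the following minimal sketch. It uses the usual textbook definitions, with the bias taken as the mean of the calculated minus the observed values; these definitions and the function name are assumptions, since the relations themselves are not reproduced here.

```python
import math


def evaluation_criteria(observed, computed):
    """Return CC, RMSE, NS and Bias for paired observed (x) and computed (y) values."""
    if len(observed) != len(computed) or not observed:
        raise ValueError("observed and computed must be non-empty and of equal length")
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(computed) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    syy = sum((y - mean_y) ** 2 for y in computed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, computed))
    squared_error = sum((x - y) ** 2 for x, y in zip(observed, computed))
    return {
        "CC": sxy / math.sqrt(sxx * syy),      # best value 1
        "RMSE": math.sqrt(squared_error / n),  # best value 0
        "NS": 1.0 - squared_error / sxx,       # best value 1
        "Bias": mean_y - mean_x,               # best value 0 (assumed definition)
    }


# Illustrative values only: every computed value is one unit too high.
observed = [1.0, 2.0, 3.0, 4.0]
computed = [2.0, 3.0, 4.0, 5.0]
scores = evaluation_criteria(observed, computed)
```

In this example the correlation coefficient is still 1 although every estimate is off by one unit, which is why the error-based criteria are reported alongside it.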

III. RESULTS AND DISCUSSION
One of the most important steps in modelling is selecting the right combination of input variables. The structure of the input combinations is shown in Table 2.
In this table, Q(t), Q(t-1) and Q(t-2) are the discharges at times t, t-1 and t-2, which are used as inputs, and S(t) is the sediment at time t, which is the output. Because of the significant cross-correlation between the input and output data, different combinations of input parameters, shown in Table 3, were used to obtain an optimal model for estimating the sediment of the Kashkan River. To estimate the sediment discharge of the Kashkan River with gene expression programming and the support vector machine, 240 records of catchment hydrometric data registered during the period 1998-2018 were used, of which 192 records were used for training and the remaining 48 for verification.
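The construction of lagged inputs and the training/verification split can be sketched as follows. The chronological split and the function names are assumptions made for illustration, and the series is synthetic; the text does not state how the first records, which lack earlier discharges, were handled, so this sketch simply drops them.

```python
def build_lagged_samples(discharge, sediment, max_lag):
    """Pair S(t) with the inputs [Q(t), Q(t-1), ..., Q(t-max_lag)]."""
    if len(discharge) != len(sediment):
        raise ValueError("discharge and sediment must have the same length")
    inputs, targets = [], []
    for t in range(max_lag, len(discharge)):
        inputs.append([discharge[t - lag] for lag in range(max_lag + 1)])
        targets.append(sediment[t])
    return inputs, targets


def chronological_split(inputs, targets, n_train):
    """Use the first n_train samples for training and the rest for verification."""
    train = (inputs[:n_train], targets[:n_train])
    verification = (inputs[n_train:], targets[n_train:])
    return train, verification


# Synthetic series of 240 records, standing in for the real data.
discharge = [float(i) for i in range(240)]
sediment = [2.0 * q for q in discharge]

inputs, targets = build_lagged_samples(discharge, sediment, max_lag=2)
train, verification = chronological_split(inputs, targets, n_train=192)
```

With a maximum lag of 2, the first two records have no complete input vector, so 238 of the 240 records remain; with no lag, the same split gives exactly 192 training and 48 verification samples.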

The results of Gene Expression Programming
Gene expression programming was considered for estimating the sediment of the Kashkan River because it selects the variables that enter the model, removes variables with less impact, and can provide an explicit relationship. All input combinations were entered into the model to determine the significant variables, and two sets of operators were examined: the basic arithmetic operators (F1) and the default set of operators (F2). This choice of operators is based on earlier studies (Ghorbani et al., 2012; Khatibi et al., 2012).
The results of the gene expression programming model for both operator sets are given in Table 3.
[…], 1994). To obtain suitable values of the parameters (C, ɛ, σ), the RMSE was used as the optimization criterion. To estimate the sediment of the Kashkan River with the SVM model, different kernel functions can be examined; the linear, polynomial and radial basis function kernels, which are commonly used in hydrology, were selected. The results of the models studied are given in Table 3. According to this table, combined model number 3 with the radial basis function kernel has the highest correlation coefficient (R = 0.994), the lowest root mean square error (RMSE = 0.001 ton/day) and mean absolute error (MAE = 0.001 ton/day), and NS = 0.988 in the verification stage, and is therefore the best of the models examined. Fig. 3 shows the best model for the verification data. As shown in Figure 2, in the scatter plot of the support vector machine the observed and simulated values lie close to the best-fit line y = x, which demonstrates the ability of this model to estimate most values. The scatter plot of gene expression programming for the verification stage, Fig. 2b, shows the fit of the values computed with the four mathematical operators against the best-fit line y = x. As is evident from this …

IV. CONCLUSIONS
In this research, we evaluated the performance of two models, gene expression programming (GEP) and the support vector machine (SVM), for simulating the sediment of the Kashkan River in Lorestan province, using monthly sediment data for the river.
The observed sediment values were compared with the values estimated by the two models. The results are summarized as follows:
A: The SVM model estimates minimum, maximum, middle and peak sediment values with high accuracy and little error, and its estimates are highly correlated with the observed values.
B: The gene expression programming model with the four basic arithmetic operations has a high ability to estimate minimum, maximum, middle and peak values. The support vector machine with the radial basis function kernel has a high ability to estimate minimum and middle values, while its performance in estimating maximum values is adequate.
C: Increasing the number of input parameters in the various models improves their performance in estimating sediment.
D: Estimating sediment with the combined-input models gives lower error and higher correlation than the other models for estimating sediment in dam reservoirs.
Overall, the results of this research showed that the support vector machine is more accurate than the other models, which is consistent with the results of Jajarmizadeh et al. (2015). This research also showed that gene expression programming and support vector machine models can be used to estimate the sediment of the river.

V. ACKNOWLEDGMENTS
The authors are very grateful to the Regional Water Company of Lorestan Province, Iran, for its help in gathering the data required for this work.