GEP- and MLR-based equations for stable channel analysis

For decades, research on stable channel hydraulic geometry was based on the following parameters: river discharge, dimensionless discharge, the median size of bed material and the slope. Although significant research has been conducted in this area, including applied machine learning to increase the geometry model prediction accuracy, there has been no remarkable improvement as the variables used to describe the geometry relationship remain the same. The novelty of this study is demonstrated by the parameters used in the stable channel geometry equations that outperform the existing equation’s accuracy. In this research, sediment transport parameters are introduced and analysed by applying the multiple linear regression (MLR) and gene expression programming (GEP) methods. The new equation of the width, depth and bed slope can give much-improved results in efficiency and lower errors. Furthermore, a new parameter B/y is introduced in this study to solve the restriction issue, either in width or depth prediction. The results from MLR and GEP show that in addition to the existing hydraulic geometry parameter, the B/y parameter is also able to give high accuracy results for width and depth predictions. Both calibration and validation for the B/y parameter yield high R and NSE values with low mean squared errors and mean absolute errors.

seem to be complicated and require a deep understanding of fluid flow, which is often time-consuming. Bonakdari et al. (2020) suggested including different key geomorphological variables to increase the prediction accuracy.
Currently, the use of machine learning in water resources engineering has been the method of choice for many researchers (Muhammad et al. 2018;Kargar et al. 2019;Montes et al. 2020;Roushangar & Shahnazi 2020;Sharafati et al. 2020). This method was extensively investigated by Gholami et al. (2019aGholami et al. ( , 2019b and Shaghaghi et al. (2018) for use in a stable channel. Machine learning can be applied by using various methods such as artificial neural networks, adaptive network-based fuzzy inference systems, support vector machines, gene expression programming (GEP) and evolutionary polynomial regression (EPR) (Giustolisi & Savic 2009;Ebtehaj et al. 2019;Khosravi & Javan 2019;Roushangar & Ghasempour 2019;Yahaya 2019;Asheghi et al. 2020;Najafzadeh & Oliveto 2020). The research done by Bonakdari et al. (2020) showed that GEP and EPR had improved the accuracy of the model prediction; however, the improvement is not remarkable compared to the original equation.
These earlier findings suggest that a new equation should be developed to increase the accuracy of stable channel prediction. To date, many researchers have focused on parameters Q, d 50 S o and Q* to determine stable channel geometry. This paper will focus on introducing a new equation for stable channel geometry by adopting sediment transport parameters that resemble the hydraulic characteristic of the river. The width/depth (B/y) parameter will be introduced to the equation for predicting stable channel geometry to solve the problem when there is a restriction in width or depth, as discussed earlier in the analytical approach. The proposed equation will be further enhanced by using the machine learning method to increase the model prediction accuracy.

Multiple linear regression
This study uses ordinary multiple regression in linearised form after log transforming all the variables included in the equation. Since all the possible test cases are described in one power law equation, the fitting of all regressions can be determined using statistical analysis software, Statistical Product and Service Solutions (SPSS). The dependent variable used for this equation is B*, y*, S o and B/y.

Gene expression programming
GEP has been developed as an extension to genetic programming (Koza 1994). This program involved several search techniques, such as decision trees and logical expression, that help to evolve computer programs. GEP is encoded in linear chromosomes that can be represented by an expression tree (ET). With the aim to analyse particular data through genetic modification, the population of an ET will adapt and discover the traits of the data. The decision tree and logical expression have to be initially carefully determined to ensure the program yields the best results (Ferreira 2001b). The chromosome of each individual is randomly selected and evaluated by the fitness function and selected by the genetic operator to reproduce with modification (mutation). The same process occurs in the new individual for whom the trends are repeated until the required accuracy is obtained (Ferreira 2001a(Ferreira , 2001b. Figure 1 summarises the steps taken to execute the GEP. The data are first randomised before continuing to the next step. Approximately, 80% of the data are used for training and the remaining data for testing purposes. The setting of the function and chromosome architecture includes chromosome length, gene number, head size, linking function and genetic operators. The root-mean-square error (RMSE) was used as a fitness function to fit the curve to the target value. The stopping criteria were set at 25,000 generations for all the runs of the various models. The summarised parameters used in the GEP model are shown in Table 2.

Study area
The river data used in this study were obtained from the Malaysian Department of Irrigation and Drainage. Three rivers were chosen for the study to represent different hydraulic characteristics of a large river (Muda River), a medium river (Langat River) and a small river (Kurau River). The locations of these rivers are shown in Figure 2.

Muda River Basin
The Muda River originates in the highland areas of Kedah, a northern state of Malaysia, which borders Thailand. It is the largest river in Kedah and is important in supplying water to the three states of Kedah, Perlis and Pulau Pinang (Sim et al. 2015). The Muda River Basin covers a drainage area of 4,210 km 2 . In terms of length, the Muda River is approximately 180 km long, with a slope of 1/2,300 (or 0.00043). The channel width of the Muda River is typically around 10 m upstream, 100 m in midstream and widest at its estuary, averaging 300 m (DID 2009). In terms of depth, bathymetric surveys show that the shallowest point in the river is located 2.5 km upstream from the river mouth, resulting in difficulty for navigation during low tides (Julien et al. 2010).

Langat River Basin
The Langat River forms one of the four major river systems in the state of Selangor. The Langat River is a medium-sized river, approximately 180 km long with an average annual flow of 35 m 3 /s and a mean annual flood of 300 m 3 /s. The Langat River system flows from the north-eastern state of Selangor to Negeri Sembilan and the Federal Territory of Putrajaya, finally emptying into the Straits of Malacca. The Langat River Basin has a total catchment area of 2,396 km 2 (DID 2009).

Kurau River Basin
The Kurau River Basin is a typically small river basin, draining an area of approximately 682 km 2 . In terms of elevation, elevations at the river headwaters are moderately high, being 900-1,200 m. In terms of slopes, the upper 6.5 km of the river averaged 12.5%, while those lower down the valleys are significantly lower, in the order of 0.25-5%. The average velocity of the Kurau River ranges from 0.45 to 0.636 m/s with the highest sediment load being 0.878 kg/s (Saleh et al. 2018). A reservoir was constructed at the middle section of the river, approximately 65 km upstream. Two major systems, the Kurau River system and the Merah River system, are upstream of this reservoir, and both rivers drain into the reservoir (DID 2009).

Field data collection
River surveys, flow measurements and field data collection were conducted based on the Guidelines for Field Data Collection and Analysis of River Sediment by Ab. Ghani et al. (2003). The data collection includes flow discharge, bed and bank material, suspended load, bedload and water surface slope. In addition, bed elevation, water surface and thalweg measurements were also conducted at selected cross-sections. The range of the data for this study is shown in Table 3.

Development of new equations for stable channel geometry
The new equations were formulated by using dimensionless hydraulic geometry relations as proposed by Kaless et al. (2014), Parker (1978), Parker (1979) and Pitlick & Cress (2002). Along with the dimensionless stable geometry equation in terms of B*, y* and S, a new dimensionless B/y equation was introduced to solve the restriction issue in either width or depth prediction. The equation also uses dimensionless full bank discharge Q* as one of the parameters in the development of the equation.
Researchers have divided the significant parameters that contribute to sediment transport into four-parameter classes, namely, mobility, transport, sediment and flow resistance (Ebtehaj & Bonakdari 2016;Harun et al. 2020). This concept has been widely used by researchers to define the sediment transport in the river (Sinnakaudan et al. 2006;Harun et al. 2020), in closed channels or in pipes (Ebtehaj et al. 2019;Danandeh Mehr & Safari 2020). These parameters were also found to be prevalent in the application of sewer sedimentation analysis Kargar et al. 2019). The details of the parameters are shown in Table 4. These parameters were combined to get the best relationship with regard to the dimensionless hydraulic geometry relations to width, depth, slope and width/depth. The parameters governing the stable channel geometry are presented in a dimensionless form, where ψ is the flow parameter, R is the hydraulic radius, U* is the shear velocity, S o is the bed slope, S s is the specific gravity of the sediment, v s is the fall velocity of the bed material, V is the cross-sectional velocity, g is the acceleration due to gravity, A is the area of cross-section, v is the kinematic viscosity of water, ϕ is the transport parameter, D gr is the dimensionless grain size and l s is the friction factor.

Goodness of fit of model performance
The equations were evaluated by using several indices, namely, R 2 , Nash-Sutcliffe efficiency coefficient (NSE), mean squared error (MSE), mean absolute error (MAE) and discrepancy ratio (DR). Different statistical indexes used are presented in Equations (1)-(5), where O i and P i are the observed and predicted values, while O i and P i are the mean observed and predicted values. R 2 represents the correlation between measured and modelled values, MSE is the calculated mean error in the form of data units squared and MAE represents the absolute error between the measured and modelled values. In large events, MAE helps to reduce bias (Bennett et al. 2013). The NSE is used to describe how much the modelling error is to the variance of the observed data. One is the perfect value for the NSE; an NSE less than 0 indicates that the predicted model is unreliable and an NSE value closer to 1 represents high accuracy between the observed and the predicted model ( (Gholami et al. 2017) and the perfect correlation of computed and measured stable channel geometry should be one (Yadav et al. 2019). This study uses a DR of 0.8-1.2 (20% accuracy) to analyse the prediction model in multiple linear regression (MLR).

Stable channel geometry by using MLR analysis
The significance of each parameter in Table 4 was assessed to ascertain which parameter is influential to the stable channel geometry. The correlation analysis determines the significance of the parameter. The probability (p) value , α ¼ 0.05 indicates convincing evidence that the parameter is influencing the stable channel geometry. The results from the correlation analysis show that the stable channel geometry (B/y, B*, y*, S) were significant to the parameters Q*, ψ, The selected best combination of characteristic parameters, together with the related equations from multiple linear analyses, were further assessed in the study of variance (ANOVA) approach. Figure 3 explains the different input combinations for the determination of dimensionless width/depth, width, depth and the slope of a stable channel.

Width/depth (B/y) prediction
The B/y parameter was evaluated with a different combination of inputs, as shown in Figure 3. This study shows that B/y was significant to the combination of parameters Q*, ψ, U Ã =V, V=gd 50 (S s À 1), VS o =v s , D gr , U Ã =v s and R/d 50 . In total, there were 37 combinations that were significant to the prediction of B/y. The best five combinations of the parameters are given in Table 5.
The R 2 value for model one is the highest among all the models (0.890) followed by model two (0.638), model three (0.598), model four (0.542) and model five (0.453). In contrast, MSE for model one yields the lowest MSE (45.549) followed by model two (157.900), model three (187.70), model four (208.298) and model five (253.270). As for the NSE, model one yields the highest value (0.889), followed by model two (0.614), model three (0.542), model four (0.492) and model five (0.382). A comparison between all models shows that model one yields the highest DR value which ranges between 0.8 and 1.2 (79.06%), and model five yields the lowest DR value (31.62%). The results of this study indicate that the B/y is best described by the combination parameters of Q*, V= ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi gd 50 (S s À 1) p and R/d 50. The above findings conclude that model one is superior in comparison to the other models.

Width prediction by using dimensionless width (B*)
Results from the correlation analysis revealed that B* was significant with the parameters Q*, ϕ, ψ, U Ã =V, VS o =v s , V=gd 50 (S s À 1), D gr , U Ã =v s , l s and R/d 50 . There were 51 combination models altogether for the B* prediction. Table 6 lists the best five combinations of the parameters that are significant to the B prediction.
Model one has the highest DR (93.16%), followed by model two (64.10%), model four (55.56%), model three (46.15%) and model five (37.61%). In terms of R 2 , model one has the highest value (0.964), and model five has the lowest value (0.745). The trend was also the same for NSE, with model one showing the highest value (0.964), thereby indicating that the model predicts better compared to the other models. Moreover, model one tends to yield the lowest MAE (3,031.499); meanwhile, model five has the highest MAE among the models (10,167.036). Model one is considered superior compared to the other models because it has the highest R 2 and NSE and is able to give better DR that has 93.16% accuracy.

Mean depth prediction by using dimensionless depth (y*)
Significant parameters were combined to yield the best equation that can predict the average depth of the river. In the present study, it is observed that parameter Q*, C v , ϕ, ψ, VS o =v s , V=gd 50 (S s À 1) , U Ã =V, D gr , l s and R/d 50 were significant to the development of depth prediction. The analysis showed that 51 combination parameters were significant to the y* prediction. Table 7 lists the best five combinations of parameters that are significant to the depth prediction. The bold values highlight the best model for every stable channel geometry prediction.
Journal of Hydroinformatics Vol 23 No 6, 1255 Models one and two have the highest value of DR square (92.31%) followed by model three (91.88%), model four (91.45%) and model five (91.03%). Among all the models, Model two has the lowest in terms of MSE (58,451.679) and the highest in terms of R 2 (0.977) and NSE (0.977). Model two is selected to predict the depth instead of model one because model two gives a higher R 2 and NSE and the lowest MSE compared to the other models.

Slope (S) prediction
The prediction of the slope was calculated by combining the significant parameter, as discussed in Figure 3. In this study, parameters C v , ϕ, ψ, VS o =v s , D gr and R/d 50 were observed significant to the slope prediction. The study revealed that 57 models were significant to the slope prediction. The best five combinations for the slope prediction are listed in Table 8.
Model one has the highest R 2 (1) followed by model three (0.807), model two (0.795), model four (0.789) and model five (0.763). The same trend also applies to the NSE, but for MSE, the trend is in the reverse order. Model one has the lowest MSE, and model five has the highest MSE. Model one is found to be superior compared to the other models, as it has the highest DR ratio (100%) followed by model two (55.56%), model three (54.70%), model four (53.85%) and model five (50.43%).
The summary of the stable channel geometry prediction model in dimensionless form by using the MLR method is shown in Table 9.

Stable channel geometry by using GEP
The best model for every stable channel parameter was further determined by using GEP. The function for the stable hydraulic geometry was analysed based on the significant parameter, as shown in Table 9. The formulations of GEP for stable hydraulic geometry are shown in Table 10. The bold values highlight the best model for every stable channel geometry prediction. The bold values highlight the best model for every stable channel geometry prediction.

Journal of Hydroinformatics Vol 23 No 6, 1256
The performance of the training and testing of the GEP model is shown Table 11. The GEP model for all stable channel geometries has a high correlation for training and testing data. The same trend was also observed for the errors in terms of MSE and MAE.
The performance of the GEP model in comparison to the MLR model is shown in Table 12. Both models were analysed based on R 2 , NSE, MSE and MAE. In all parameter predictions, GEP models outperformed MLR models, in which the GEP model yields better R 2 and NSE and produces the least errors in terms of MSE and MAE. It can be concluded that the GEP model is a far superior model compared to the traditional MLR method.

Evaluation of the existing equation and comparison with the newly developed equation
The performance of the currently developed equation by using MLR and GEP was compared to the previous equation (Figures 4-6), and the results are shown in Tables 13 and 14. The channel width, depth and slope of the stable channel are estimated by using the existing equation. As the equations were presented in dimensionless form, calculations of the width and the depth were done by multiplying the equation with d 50. For the B/y equation, the width is calculated by multiplying the equation by the depth of the channel. However, the equation will be divided by the width to obtain the depth of the channel.
In the prediction of channel width (B), the majority of the previous equations had moderate R 2 values but with low or negative values in terms of NSE. This shows that several of the equations were underpredicted, which explains the higher observed values compared to the predicted data. Only Pitlick & Cress (2002), Lee & Julien (2007) and Haron et al. (2019) have a positive value for NSE. Lee & Julien (2007) has the highest NSE of all, which is 0.61. Bonakdari et al. (2020) have the highest values of R 2 in EPR and GEP, namely, 0.685 and 0.655; however, these lack accuracy in the prediction, as the NSE value transpired to be negative. The NSE for both equations is À0.06 and À0.54, respectively. In terms of the error, Julien & Wargadalam (1995) have the highest MSE and MAE values, which are 1,335.68 and 24.76, respectively. Lee & Julien (2007) have The bold values highlight the best model for every stable channel geometry prediction.   (10)   In the prediction of flow mean depth y, many of the existing equations have high R 2 and NSE values, indicating that most of the equations are able to predict the depth of the river with low error (MSE and MAE). Hey & Thorne (1986), Lee & Julien (2007) and Bonakdari et al. (2020) EPR were particularly excellent in predicting the depth of the river with the DR (0.8-1.2) 39.32, 37.6 and 52.99%, respectively. Hey & Thorne (1986) have the lowest MSE (0.122) and MAE (0.282) among all the existing equations. Conversely, Simons & Alberstone (1960) have the highest MSE (40.49) and MAE (6.311). The newly developed equation is able to predict with better accuracy, coupled with less error. The current study GEP (y) equation has the highest R 2 and NSE values (R 2 ¼ 0.960, NSE ¼ 0.960) followed by the current study MLR (y) equation ( Vol 23 No 6, 1260 In the slope prediction, many existing equations are unable to predict the slope accurately, and many of the existing equations had low R 2 and NSE values. Julien & Wargadalam (1995) and Lee & Julien (2007) equations were found to be excellent in predicting the slope with an extremely high R 2 value (1, 0.982) and an high NSE value, which are 1.0 and 0.861, respectively. The MSE and MAE for both equations are also fairly low, being 3.131 Â 10 À11 and 4.985 Â 10 À8 (MSE), and 3.8467 Â 10 À6 and 2.054 10 À4 (MAE). In all the equations, the EPR and GEP equations of Bonakdari et al. (2020) were found to have the highest error. This may be due to the sensitivity of the equations used and is, therefore, not suitable for studying Malaysian rivers. The newly developed slope equation using GEP and MLR is able to predict with high R 2 and NSE values (R 2 ¼ 1, NSE ¼ 1). The MSE for both equations is fairly low, namely, 1.773 Â 10 À12 and 6.6217 Â 10 À12 , respectively, and MAE is 1.092 Â 10 À6 and 5.766 Â 10 À6 , respectively. Both Julien & Wargadalam (1995) and the current study equation (MLR and GEP) with fewer variables are determined to be suitable to predict the stable channel slope.
The results from Table 15 show that both MLR and GEP equations can predict the width of the river precisely. The R 2 and NSE values are high for all the validation data except for the current study GEP (B/y) equation and for the Ariffin (2004) data, in which the values for R 2 and NSE are 0.880 and 0.543, respectively. The predicted data are highly correlated but lack accuracy. This could be due to the fact that the GEP equation developed is sensitive to the characteristics of the current river data. The data from the Ariffin (2004) study are generally based on small river data. Conversely, the present study MLR (B/y)   Table 16 shows that the developed MLR and GEP equations have an excellent degree of depth prediction accuracy based on the evidence of high R 2 and NSE values with low MSEs and MAEs. The current equation also outperformed the existing equations, as many of the existing equations yielded underpredicted values. However, the current study MLR (B/y) and current study GEP (B/y) equations yield low accuracy on the Saleh et al. (2017) data, whereby low R 2 and negative NSE values were observed. The model prediction is poorly correlated and inaccurate, as the predicted model performed worst. This could be due to the characteristics of the river that has a higher discharge compared to the current study data. This indicates that the equation is not suitable to apply to rivers with high discharge.
The slope validation prediction as shown in Table 17 revealed an equation from this study that is also able to predict the slope with high accuracy, particularly current study S (GEP), which yields better R 2 and NSE values and lower errors. This study also applies to the much-improved parameter that only consists of two parameters, namely, flow parameter and R/d 50 .
As shown in Table 18, the new equation B/y also performed well in the validation data. The current study GEP (B/y) and and current study MLR equations have high R 2 and NSE values in all the validation data with low MSEs and MAEs. This indicates that parameter B/y is suitable to represent the geometry of the river. In the case of restriction either in the width or depth of the river, this equation is fundamental in helping decision-makers to determine the suitable channel dimension.

Limitation of the proposed model
The current study focuses on developing a new equation for stable channel geometry that is suitable for the median size of bed material between 0.29 and 3.0 mm. Different key geomorphological variables such as bank profile and vegetation types need to be considered in future studies to enhance researchers' understanding of the effect of changes to the stable channel geometry. Applications of different machine learning techniques with higher accuracy can be conducted to increase the accuracy of the model prediction. For the B/y model prediction, a different variable that resembles the hydraulic characteristic of the river should be further explored to increase the efficiency of the equation, especially in rivers with a high volume of discharge.

CONCLUSION
The current study set out to develop a new equation that can improve the stable channel geometry prediction accuracy. This study also aims to establish an equation in the dimensionless form of B/y to provide an alternative solution for the width and depth prediction. Existing equations predict stable channel geometry with low accuracy, as many model predictions use the same parameter. This research has enhanced our knowledge in predicting stable channel geometry by associating the sediment transport parameters to predict stable channel geometry. The newly proposed equations by using MLR and GEP give a better prediction for stable channel hydraulic geometry with a far better R 2 , NSE and DR. Moreover, it was established that B/y is a good parameter that can be used to predict the geometry of the river. This result also suggests that the proposed MLR and GEP models are robust in predicting stable channel geometry, with lower MSEs and MAEs compared to the existing equations.   1.000 0.999 6.951 Â 10 À9 7.041 Â 10 À5 1.000 1.000 3.436 Â 10 À9 0.0000399 1.000 0.893 4.169 Â 10 À11 6.454 Â 10 À6 1.000 1.000 9.262 Â 10 À10 2.138 Â 10 À5 Current study GEP (S) 1.000 1.000 6.697 Â 10 À11 7.306 Â 10 À6 1.000 1.000 3.430 Â 10 À11 0.0000045 1.000 1.000 1.568 Â 10 À12 4.47 Â 10 À6 1.000 1.000 3.430 Â 10 À11 4.47 Â 10 À6 Journal of Hydroinformatics Vol 23 No 6, 1266