1 Introduction

The performance of reinforced concrete (RC) columns plays a crucial role in the overall performance of structural systems through their load transfer, stability and durability properties [1, 2]. Columns transfer the loads from beams and slabs to the foundation. Seismic failure modes of RC columns are classified into three major types: shear, bond splitting and flexural modes. In the flexural failure mode, plastic deformation is localized in a small region, namely the plastic hinge region, after the member yields. The performance of the plastic hinge region is critical for flexural members because it governs the load-carrying and deformation capacities of the column. The length of the plastic hinge region is an important design parameter, since intense confinement is provided over this length to increase the ductility of the member. Therefore, accurate estimation of the length of this region contributes significantly to a more realistic prediction of the behavior of the structure. When a column is subjected to strong earthquake loading, significant inelastic curvatures may occur in its end regions. The inelastic curvature, which is actually distributed nonlinearly, is assumed to remain constant over a specific length known as the plastic hinge length (\({L}_{p}\)). The \({L}_{p}\) is a critical parameter for the seismic analysis and design of RC structures [3,4,5]. Inaccurate prediction of the \({L}_{p}\) may result in inadequate seismic design, reducing the ability of the structure to withstand dynamic forces. Reliable prediction of the \({L}_{p}\) therefore has a significant effect on the design, evaluation, and strengthening of RC columns in practice. It enables cost-effective solutions and improves engineering techniques by optimizing the use of materials. Prediction models may also assist in the development of building codes, encourage further structural engineering research and affect how structures are built in the future.

In general, many factors can affect the \({L}_{p}\). The factors affecting the \({L}_{p}\) of RC members subjected to monotonic loading may differ from those of columns under cyclic loading [6, 7]. Because of material uncertainties and complex interactions among the constituent materials, extensive experimental and analytical research has been conducted on the plastic hinge behavior of RC columns [8]. The first empirical models for the \({L}_{p}\) were proposed by Baker [9], Baker and Amarakone [10], Corley [11] and Mattock [12]. These models reveal that two factors, the shear span and the section depth, have a primary impact on the \({L}_{p}\). Since these formulations were calibrated mainly against experimental tests on beam elements, the fixed-end rotation contribution was not clearly evaluated. In formulations calibrated for RC columns, this contribution is represented explicitly [13]. Paulay and Priestley [14] proposed an equation that considers the displacement caused by the longitudinal reinforcement slipping out of the joint or foundation.

Besides, the choice of confinement model may significantly influence the \({L}_{p}\). For example, Tanaka [15] investigated the effects of lateral confinement reinforcement on the ductile behavior of RC columns. Ho and Pam [16] examined the post-elastic behavior of high-strength RC columns with transverse reinforcement under low axial load. Légeron and Paultre [17] demonstrated that the increase in \({L}_{p}\) with axial load could also be observed for high-strength concrete columns. Ho [18] introduced the effects of the transverse and longitudinal reinforcement ratios, axial load level, compressive strength of concrete, yield strength of the longitudinal bars, and cross-sectional depth on the \({L}_{p}\). In the parametric study by Ou et al. [3], the effects of the longitudinal reinforcement ratio, the shear span-to-depth ratio, the axial force ratio, and the concrete compressive strength on the \({L}_{p}\) of circular RC columns were examined. This study, in which the yield strength of the longitudinal reinforcement was taken as 414 MPa and 685 MPa, shows that \({L}_{p}\) increases as the axial load, longitudinal reinforcement, and shear span-to-depth ratios increase. In addition, Ou et al. [3] showed that as the concrete strength increases, the \({L}_{p}\) decreases for the 414 MPa case but increases for the 685 MPa case. Mortezaei [19] performed hundreds of elastic time-history analyses to predict the plastic hinge length of RC columns under the combined effect of near-fault vertical and horizontal ground motions, considering the influence of parameters such as axial load level, height-to-depth ratio, and concrete strength. Geng and Zhou [20] presented a predictive model for the effective plastic hinge length of existing RC columns under lateral load reversals, using data from 24 test columns to evaluate the effects of slenderness ratio, axial load and transverse reinforcement. In the study of Barbagallo et al. [21], the most appropriate value for the \({L}_{p}\) was calibrated by comparing laboratory and numerical test results. Based on the optimal \({L}_{p}\) values obtained for laboratory specimens selected to represent columns in existing RC buildings, a simple relationship depending on the volumetric ratio of the transverse reinforcement was proposed to determine \({L}_{p}\). That study emphasized that the optimum \({L}_{p}\) decreases as the volumetric ratio of the transverse reinforcement increases.

Machine learning algorithms, which have been widely used in recent years, have also been applied to determine the \({L}_{p}\). Wakjira et al. [22] used various ensemble machine learning algorithms to propose a model. Feng et al. [23] used the adaptive boosting method, an ensemble machine learning algorithm, to predict the \({L}_{p}\) of RC columns. The performance of the model was assessed using tenfold cross-validation. Bayrak et al. [24] extensively studied the use of artificial neural network (ANN) models trained with the Levenberg–Marquardt algorithm to predict the \({L}_{p}\) of columns and shear walls. The findings revealed that the proposed ANN model is a practical and valuable approach for predicting the \({L}_{p}\) of both columns and shear walls.

Gene Expression Programming (GEP), one of the machine learning approaches in evolutionary computing, combines components of functional programming with genetic algorithms to search efficiently for compact mathematical models. Unlike traditional statistical methods and some machine learning models, GEP develops mathematical expressions for a given data set, thanks to a flexible gene structure that can encode expressions of variable length during the optimization process. The evolutionary nature of GEP enhances the ability of the model to explore various nonlinear functions effectively. Because the result is an explicit equation, the model has a clear and interpretable form that can be used directly in scientific research or engineering design. This is in contrast to many black-box machine learning models, where the relationship between input features and output can be difficult to understand.

The aim of this study is to derive a new equation to predict the plastic hinge length \({L}_{p}\) by GEP. As can be seen from the studies reviewed above, the major factors affecting the \({L}_{p}\) are sectional depth, axial force level, aspect ratio, longitudinal reinforcement ratio, concrete compressive strength, and transverse reinforcement ratio. However, none of the proposed equations includes all of these parameters. Therefore, this study aims to derive a new equation that considers all the important parameters affecting \({L}_{p}\). For this purpose, 133 RC column specimens collected by Ning and Li [25] were taken into account. The performance of the GEP model was demonstrated by comparing its predictions with the experimental results and with the results of the equations proposed in the literature. In addition, sensitivity and parametric analyses were performed to examine the effect of each input parameter on the overall performance and predictive capacity of the proposed GEP model. This analysis contributes to a comprehensive understanding of the complex relationships between the input variables and the resulting model outputs.

2 Summary of experimental data

In this study, the experimental database of 133 rectangular RC column specimens was taken from the study of Ning and Li [25] to derive a formulation with the GEP model. Table 1 shows the experimental database and the ranges of the parameters considered in this study. As seen in Table 1, the upper and lower limits of the design variables cover a sufficiently wide range of values. These parameters are the cross-sectional width of the column \(B\) (mm), cross-sectional depth of the column \(H\) (mm), shear span \(L\) (mm), compressive strength of concrete \({f}_{c}^{\prime}\) (MPa), diameter of the longitudinal bars \({d}_{b}\) (mm), longitudinal reinforcement ratio \({\rho }_{s}\), yield strength of the longitudinal reinforcement \({f}_{y}\) (MPa), spacing of the transverse reinforcement \(s\) (mm), transverse reinforcement ratio \({\rho }_{v}\), yield strength of the transverse reinforcement bars \({f}_{yv}\) (MPa), axial load ratio \(P/{P}_{o}\), and aspect ratio \(L/H\).

Table 1 Experimental database and ranges of parameters [25]

Also, Fig. 1 shows frequency histograms of each input parameter taken into account in the GEP model.

Fig. 1 Frequency histograms of each input and output parameter in the GEP model

In addition, Fig. 2 shows the relationship between each input parameter and the experimental plastic hinge lengths (\({L}_{p,exp.}\)) of the columns included in the experimental database.

Fig. 2 Relationship between each input parameter and the \({L}_{p,exp.}\) of the RC columns included in the experimental database

3 Existing models for plastic hinge length of the RC columns

In the literature, many researchers have proposed different formulations for predicting the \({L}_{p}\). However, substantial disparities exist among these formulations, and there has been no systematic assessment of their accuracy in predicting the \({L}_{p}\) [26]. A summary of the most frequently used formulas for the \({L}_{p}\) of columns is shown in Table 2. As seen in Table 2, the number of input parameters included in the proposed equations differs from one model to another.

Table 2 Proposed models for \({L}_{p}\) of RC columns

The form of the commonly used equation for the \({L}_{p}\) of RC columns is given below [7, 13].

$$L_{p} = \alpha L + \beta \left( {f_{y} d_{b} } \right)$$
(1)

In this two-term relationship for the \({L}_{p}\), the first term accounts for the bending moment distribution along the \(L\), whereas the second term accounts for the yield penetration of the longitudinal reinforcement into the column base or the joint [7]. \(\alpha\) and \(\beta\) in Eq. 1 are model coefficients, for which different researchers have proposed different values. The models listed in Table 2 are discussed below.

Priestley and Park [36] proposed an equation based on the \(L\) and the \({d}_{b}\), which corresponds to the model coefficients \(\alpha =0.08\) and \(\beta =6/{f}_{y}\) in the form of Eq. 1. The relationship proposed by Paulay and Priestley [14] is obtained by taking the model coefficients \(\alpha =0.08\) and \(\beta =0.022\) in the form of Eq. 1. Based on their experiments on columns subjected to high axial loads, Sheikh and Khoury [28] introduced an equation predicting that \({L}_{p}\) is approximately equal to the \(H\) of the column, without accounting for the influence of any other parameter. Priestley et al. [37] adopted the equation derived by Paulay and Priestley [14] for the \({L}_{p}\) and limited its result to \(0.044\,{f}_{y}{d}_{b}\). Panagiotakos and Fardis [38] proposed an equation to predict the \({L}_{p}\) of RC columns subjected to cyclic loads. This equation includes the model coefficients \(\alpha =0.12\) and \(\beta =0.014{\alpha }_{sl}\), where the constant \({\alpha }_{sl}\) accounts for the fixed-end rotation due to longitudinal bar slippage: \({\alpha }_{sl}\) is 1 if slippage of the longitudinal reinforcement bars from their anchorage zone is possible, and 0 otherwise. Ho [18] derived an equation for the \({L}_{p}\) that takes into account the \(H\), \({f}_{c}^{\prime}\), \({f}_{y}\), \({\rho }_{s}\), \({\rho }_{v}\), and \(P/{P}_{o}\). Lu et al. [1] adopted the form of the equation proposed by Priestley and Park [36] with the model coefficients \(\alpha =0.077\) and \(\beta =8.16/{f}_{y}\). In EN 1998–3 EC8 [39], \({L}_{p}\) is given for members with detailing for earthquake resistance and no lapping of longitudinal bars near the section where yielding is expected, taking into account the \(L\), \(H\), \({f}_{c}^{\prime}\), \({d}_{b}\), and \({f}_{y}\). In the form of Eq. (1), the model coefficients \(\alpha =1/30\) and \(\beta =0.11/\sqrt{{f}_{c}^{\prime}}\) are taken. Berry and Eberhard [40] proposed an equation for the \({L}_{p}\) and limited its result to \(0.25L\). The model coefficients calibrated with experimental results correspond to \(\alpha =0.05\) and \(\beta =0.1/\sqrt{{f}_{c}^{\prime}}\). Berry et al. [41] revised the same equation by taking the model coefficients \(\alpha =0.0375\) and \(\beta =0.12/\sqrt{{f}_{c}^{\prime}}\). Using the least squares method, Bae and Bayrak [26] demonstrated that the \(P/{P}_{o}\), \(L\), \(H\) and \({\rho }_{s}\) are the main parameters in their proposed equation for \({L}_{p}\). They observed that the \({L}_{p}\) is approximately \(0.25H\) at low axial loads and increases at higher axial load levels. They also observed that, for a given \(P/{P}_{o}\), the \({L}_{p}\) increases as the \(L/H\) increases. Biskinis and Fardis [42] proposed the \({L}_{p}\) as a function of the \(H\) and \(L/H\). Ning and Li [25] determined the \({L}_{p}\) by a probabilistic approach and proposed an equation that takes into account the \(P/{P}_{o}\), \(L\), \(H\), and \({d}_{b}\).
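As an illustration of how the coefficient pairs quoted above enter Eq. (1), the following minimal Python sketch evaluates the two-term form for several of the models. The function names and the sample column (shear span, bar diameter, strengths) are hypothetical and chosen only for demonstration. The limits mentioned for some models, and the models that do not reduce to the pure two-term form, are deliberately left out.

```python
import math


def lp_two_term(L, f_y, d_b, alpha, beta):
    """Eq. (1): Lp = alpha * L + beta * (f_y * d_b); L and d_b in mm, f_y in MPa."""
    return alpha * L + beta * f_y * d_b


def coefficient_sets(f_y, f_c):
    """(alpha, beta) pairs as quoted in the text for models written in the form of Eq. (1)."""
    return {
        "Priestley and Park [36]": (0.08, 6.0 / f_y),
        "Paulay and Priestley [14]": (0.08, 0.022),
        "Lu et al. [1]": (0.077, 8.16 / f_y),
        "Berry and Eberhard [40]": (0.05, 0.1 / math.sqrt(f_c)),
        "Berry et al. [41]": (0.0375, 0.12 / math.sqrt(f_c)),
    }


# Hypothetical column, for illustration only (not taken from the database).
L, f_y, d_b, f_c = 1500.0, 400.0, 20.0, 30.0

for name, (alpha, beta) in coefficient_sets(f_y, f_c).items():
    print(f"{name}: Lp = {lp_two_term(L, f_y, d_b, alpha, beta):.1f} mm")
```

Keeping the coefficients in one table makes the differences among the models easy to inspect: only \(\alpha\) and \(\beta\) change, while the structure of Eq. (1) stays the same.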

4 Gene expression programming

GEP, which combines genetic programming (GP) and genetic algorithms (GA), is a technique in which individuals are encoded as fixed-length linear sequences that are then expressed as nonlinear entities of different sizes and shapes [43]. The GEP algorithm consists of five components: function set, terminal set, fitness function, control parameters and termination condition. The function set includes various mathematical operations and functions. The terminal set consists of the variables, constants, and other basic elements from which the expressions on the chromosomes are constructed. The fitness function plays a key role in evaluation by measuring the difference between the predicted output and the desired output. Control parameters, such as the population size, mutation rate and transposition rate, govern the manipulation of the genetic material. The termination condition determines when the evolutionary process should stop. Because it combines functions and terminals freely, GEP can identify complicated mathematical expressions that are difficult to recognize in the data. It assumes that the data can be modeled by a suitable mathematical expression and that this expression also represents unobserved data accurately [43,44,45].

The flowchart shown in Fig. 3a presents the main steps of GEP. The process begins with the creation of the chromosomes of the initial population, after which the fitness of each individual is evaluated. Individuals are then selected for reproduction based on their fitness. The process is repeated for a certain number of generations or until a suitable solution is found [43, 46]. Each gene in GEP can encode expression trees (ETs) of different shapes and sizes, and the ETs are joined by linking functions. The ET representation of a chromosome is shown in Fig. 3b. The algebraic expression (\(\sqrt[3]{\left(a-b\right)*\sqrt{(c+d)}}\)) and the Karva language expression (\(3Rt.*.-.Sqrt.a.b.+.c.d\)) of the ET are also shown in the same figure. The Karva expression is obtained by reading the ET from left to right and from top to bottom, starting at the root [43].
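The left-to-right, top-to-bottom reading of the ET can be sketched as a small breadth-first decoder. The following Python example is a minimal illustration written for this text, not the GeneXproTools implementation; the names `ARITY`, `APPLY`, `decode_karva` and `evaluate` are hypothetical. It rebuilds the tree from the Karva expression quoted above and evaluates it for sample values of a, b, c and d.

```python
import math

# Number of arguments of each function symbol; any other symbol is a terminal.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Sqrt": 1, "3Rt": 1}

APPLY = {
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
    "/": lambda x, y: x / y,
    "Sqrt": math.sqrt,
    # Real cube root that also works for negative arguments.
    "3Rt": lambda x: math.copysign(abs(x) ** (1.0 / 3.0), x),
}


def decode_karva(k_expression):
    """Rebuild the expression tree level by level from a dot-separated Karva string."""
    nodes = [(symbol, []) for symbol in k_expression.split(".")]
    position, next_child = 0, 1
    # Only symbols that were attached to the tree are visited, so any
    # trailing non-coding symbols of a gene are ignored.
    while position < next_child:
        symbol, children = nodes[position]
        for _ in range(ARITY.get(symbol, 0)):
            children.append(nodes[next_child])
            next_child += 1
        position += 1
    return nodes[0]


def evaluate(node, values):
    """Evaluate a decoded tree; terminals are looked up in `values` or read as numbers."""
    symbol, children = node
    if not children:
        return float(values[symbol]) if symbol in values else float(symbol)
    return APPLY[symbol](*(evaluate(child, values) for child in children))


tree = decode_karva("3Rt.*.-.Sqrt.a.b.+.c.d")
# (5 - 1) * sqrt(2 + 2) = 8, whose cube root is 2.
print(evaluate(tree, {"a": 5, "b": 1, "c": 2, "d": 2}))
```

The decoder fills the tree one level at a time, which is why the root symbol comes first in the string and the deepest terminals come last.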

Fig. 3 Flowchart of a gene expression algorithm and expression tree of a chromosome [43]

In addition, GEP uses chromosomes composed of genes that are structurally organized into a head and a tail. The head may contain constants, parameters, functions, and mathematical operators such as {1, a, b, Sqrt, 3Rt, tan, +, −, *, /}. The tail contains only terminals, that is, constants and parameters such as {1, a, b}. In GEP applications such as symbolic regression or function finding, the fitness \(f_{i}\) of an individual program \(i\) is expressed by Eq. (2) when the absolute error is chosen and by Eq. (3) when the relative error is chosen.

$$f_{i} = \sum\limits_{j = 1}^{C_{t}} \left( M - \left| C_{(i,j)} - T_{(j)} \right| \right)$$
(2)
$$f_{i} = \sum\limits_{j = 1}^{C_{t}} \left( M - \left| \frac{C_{(i,j)} - T_{(j)}}{T_{(j)}} \cdot 100 \right| \right)$$
(3)

where \(M\) is the range of selection, \(C_{(i,j)}\) is the value returned by the individual chromosome \(i\) for fitness case \(j\) (out of \(C_{t}\) fitness cases), and \(T_{(j)}\) is the target value for fitness case \(j\). Note that for a perfect fit \(C_{(i,j)} = T_{(j)}\) for all cases and \(f_{i} = f_{\max} = C_{t} M\) [43].
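The two fitness measures in Eqs. (2) and (3) can be written out directly. The short Python sketch below is only an illustration of the formulas; the function names and the sample numbers are hypothetical.

```python
def fitness_absolute(predicted, target, M):
    """Eq. (2): sum over all fitness cases of M minus the absolute error."""
    return sum(M - abs(c - t) for c, t in zip(predicted, target))


def fitness_relative(predicted, target, M):
    """Eq. (3): sum over all fitness cases of M minus the relative error in percent."""
    return sum(M - abs((c - t) / t * 100.0) for c, t in zip(predicted, target))


# Hypothetical values for two fitness cases.
target = [100.0, 200.0]
predicted = [110.0, 190.0]
M = 100.0

print(fitness_absolute(predicted, target, M))
print(fitness_relative(predicted, target, M))
print(fitness_absolute(target, target, M))  # perfect fit: C_t * M
```

With a perfect fit every error term vanishes, so both functions return the number of fitness cases multiplied by \(M\), in agreement with the note on \(f_{\max}\) above.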

4.1 Gene expression model and parameter selection

Most of the empirical models in the literature have been derived from a limited number of experimental results. In this study, the experimental database in Table 1 was used in the GeneXproTools 5.0 software [47] to develop the GEP model for predicting the \({L}_{p}\). The program was run until the best-performing model was obtained. A summary of the parameters used in the GEP model is given in Table 3. In training and validation, \(B\), \(H\), \(L\), \({f}_{c}^{\prime}\), \({d}_{b}\), \({\rho }_{s}\), \({f}_{y}\), \(s\), \({\rho }_{v}\), \({f}_{yv}\), \(P/{P}_{o}\) and \(L/H\) were selected as input parameters, while \({L}_{p}\) was used as the output (target) parameter. The database was randomly separated into a training set and a validation set: 85% of the experimental database (113 specimens) was used as training data, whereas the remaining 15% (20 specimens) was used to verify the validity of the GEP model. The root mean square error (RMSE), a common metric for quantifying the difference between predicted and target values, was used as the fitness function in the GEP model.
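A random 85/15 partition of the 133 specimens, as described above, can be sketched as follows. This is a minimal illustration only: the actual split was produced within GeneXproTools, and the function name and the seed used here are hypothetical.

```python
import random


def split_indices(n_specimens, train_fraction=0.85, seed=0):
    """Randomly partition specimen indices into training and validation sets."""
    indices = list(range(n_specimens))
    random.Random(seed).shuffle(indices)  # fixed seed only for repeatability
    n_train = round(train_fraction * n_specimens)
    return indices[:n_train], indices[n_train:]


train, validation = split_indices(133)
print(len(train), len(validation))  # 113 and 20 specimens
```

Rounding 0.85 × 133 = 113.05 gives the 113 training specimens stated in the text, which leaves 20 specimens for validation.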

Table 3 Parameters used in GEP model

In Gene Expression Programming (GEP) models, sub-expression trees (Sub-ETs) are the basic building blocks of the model and provide a structured way to represent and process the genetic information. Their terminals are the input parameters (di for i = 0, 1, 2, …), which represent the variables or features that the model uses to make predictions, and the constants (c0, c1, …), which introduce fixed numerical values into the expression trees. The function nodes, such as addition, subtraction, multiplication, and division, operate on these terminals and enable the model to perform complex calculations and transformations on the input parameters and constants. In this study, all models consist of four different Sub-ETs linked by the addition function. For the derivation of the GEP model, the input parameters di and the constants ci were parsed into expression trees as shown in Fig. 4.

Fig. 4 Expression trees of GEP model for \({L}_{p}\)

In the presented ETs, d0 represents \(B\) (mm), d1 is \(H\) (mm), d2 is \(L\) (mm), d3 is \({f}_{c}^{\prime}\) (MPa), d4 is \({d}_{b}\) (mm), d5 is \({\rho }_{s}\), d6 is \({f}_{y}\) (MPa), d7 is \(s\) (mm), d8 is \({\rho }_{v}\), d9 is \({f}_{yv}\) (MPa), d10 is \(P/{P}_{o}\), and d11 is \(L/H\). The GEP-based formulation of the \({L}_{p}\) of RC columns in terms of \(B\), \(H\), \(L\), \({f}_{c}^{\prime}\), \({d}_{b}\), \({\rho }_{s}\), \({f}_{y}\), \(s\), \({\rho }_{v}\), \({f}_{yv}\), \(P/{P}_{o}\) and \(L/H\) is presented in Eq. 4. The model proposed in this study is valid for columns that fail in flexure, with cross-sectional dimensions greater than 14 cm and an \(L/H\) ratio greater than 1.8.

$$\begin{aligned} L_{p} = {} & f_{c}^{\prime} - d_{b} + \frac{\left( 1083.37 - L \right)\sqrt{\rho_{s} \cdot s}}{\left( H - f_{yv} - 0.645s \right)\tan f_{c}^{\prime}} \\ & + \frac{L}{d_{b}} - \frac{58.95 - \frac{P}{P_{o}}}{\rho_{v} \left( B - f_{c}^{\prime} \right)} - \left( \frac{f_{y}}{B} + 11.26 \right)\left( \frac{L}{H} - d_{b} \right) \\ & - \tan \left\{ \frac{4.63}{\sqrt{H}}\left( \frac{P}{P_{o}} + f_{yv} + 49.51 \right) \right\} - \tan \left( \rho_{v} - s \right) + 39.07 \end{aligned}$$
(4)
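For readers who wish to evaluate Eq. (4) numerically, the following Python sketch transcribes it term by term. Two points are assumptions rather than statements from the source: the arguments of the tangent functions are taken in radians, and the reinforcement ratios are entered in the same units as in Table 1. The function name, argument names and the sample inputs are hypothetical, and the sample inputs only show how the function is called.

```python
import math


def lp_gep(B, H, L, fc, db, rho_s, fy, s, rho_v, fyv, axial_ratio):
    """Term-by-term transcription of Eq. (4).

    Lengths in mm, strengths in MPa, axial_ratio = P/Po.
    Assumption: tan() arguments are interpreted in radians.
    """
    term_1 = fc - db
    term_2 = ((1083.37 - L) * math.sqrt(rho_s * s)) / (
        (H - fyv - 0.645 * s) * math.tan(fc)
    )
    term_3 = L / db
    term_4 = (58.95 - axial_ratio) / (rho_v * (B - fc))
    term_5 = (fy / B + 11.26) * (L / H - db)
    term_6 = math.tan(4.63 / math.sqrt(H) * (axial_ratio + fyv + 49.51))
    term_7 = math.tan(rho_v - s)
    return term_1 + term_2 + term_3 - term_4 - term_5 - term_6 - term_7 + 39.07


# Hypothetical inputs, not a specimen from the database.
value = lp_gep(B=400.0, H=400.0, L=1500.0, fc=30.0, db=20.0, rho_s=0.02,
               fy=400.0, s=100.0, rho_v=0.01, fyv=350.0, axial_ratio=0.3)
print(value)
```

Because several terms contain tangents and differences in denominators, the result can change sharply for small changes in the inputs, so it is advisable to stay within the parameter ranges of Table 1 and the validity limits stated above.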

4.2 The performance of the GEP-based formulation

In order to evaluate the prediction performance of the GEP model, the experimental values of \({L}_{p}\) and those predicted by the GEP model are compared in Fig. 5. As seen in Fig. 5, the predictions of the GEP model follow the fluctuations of the experimental results. The \({R}^{2}\) values for the training and validation sets were 0.76 and 0.78, respectively. Therefore, the results of the GEP model can be considered satisfactory and close to those of the experiments.

Fig. 5 Comparison of the performance of the GEP model and experimental results for training and validation sets

5 Performance evaluation of existing models

Table 4 shows the statistical criteria used to evaluate the performance of the existing models. These include the mean (M), standard deviation (SD), root mean square error (RMSE), mean absolute percentage error (MAPE), coefficient of variation (COV) and coefficient of determination (\(R^{2}\)). The mathematical formulation of each indicator is as follows:

$$M = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \frac{{{\text{model}}}}{\exp }}}{n}$$
(5)
$${\text{SD}} = \sqrt {\frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {\frac{{{\text{model}}}}{{{\text{exp}}}} - M} \right)^{2} }}{n - 1}}$$
(6)
$${\text{RMSE}} = \sqrt {\frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {\exp - {\text{model}}} \right)^{2} }}{n}}$$
(7)
$${\text{MAPE}} = \frac{100}{n}\mathop \sum \limits_{i = 1}^{n} \left| {\frac{{{\text{exp}} - {\text{model}}}}{{{\text{exp}}}}} \right|$$
(8)
$${\text{COV}} = \frac{{{\text{SD}}}}{{\text{M}}}$$
(9)
$$R^{2} = \frac{{\left[ {\mathop \sum \nolimits_{i = 1}^{n} \left( {\exp - \overline{\exp } } \right)\left( {{\text{model}} - \overline{{{\text{model}}}} } \right)} \right]^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {\exp - \overline{\exp } } \right)^{2} \mathop \sum \nolimits_{i = 1}^{n} \left( {{\text{model}} - \overline{{{\text{model}}}} } \right)^{2} }}$$
(10)

where \(n\) is the number of specimens, \({\text{exp}}\) and \(\overline{{\text{exp}} }\) are the experimental output and the mean of the experimental outputs, respectively, and \({\text{model}}\) and \(\overline{{\text{model}} }\) are the predicted output and the mean of the predicted outputs, respectively. In each sum, \({\text{exp}}\) and \({\text{model}}\) refer to the values of the \(i\)th specimen.
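The indicators in Eqs. (5) to (10) can be computed with a few lines of code. The Python sketch below follows the formulas as written; the function name and the two-specimen sample data are hypothetical and serve only as a check of the arithmetic.

```python
import math


def performance_metrics(exp, model):
    """Compute M, SD, RMSE, MAPE, COV and R2 as defined in Eqs. (5)-(10)."""
    n = len(exp)
    ratios = [m / e for e, m in zip(exp, model)]  # model/exp for each specimen
    mean = sum(ratios) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in ratios) / (n - 1))
    rmse = math.sqrt(sum((e - m) ** 2 for e, m in zip(exp, model)) / n)
    mape = 100.0 / n * sum(abs((e - m) / e) for e, m in zip(exp, model))
    cov = sd / mean
    exp_bar = sum(exp) / n
    model_bar = sum(model) / n
    numerator = sum((e - exp_bar) * (m - model_bar) for e, m in zip(exp, model)) ** 2
    denominator = (sum((e - exp_bar) ** 2 for e in exp)
                   * sum((m - model_bar) ** 2 for m in model))
    return {"M": mean, "SD": sd, "RMSE": rmse, "MAPE": mape,
            "COV": cov, "R2": numerator / denominator}


# Hypothetical plastic hinge lengths (mm) for two specimens.
print(performance_metrics([100.0, 200.0], [110.0, 180.0]))
```

Note that M, SD and COV describe the ratio of predicted to experimental values, so a mean close to 1 with a small COV indicates an unbiased and consistent model, whereas RMSE and MAPE measure the size of the errors.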

Table 4 Comparison of the statistical parameters for the \({L}_{p}\)

In Table 4 and Fig. 6, the results of Eq. (4) derived by the GEP model are compared with those of the existing models. The MAPE, RMSE and COV values of the GEP model are 19.67, 60.17 and 0.27, respectively, and the \(R^{2}\) value is 0.76 for all data. As seen in Table 4, the GEP model has the lowest MAPE, RMSE and COV values and the highest \(R^{2}\) among all models. This indicates that the developed GEP model can predict the \({L}_{p}\) of RC columns with good accuracy. Among the existing models, the Ning and Li [25] model provided the lowest RMSE and COV, and the Ho [18] model has the highest \(R^{2}\), as seen in Table 4.

Fig. 6 Statistical comparison between GEP and other proposed models for \({L}_{p}\)

Thus, the Ning and Li [25] and Ho [18] models outperform the other existing models in predicting the \({L}_{p}\). The Biskinis and Fardis [42] model has the highest MAPE and RMSE, and the Berry et al. [41] model has the highest COV and the lowest \(R^{2}\), as shown in Fig. 6 and Table 4. Accordingly, the Biskinis and Fardis [42] and Berry et al. [41] models give the lowest performance among the existing models.

In addition, the predictions of the proposed models are plotted against the experimental \({L}_{p}\) in Fig. 7. As can be seen, there is a considerable discrepancy between the experimental data and the results of the models proposed in the literature. The models were evaluated by considering the \(R^{2}\) value, which is one of the statistical criteria. A higher \(R^{2}\) indicates that the predicted and experimental \({L}_{p}\) values are closer and better correlated. Among the existing models, the Ho [18] model provided the highest \(R^{2}\) of 0.44 and therefore the best agreement with the experimental results, followed by the Ning and Li [25] model with an \(R^{2}\) of 0.39, as shown in Fig. 7.

Fig. 7 Comparisons of the proposed models versus experimental \({L}_{p}\)

Conversely, as seen in Fig. 7, the Berry et al. [41] model shows the worst performance among all existing models, with an \(R^{2}\) of 0.1. The GEP model proposed in this study demonstrates the highest accuracy among all models, with an \(R^{2}\) of 0.76 for the whole database.

Besides, the performance of the proposed GEP model was assessed by plotting the ratio \({L}_{p,exp.}/{L}_{p,prop.}\) against each experimental input parameter (Fig. 8). The trend lines indicate that the predictions are accurate and consistent over the range of every parameter.

Fig. 8 Variation of the \({L}_{p, {\text{exp}}.}/{L}_{p,{\text{prop}}.}\) with experimental input data

6 Sensitivity and parametric analyses

A parametric analysis was also conducted in conjunction with the sensitivity analysis. Figure 9 presents the values of \({L}_{p}\) predicted by the GEP-based model as a function of each input parameter, which shows the effect of each input parameter on the output. Each input parameter was varied between its minimum and maximum values while all other input parameters were kept at their mean values [48]. As shown in Fig. 9, the \({L}_{p}\) increases steadily with increasing \(L\), \({f}_{c}^{\prime}\), \({\rho }_{s}\) and \({f}_{y}\), whereas it decreases with increasing \(L/H\) and \(P/{P}_{o}\). As \(B\), \(H\) and \({\rho }_{v}\) increase, the \({L}_{p}\) increases up to a certain point and then starts to decrease. Conversely, as \({d}_{b}\), \(s\) and \({f}_{yv}\) increase, the \({L}_{p}\) first decreases and then increases.

Fig. 9 Parametric analysis of inputs

A sensitivity analysis was carried out by Eqs. (11) and (12) to determine the relative contribution of the parameters on the \({L}_{p}\).

$$N_{j} = f_{\max } \left( {x_{j} } \right) - f_{\min } \left( {x_{j} } \right)$$
(11)
$${\text{SA}}_{j} = \frac{{N_{j} }}{{\sum\nolimits_{k = 1}^{n} N_{k} }}$$
(12)

where \({f}_{max}\left({x}_{j}\right)\) and \({f}_{min}\left({x}_{j}\right)\) represent the maximum and minimum predicted outputs over the domain of the \(j\)th input, while the other input parameters are kept constant at their mean values, and \(n\) is the number of input parameters. The results of the sensitivity analysis are presented in Fig. 10. As can be seen in Fig. 10, \({f}_{c}^{\prime}\), \({d}_{b}\), \(L\), \(L/H\), \({\rho }_{v}\) and \(H\) are the most important parameters for the \({L}_{p}\), with relative contributions of 23.05%, 18.56%, 18.52%, 14.2%, 10.43%, and 8.02%, respectively, while \({\rho }_{s}\) makes the smallest contribution to the \({L}_{p}\), with an impact of 0.17%. The relative contributions of the remaining parameters are shown in Fig. 10.
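The one-at-a-time procedure behind Eqs. (11) and (12) can be sketched in a few lines. In the following Python example the function name, the number of evaluation points and the toy linear model are all hypothetical; in the study itself the model would be the GEP equation and the samples would be the rows of the experimental database.

```python
def sensitivity(model, samples, steps=20):
    """Relative contribution of each input following Eqs. (11) and (12).

    model:   callable taking one list of input values and returning a number
    samples: list of input vectors (one row per specimen)
    steps:   number of intervals used to scan each input range (assumption)
    """
    n_inputs = len(samples[0])
    means = [sum(row[j] for row in samples) / len(samples) for j in range(n_inputs)]
    spans = []
    for j in range(n_inputs):
        lower = min(row[j] for row in samples)
        upper = max(row[j] for row in samples)
        outputs = []
        for k in range(steps + 1):
            x = list(means)  # all other inputs stay at their mean values
            x[j] = lower + (upper - lower) * k / steps
            outputs.append(model(x))
        spans.append(max(outputs) - min(outputs))  # Eq. (11)
    total = sum(spans)
    return [span / total for span in spans]  # Eq. (12)


# Toy model with two inputs, used only to illustrate the calculation.
toy_model = lambda x: 2.0 * x[0] + x[1]
print(sensitivity(toy_model, [[0.0, 0.0], [1.0, 2.0]]))
```

Scanning each range at several points, instead of evaluating only its end points, matters for inputs whose effect is not monotonic, since the maximum or minimum output may then occur inside the range.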

Fig. 10 Relative contribution of each input parameter on the proposed GEP model

The parameters \(s\), \({f}_{yv}, P/{P}_{o}, { f}_{y}\) and \(B\) contribute about 1.88%, 1.82%, 1.55%, 1.01% and 0.8%, respectively.

7 Conclusions

Because of the large differences among the existing equations in the literature, an equation for the plastic hinge length (\({L}_{p}\)) of RC columns was derived in this study using gene expression programming (GEP). The following conclusions can be drawn from the present study:

Comparison of the GEP model with the experimental results for the \({L}_{p}\) gave \({R}^{2}\) values of 0.76 and 0.78 for the training and validation sets, respectively. This comparison demonstrates that the GEP model predicts \({L}_{p}\) with sufficient accuracy.

Compared with the existing models in the literature, the GEP model exhibited MAPE, RMSE, and COV values of 19.67, 60.17, and 0.27, respectively. These results confirmed that the GEP model provided the best performance among existing models.

The results of the sensitivity and parametric analyses revealed that the compressive strength of concrete, diameter of the longitudinal bars, shear span, aspect ratio, transverse reinforcement ratio and cross-sectional depth of the column are, in that order, the most influential parameters for the \({L}_{p}\), while the longitudinal reinforcement ratio is the parameter with the least effect.

As a result, for the database considered, GEP gave more accurate and reliable results than the existing regression-based equations, so it is a promising tool for deriving relations for many other problems in civil engineering in the future.

The GEP-based model proposed in this study and the results obtained are valid only for the database taken from the literature. In order to achieve more accurate results, it is recommended to derive new equations for the \({L}_{p}\) by increasing the number of experiments and input parameters.