Prediction of RC T-Beams Shear Strength Based on Machine Learning

The shear resisted by the flanges of T-beams is usually ignored in shear design models, even though many experimental studies have shown that the shear strength of T-beams is higher than that of equivalent rectangular cross-sections. Ignoring this contribution results in very conservative and uneconomical designs. Therefore, the aim of this research is to investigate the capability of machine learning (ML) techniques to predict the shear capacity of reinforced concrete T-beams (RCTBs) by incorporating the contribution of the flange. Five ML techniques, namely Decision Tree (DT), Random Forest (RF), Gradient Boosting Regression Tree (GBRT), Light Gradient Boosting Machine (LightGBM), and Extreme Gradient Boosting (XGBoost), are trained and tested using 360 sets of data collected from experimental studies. Among the models evaluated, the XGBoost model demonstrated exceptional reliability and precision, achieving an R-squared value of 99.10%. The SHapley Additive exPlanations (SHAP) approach is used to identify the most influential input features affecting the predicted shear capacity of RCTBs. The SHAP results indicate that the shear span-to-depth ratio (a/d) has the most significant effect on the shear capacity of RCTBs, followed by the ratio of shear reinforcement multiplied by the yield strength of shear reinforcement (ρ_v f_yv), flange thickness (h_f), and flange width (b_f). The accuracy of the XGBoost model in predicting the shear capacity of RCTBs is compared with established codes of practice (ACI 318-19, BS 8110-1:1997, EN 1992-1-2, CSA23.3-04) and existing formulas from researchers. This comparison reinforces the superior reliability and accuracy of the machine learning approach compared to traditional methods. Furthermore, a user-friendly interface platform is developed to simplify the implementation of the proposed machine learning model. Finally, a reliability analysis is performed to determine the value of the resistance reduction factor (φ) that achieves a target reliability index (β_T = 3.5).


Introduction
RC beams with T-shaped cross-sections are used in a variety of construction applications, including bridge decks, building floors and slabs, retaining walls, and parking garages (Ayensa et al., 2019; Ramadan et al., 2022; Ribas González et al., 2017). The advantage of T-sections lies in the fact that a portion of the slab acts integrally with the beam, bending along with it under load to effectively resist straining actions. In contrast, the shear resistance of RCTBs is typically assessed solely on the basis of the area of the beam web, and the contribution of the flanges to the shear strength is commonly overlooked in shear design models (Zararis et al., 2006). Nevertheless, numerous studies have demonstrated a notable contribution of the flanges to the shear strength of T-beams (Amna & Monstaser, 2019; Ayensa et al., 2019; Bresler & MacGregor, 1967; Elgohary et al., 2019; Giaccio et al., 2011; Hawileh et al., 2022; Mhanna et al., 2020, 2021a, 2021b; Ramadan et al., 2022; Samad et al., 2016; Sarsam et al., 2018; Swamy et al., 1974; Thamrin et al., 2016).
Ramadan et al. (2022) studied the shear behavior of T-beams, highlighting the impact of the beam flange area on shear strength. Their experimental results showed that T-beams had a higher shear strength than rectangular beams with the same web size. Furthermore, increasing the ratio of flange thickness to beam depth from 0.3 to 0.5, within the examined range of variables, increased the shear strength by up to 54%. Likewise, increasing the ratio of flange width to web width from 3 to 5 increased the shear strength by up to 19%. Ayensa et al. (2019) showed that accounting for the flange can result in considerable cost savings in the construction of new structures and can be a critical factor when assessing the shear strength of existing structures. Bresler and MacGregor (1967) concluded from test results that the geometry of the beam, particularly for I- and T-sections, has a significant impact on shear strength and behavior because of the varying magnitudes of shearing stress developed in the web; this variation also affects the propagation of diagonal cracking. Amna and Monstaser (2019) presented an experimental and numerical investigation of the shear behavior of lightweight concrete beams. Their findings indicated a notable rise in the ultimate load as the shear span-to-depth ratio decreased; that is, the failure load was inversely related to the shear span-to-depth ratio. Sarsam et al. (2018) reviewed the literature to investigate how flanges affect the shear capacity of RCTBs and, based on their analysis, developed an equation to predict that capacity. Swamy and Qureshi (1974) developed a procedure for determining the maximum shear strength of the compression zone in T-beams, based on Mohr's theory of failure and the biaxial stress criteria concept. However, this strategy requires a series of calculations and is complicated to implement in practice. Samad et al. (2016) examined the shear mechanism of simply supported RCTBs subjected to four-point bending. The specimens shared similar variables and parameters, such as a shear span-to-effective depth ratio (a/d) of 3.5 and a longitudinal reinforcement ratio of ρ = 2.15%. The results indicated significant differences between the EC2 and ACI 318-08 design codes when calculating the shear capacity V_n and the concrete shear resistance V_c of T-beams. Thamrin et al. (2016) found that the shear capacity of the tested T-beams was notably influenced by both the longitudinal reinforcement ratio and the size of the flange. In addition, the shear capacity of T-beams is generally reported to be 5-20% higher than that of beams with a rectangular cross-section, while current codes use conservative empirical equations to calculate the shear capacity of beams. Elgohary et al. (2019) presented an experimental study of RCTBs without stirrups in which the flange width-to-web width ratio was varied. The test results indicated that the shear strength improved by 10-30% owing to the involvement of the flange.
Furthermore, some theoretical models (Bairan Garcia & Mari Bernat, 2006; Cladera et al., 2015; Kotsovos et al., 1987; Moayer & Regan, 1974; Tureyen & Frosch, 2003; Wolf & Frosch, 2007; Zararis et al., 2006) have considered the contribution of flanges to shear strength. Based on the prior literature, it is evident that existing design formulas cannot accurately predict the shear capacity of flanged sections, primarily because of the intricate nature of shear failure in T-sections. This underscores the growing need for more accurate prediction methods that account for the additional shear capacity provided by the flanges of T-sections.
Modern machine learning (ML) algorithms present a promising way to address these concerns, as they can handle complex problems with multiple variables without relying on prior assumptions about the underlying relationships. In the past decade, data-driven approaches, including various ML techniques, have had an increasing presence and importance in a wide range of civil engineering applications, including geotechnical engineering (T. Q. Huynh et al., 2022; Vadyala et al., 2022), structural engineering (Le-Nguyen et al., 2022; H. D. Nguyen et al., 2021a, 2021b, 2022; Shahin et al., 2023), and materials engineering (J. Schmidt et al., 2019; J. Wei et al., 2019; Nguyen-Sy et al., 2020; Wang et al., 2020). Moreover, several studies have used ML methods to predict the shear strength of RC beams, with promising results. For example, Goh (1995) was among the pioneers in using Artificial Neural Networks (ANNs) to predict the ultimate shear strength of deep beams. Similarly, Zhang et al. (2022) applied Random Forest (RF) in conjunction with an optimization technique to predict the shear strength of RC beams, including the effect of stirrups on beam strength. Cladera and Mari (2004) and Jung and Kim (2008) used experimental data to train and test ANN-based models for estimating the shear strength of RC beams. Additionally, Ashour et al. (2003) and Alshboul et al. (2022) introduced gene expression programming (GEP)-based models designed to calculate the shear strength of RC beams. A variety of ML algorithms have also been employed to predict the shear strength of steel fiber-reinforced concrete and ultra-high-performance fiber-reinforced concrete beams, including Support Vector Machine (SVM), ANN, k-Nearest Neighbor (k-NN), GEP, Decision Tree (DT), Random Forest (RF), XGBoost, AdaBoost, Gradient Boosting Regression Tree (GBRT), and CatBoost models (Abambres & Lantsoght, 2019; Chaabene & Nehdi, 2020; Jiang & Liang, 2021; Keshtegar et al., 2019; Rahman et al., 2021; Sarveghadi et al., 2019; Shahnewaz & Alam, 2020; Shatnawi et al., 2022; Solhmirzaei et al., 2020; Xiangyong Ni & Kangkang Duan, 2022; Yaseen et al., 2018; Ye et al., 2023). These models offer diverse approaches to predicting the shear strength of RC beams, contributing to advancements in structural engineering research and design.
While ML is highly beneficial for building metamodels and enjoys widespread popularity among researchers, its implementation in structural engineering applications presents two significant challenges. The first lies in the crucial initial step of converting input features into numerical data suitable for the algorithm. The second involves determining the optimal technique for the prediction problem (Keshtegar et al., 2019). The current study investigates five prominent ML techniques, namely DT, RF, GBRT, LightGBM, and XGBoost, with the primary objective of identifying the most suitable technique for predicting the shear capacity of RCTBs. DT algorithms provide inherent interpretability, enabling visualization of the decision-making process and identification of the key features influencing the predicted shear capacity (Sutton, 2005). By combining multiple DTs, RF algorithms achieve higher prediction accuracy than an individual DT. This ensemble learning approach reduces prediction errors and yields more generalizable models that are less susceptible to overfitting the training data (Breiman, 2001). GBRT algorithms are proficient at modeling complex relationships between the input features and the target variable (such as shear capacity), which enables them to capture non-linear effects within the data. Moreover, GBRTs often achieve high accuracy across diverse datasets (Friedman, 2001). The LightGBM (Ke et al., 2017) and XGBoost (Chen & Guestrin, 2016) algorithms are efficient in computational speed and memory usage, making them well suited to handling large datasets. They integrate regularization techniques to prevent overfitting, which improves model generalizability to new data. Additionally, both algorithms can handle missing values and categorical data.

Research significance
Despite the promising outcomes demonstrated by ML algorithms in diverse domains, there is a notable lack of literature on their application to estimating the shear capacity of RCTBs. This highlights the need for further research on ML techniques specifically within this area of study. Furthermore, the previous literature shows that current design formulas cannot accurately predict the shear capacity of RCTBs because of the complex nature of shear failure in T-sections. Hence, this study explores the potential of five ML algorithms, namely DT, RF, GBRT, LightGBM, and XGBoost, to predict the shear capacity of RCTBs. The models are trained and tested using experimental data consisting of 360 data points. The SHAP method is used to determine the significance of the input features and to illustrate their positive or negative impacts on the predicted outcomes. A sensitivity analysis is conducted to assess the capacity of the ML models to capture the influence of geometric and material parameters on their predictions. The developed ML model is compared against established codes of practice and existing empirical formulas. In addition, a user-friendly interface platform is developed to simplify the implementation of the proposed machine learning model. Finally, a reliability analysis is performed to determine the value of the resistance reduction factor (φ) that achieves the specified target reliability index.

Database Preparation and Description of Parameters
A literature survey was conducted to gather experimental data on the shear capacity of RCTBs with and without shear reinforcement. The survey yielded 360 experimental results from the reviewed literature, as listed in Table 1. The input parameters that affect the shear capacity of RCTBs are the web width of the beam (b_w), the effective depth of the beam (d) (the distance between the tension steel reinforcement and the extreme compression fiber), the flange width (b_f), the flange thickness (h_f), the compressive strength of concrete (f′_c) (conventional normal concrete), the shear span-to-depth ratio (a/d), the longitudinal steel ratio (ρ), and the ratio of shear reinforcement multiplied by the yield strength of shear reinforcement (ρ_v f_yv). The output variable is the maximum shear capacity (V_exp). Table 2 presents the descriptive statistics for every parameter in the dataset, that is, the numerical values that describe the dataset, such as its central tendency (mean) and variability (standard deviation).
Figure 1 displays histograms of the data distribution, showing the frequency of each input and output feature within specific ranges. The web width (b_w) spans from 50 to 200 mm, and the effective depth (d) of the beams varies between 200 and 400 mm, with a notable concentration around 300 mm. The flange width (b_f) lies within 200-600 mm, and the flange thickness (h_f) is concentrated around 85 mm. The a/d ratio is predominantly centered on a value of 3. The concrete compressive strength is approximately 30 MPa, while the longitudinal steel ratio (ρ) hovers around 0.025%. The product of the shear reinforcement ratio and the yield strength of shear reinforcement (ρ_v f_yv) is approximately 1 MPa. The maximum shear capacity (V_exp) of the beams ranges from 10 to 400 kN, with a primary peak around 150 kN. Finally, the histograms highlight the need for future experimental programs to address the gaps in the dataset.

Pearson Correlation Analysis
The Pearson correlation coefficient (PCC) is a statistical metric used in sensitivity analysis to measure the strength of a linear relationship between two variables. It is also known as Pearson's r or the Pearson product-moment correlation coefficient (PPMCC) (Schober & Schwarte, 2018). The PCC takes a value between −1 and +1, where +1 represents a perfect positive linear relationship between the variables, −1 denotes a perfect negative linear relationship, and 0 denotes no linear relationship. The PCC therefore captures both the strength and the direction of the relationship between two variables. Figure 2 shows the results of the Pearson correlation analysis. The correlation between b_w and b_f reaches 0.75, and the correlation between d and h_f reaches 0.70, indicating that these feature pairs are strongly related. Among all the input features, the effective depth of the beam (d) has the strongest correlation with the output, with a correlation coefficient of 0.63. Furthermore, the flange dimensions (thickness h_f and width b_f) are positively correlated with the shear capacity of RCTBs, with correlation coefficients of 0.57 and 0.50, respectively. This highlights the significance of the flange in predicting the shear capacity of RCTBs. These results indicate that variations in these variables may have a significant impact on the output, while changes in the other input parameters may have a comparatively smaller effect.
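As a minimal sketch of how a single entry of such a correlation matrix is obtained, the PCC can be computed directly from its definition (the covariance divided by the product of the standard deviations). The function name `pearson_r` and the sample values are illustrative assumptions and are not taken from the 360-specimen database.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of products of deviations; the 1/N factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for illustration only (not from the paper's database).
d_mm = [200.0, 250.0, 300.0, 350.0, 400.0]   # effective depth
v_kn = [60.0, 95.0, 150.0, 170.0, 240.0]     # shear capacity
r = pearson_r(d_mm, v_kn)
```

Because the PCC only measures linear association, a feature with a low coefficient can still matter to a tree-based model through non-linear effects or interactions.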

Overview of Machine Learning Models
The utilization of machine learning has gained prominence within the field of structural engineering, enabling the analysis and prediction of complex structural behaviors with improved accuracy and efficiency. Machine learning algorithms can identify patterns and relationships within the data, aiding in the modeling and simulation of structural systems. These algorithms can be applied to various aspects of structural engineering, including structural optimization, damage detection, and structural health monitoring. For instance, researchers have successfully utilized machine learning techniques to predict the behavior of structures under different loading conditions, optimize structural design parameters, and detect and classify structural damage. Figure 3 demonstrates a standard ML workflow employed in predictive modeling. Using a learning algorithm and an initial dataset, computer systems can be trained to continually improve and learn until they reach the desired performance level. Therefore, the ML model accuracy is dependent on the nature and characteristics of the initial data, as well as the effectiveness and efficiency of the learning algorithm employed.

ML Algorithms
The current study investigated five ML algorithms commonly used in the field of structural engineering (Thai, 2022). As these algorithms have been previously explained in detail in other publications, the subsequent sections will simply emphasize their key features.

Decision Tree (DT)
DT (Sutton, 2005) is a type of tree-based model used for visualizing the decision-making process. It is also commonly known as CART (Classification and Regression Tree), which refers to the model's capability to perform both classification and regression tasks. Figure 4 illustrates that a decision tree is composed of four essential components: a root node, decision nodes, leaf (terminal) nodes, and two or more branches. The root node is positioned at the top of the tree and serves as the chief decision node, reflecting the goal of the decision tree.
The decision node, on the other hand, is where a condition is established that splits the dataset, while the leaf node denotes the end of a branch, where a decision (prediction) is made. In a decision tree, the root node represents the source data, and this data is recursively divided into smaller subsets using splitting conditions chosen according to a metric such as the MSE (in the case of regression problems). The splitting process is repeated for each derived subset until no further split reduces the chosen metric or until the tree reaches its maximum allowable depth.
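The splitting step described above can be sketched for a single feature as follows. This is an illustrative sketch under simplifying assumptions (one feature, one split, squared-error criterion); the helper names `sse` and `best_split` and the sample data are hypothetical and do not come from the paper.

```python
def sse(values):
    """Sum of squared errors around the mean (N times the MSE)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, total_sse) for the best single split on feature x."""
    pairs = sorted(zip(x, y))
    # Start from the unsplit node; a split is kept only if it lowers the error.
    best = (None, sse([t for _, t in pairs]))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold exists between identical feature values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical a/d ratios and shear capacities (kN), for illustration only.
a_over_d = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
v_kn = [210.0, 200.0, 190.0, 120.0, 110.0, 100.0]
threshold, error = best_split(a_over_d, v_kn)  # splits between 2.5 and 3.0
```

A full regression tree applies the same search recursively to each resulting subset and over all features, stopping when no split reduces the error or the maximum depth is reached.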

Random Forest (RF)
Breiman (2001) developed the RF algorithm, which uses decision trees (DTs) as its base (weak) learners and combines them to enhance the model's performance. DTs are prone to overfitting and instability, but ensemble learning with multiple DTs can help overcome these issues. In RF, several decision trees are trained on distinct random subsets of the data and features to improve the model's performance and robustness, and the final prediction is generated by aggregating the outputs of the individual trees (majority voting for classification, averaging for regression), as shown in Fig. 5. RF can be advantageous over a single DT when dealing with large databases with many input variables, as it handles them effectively by employing a large number of trees. Although the individual trees of an RF can be trained in parallel, generating predictions can take longer than with a single DT because the results from multiple trees must be combined. The default parameters of RF are frequently sufficient to produce acceptable outcomes, but hyperparameter tuning can be used to improve the model's accuracy or speed. Overall, RF is a strong technique that inherits many of the merits of decision trees while being more robust and adaptable for certain types of datasets and problems.
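The two sources of randomness in RF (bootstrap resampling of the data and random selection of features) and the final aggregation step can be illustrated with a deliberately small sketch. It assumes depth-1 trees and one randomly chosen feature per tree, which is far simpler than a practical RF; the names `fit_stump` and `fit_forest` and the data are hypothetical.

```python
import random
from statistics import mean

def fit_stump(rows, feature):
    """Depth-1 regression tree on one feature; returns a predict function."""
    values = sorted({features[feature] for features, _ in rows})
    overall = mean(target for _, target in rows)
    best = (float("inf"), None)
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for f, t in rows if f[feature] <= thr]
        right = [t for f, t in rows if f[feature] > thr]
        ml, mr = mean(left), mean(right)
        err = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
        if err < best[0]:
            best = (err, (thr, ml, mr))
    if best[1] is None:  # the bootstrap sample held a single distinct value
        return lambda f: overall
    thr, ml, mr = best[1]
    return lambda f: ml if f[feature] <= thr else mr

def fit_forest(rows, n_trees=25, seed=0):
    """Bagged stumps: bootstrap rows, pick a random feature, average outputs."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # bootstrap sample
        feature = rng.randrange(n_features)        # random feature choice
        trees.append(fit_stump(sample, feature))
    # Regression forests average the tree outputs rather than vote.
    return lambda f: mean(tree(f) for tree in trees)

# Hypothetical rows: ((a/d, h_f in mm), shear capacity in kN).
rows = [((1.5, 120.0), 220.0), ((2.0, 110.0), 200.0), ((2.5, 100.0), 180.0),
        ((3.0, 80.0), 130.0), ((3.5, 70.0), 110.0), ((4.0, 60.0), 100.0)]
forest = fit_forest(rows)
low_ad = forest((1.5, 120.0))
high_ad = forest((4.0, 60.0))
```

Averaging many trees that each see a different resampled view of the data is what reduces the variance of the individual, unstable trees.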

Gradient Boosting Regression Tree (GBRT)
Gradient Boosting Regression Tree (GBRT) was developed by Friedman (2001). It is based on the AdaBoost algorithm with two modifications to the weak learners. In the first modification, decision trees (DTs) are used as weak learners. In the second, weights are updated based on the residual errors of the previous weak learner rather than its classification errors. GBRT has become a popular ML technique for regression tasks because of its ability to handle complex interactions between variables and its flexibility in learning non-linear relationships. Figure 6 illustrates the GBRT algorithm.

Light Gradient Boosting Machine (LightGBM)
Ke et al. (2017) developed the Light Gradient Boosting Machine algorithm with a focus on computational efficiency without sacrificing accuracy; it can be up to 20 times faster than a conventional Gradient Boosting Machine. The word "Light" refers to its speed compared with other boosting algorithms, including XGBoost, which can be slow to train on large datasets. The primary distinction between LightGBM and other boosting algorithms lies in the way it grows its trees. LightGBM uses a leaf-wise growth strategy that splits the leaf offering the largest loss reduction (see Fig. 7), which can reduce the loss further and improve accuracy for the same number of leaves. This is in contrast to the level-wise tree growth strategies used by many other gradient boosting algorithms, which may be less effective in reducing the loss on individual leaves. LightGBM relies on two key features to achieve its speed relative to other boosting frameworks: (1) gradient-based one-side sampling (GOSS) and (2) exclusive feature bundling (EFB). GOSS and EFB speed up the training of gradient boosting decision trees by reducing computational complexity.
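The residual-fitting idea that underlies GBRT and its faster variants can be sketched for squared-error loss as follows. This is a simplified illustration that assumes a single feature with at least two distinct values and uses depth-1 trees; the names `fit_stump` and `fit_gbrt`, the default settings, and the data are hypothetical.

```python
from statistics import mean

def fit_stump(x, residuals):
    """Best single-threshold split of x, by squared error on the residuals."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        ml, mr = mean(left), mean(right)
        err = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, thr, ml, mr)
    _, thr, ml, mr = best
    return lambda xi: ml if xi <= thr else mr

def fit_gbrt(x, y, n_estimators=50, learning_rate=0.1):
    """Additive model: constant start plus shrunken stumps fitted to residuals."""
    base = mean(y)  # initial constant prediction
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_estimators):
        # For squared loss, the negative gradient is the plain residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)

# Hypothetical a/d ratios and shear capacities (kN), for illustration only.
a_over_d = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
v_kn = [210.0, 200.0, 190.0, 120.0, 110.0, 100.0]
model = fit_gbrt(a_over_d, v_kn)
baseline_mse = mean((yi - mean(v_kn)) ** 2 for yi in v_kn)
train_mse = mean((yi - model(xi)) ** 2 for xi, yi in zip(a_over_d, v_kn))
```

The learning rate shrinks each tree's contribution, which is why a smaller learning rate generally needs more estimators to reach the same training error.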

Extreme Gradient Boosting (XGBoost)
XGBoost was developed by Chen and Guestrin (2016) and is considered one of the most efficient machine learning methods, as it improves the efficiency of Gradient Boosting Machines. XGBoost employs several advanced techniques to train on high-dimensional data in a timely and precise manner. The algorithm stores the input dataset in a compressed column format, which reduces the sorting cost and speeds up the training process. Additionally, it uses randomization techniques to reduce overfitting and improve generalization. Lastly, XGBoost uses parallel and distributed computing to make full use of all available CPU cores during training and split finding. This enhances the scalability and performance of the algorithm, making it possible to train models on very large datasets with a large number of features.

Development of ML Models
The ML models proposed in this paper, each comprising eight input parameters, are developed to identify the optimal model and the most influential parameters for predicting the target variable (shear capacity). To address challenges associated with low learning rates at extreme parameter values and to enhance the accuracy and speed of modeling, the collected database is normalized to a common range. Model performance is quantified using the coefficient of determination (R²), the root mean square error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE), where ŷ represents the predicted value, y is the corresponding actual value, ȳ is the mean of all y values in the dataset, and N is the sample size.
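The four performance metrics reported later in the paper (R², RMSE, MAE, and MAPE) can be written in terms of ŷ, y, ȳ, and N. The sketch below uses the standard textbook definitions, which is an assumption because the paper's own equations are not reproduced in this text; MAPE is returned as a fraction, and the function name and sample values are hypothetical.

```python
import math

def regression_metrics(y_true, y_pred):
    """Standard definitions of R2, RMSE, MAE and MAPE (MAPE as a fraction)."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # sum (y - y_hat)^2
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # sum (y - y_bar)^2
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n,
        # Requires non-zero actual values, which holds for shear capacities.
        "MAPE": sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / n,
    }

# Hypothetical experimental and predicted capacities (kN), illustration only.
v_exp = [100.0, 200.0, 300.0, 400.0]
v_pred = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(v_exp, v_pred)
```

RMSE and MAE carry the units of the output (kN here), whereas R² and MAPE are dimensionless, so the two groups are not directly comparable.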

Hyperparameter Tuning and Cross-Validation
Several techniques can improve the performance of ML models, and one effective strategy is to fine-tune the hyperparameters. The performance of a model can be greatly affected by the choice of hyperparameter values, and the systematic process of varying these values and evaluating the performance of each combination is called hyperparameter optimization. The grid search method is widely employed for this purpose. To address concerns about overfitting during hyperparameter optimization, K-fold cross-validation is employed. The dataset is first partitioned into a training set and a testing set: 80% of the entire dataset is used for training, and the remaining 20% is used for testing. K-fold cross-validation then involves the following steps:
a) Divide the training dataset into K groups (folds) of approximately equal size.
b) In each iteration, use K-1 folds for training and the remaining fold for validation.
c) Repeat this process until every fold has been used for validation exactly once.
This approach allows for comprehensive validation of the model and reduces the risk of overfitting.
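The 80/20 partition and the K-fold procedure described above can be sketched with index lists only. The function names, the fixed random seed, and the interleaved fold assignment are illustrative assumptions; the sample count of 360 and the tenfold setting follow the text.

```python
import random

def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and hold out a fraction for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]

def k_fold(indices, k=10):
    """Yield (train, validation) index lists; each index is validated once."""
    folds = [indices[i::k] for i in range(k)]  # K folds of near-equal size
    for i, val in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, val

train_idx, test_idx = train_test_split_indices(360)  # 288 train, 72 test
splits = list(k_fold(train_idx, k=10))
```

The testing indices never enter the cross-validation loop, so the final test score remains an estimate on data the tuning process has not seen.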
For hyperparameter optimization in this study, the grid search technique was used together with tenfold cross-validation. For example, the effectiveness of the XGBoost algorithm depends on the initial setup of hyperparameters such as the number of trees (n_estimators) and the learning rate (H. Nguyen et al., 2021a, 2021b). The optimization of the XGBoost hyperparameters using tenfold cross-validation is represented in four charts, each presenting the R² score for a different number of estimators (100, 200, 300, 400). In each chart, four lines indicate different maximum depths (4, 8, 12, 16). According to Fig. 9, the R² score is highest when the learning rate is 0.5, the number of estimators is 300, and the maximum depth is 4. Moreover, at high learning rates, the number of estimators has no influence on the R² score. Conversely, when the learning rate drops below 0.2, a greater number of estimators is necessary for the R² score to reach a stable maximum value. Table 3 summarizes the optimal hyperparameter values for the DT, RF, GBRT, LightGBM, and XGBoost models, all of which were subjected to hyperparameter optimization.
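A grid search simply evaluates every combination of candidate hyperparameter values and keeps the best-scoring one. The sketch below is a generic illustration: `grid_search` is a hypothetical helper, the learning-rate candidates are assumed because the text does not list them all, and `toy_cv_score` is a made-up stand-in for the mean tenfold cross-validated R², constructed to peak at the reported optimum purely so that the output is easy to check. It does not reproduce any result from the paper.

```python
from itertools import product

def grid_search(score_fn, grid):
    """Evaluate score_fn(**params) on every combination; return the best."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

grid = {
    "n_estimators": [100, 200, 300, 400],  # values named in the text
    "max_depth": [4, 8, 12, 16],           # values named in the text
    "learning_rate": [0.1, 0.2, 0.5],      # assumed candidate values
}

def toy_cv_score(n_estimators, max_depth, learning_rate):
    """Made-up smooth function standing in for a mean cross-validated R2."""
    return (1.0
            - 1e-6 * (n_estimators - 300) ** 2
            - 1e-3 * (max_depth - 4) ** 2
            - (learning_rate - 0.5) ** 2)

best_params, best_score = grid_search(toy_cv_score, grid)
```

In practice, `score_fn` would train the model on K-1 folds and score it on the held-out fold for each of the K splits, returning the average; the cost grows with the product of the grid sizes (48 combinations here, each requiring ten model fits).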

Model Prediction Results
This section explores the predictive capacity of the ML models developed after identifying the most effective hyperparameters for each ML algorithm. Figure 10 presents scatter plots of the predicted versus actual (experimental) shear capacity of RCTBs for the various ML models. The developed ML models show strong agreement between predicted and experimental shear capacity, with R² values of at least 0.98. Among these models, the XGBoost model demonstrated superior predictive performance, yielding an R² value of 0.9914 and the lowest RMSE, MAE, and MAPE values (9.53, 3.75, and 0.03, respectively). In contrast, the RF model recorded the lowest R² value (0.9820) and higher values for RMSE (13.80), MAE (5.80), and MAPE (0.0422), as listed in Table 4. Importantly, across all ML models, the predicted shear capacity values cluster closely around the 45° diagonal line, indicating close agreement between the experimental and predicted shear capacity values. Additionally, Fig. 11 illustrates the residuals of the predicted shear capacity of RCTBs, defined as the difference between V_exp and V_pred, on both the training and testing datasets. It is noteworthy that the residuals for all models are centered around zero. Therefore, it can be concluded […]. To facilitate comparison of the performance of the presented ML models, Fig. 12 displays the Taylor diagram (Taylor, 2001) for the ML models. A Taylor diagram compares model predictions with the original data by plotting the correlation (r) and standard deviation against a reference dataset. In Fig. 12, the XGBoost model lies closest to the reference line of the original dataset. The Taylor diagram can thus also indicate the reliability and precision of the ML models. Moreover, to visualize the outcomes of the models, the experimental and predicted shear capacity values are plotted against the experiment number in Fig. 13. Additionally, error data are included to make the differences between the models more visible. Among all models, the XGBoost predictions lie closest to the experimental points, and its prediction errors are lower than those of the other models. One potential reason for the remaining prediction errors is the scarcity of experimental data points in the affected regions, emphasizing the need for future experimental programs to address this gap. Finally, because of its superior performance in predicting the shear capacity of RCTBs, the XGBoost model is the focus of the subsequent investigations.
The shear capacity of RCTBs was also determined using shear design models. All shear design models were applied to all specimens, except for the Thamrin et al. model, which was used only to predict the capacity of specimens without steel stirrups. Figure 14 illustrates the experimental versus predicted shear capacity based on the shear design models and the XGBoost model. The solid line denotes the best fit between the experimental and predicted responses. Furthermore, the shear design models and the XGBoost model are assessed in Table 6 on the basis of the mean, standard deviation (STD), and coefficient of variation (COV) of the V_Exp to V_Pred ratio. Table 6 clearly indicates that all shear design provisions significantly underestimated the shear capacity of RCTBs: the average ratio of experimental to predicted shear capacity for all the codes was above 1.0 (in the range of 1.87-2.94). Among the codes, CSA23.3-04 (2004) provided the most accurate predictions of the shear capacity of RCTBs, with an average V_Exp to V_Pred ratio of 1.87. Among the three existing formulas developed to predict the shear capacity of RCTBs, Table 6 indicates that the model of Zararis et al. (2006) offered the most accurate predictions overall, with an average V_Exp to V_Pred ratio of 1.71. The accuracy of the Zararis et al. model comes from its incorporation of the whole area of the T-beam and its determination of an effective width that is suitable for predicting the shear capacity. The model of Sarsam et al. (2018), which also accounts for the influence of flanges on the shear capacity of RCTBs, follows with an average V_Exp to V_Pred ratio of 1.78. Finally, although the equation of Thamrin et al. (2016) was developed to predict the shear capacity of RCTBs without internal stirrups, it failed to accurately estimate the true capacity and provided conservative results; the mean V_Exp to V_Pred ratio was 1.65 with a COV of 32%.
In conclusion, the contribution of the flange to shear capacity is significant and should not be neglected when determining the shear capacity of RCTBs. Unfortunately, most existing design guidelines overlook this aspect, resulting in inaccurate estimations of the true shear capacity. Among the models proposed in the literature, the model presented by Zararis et al. (2006) offered accurate estimations of the shear capacity of RCTBs and could be safely used for designing RCTBs in shear. Furthermore, the XGBoost model demonstrated better predictive capability than the shear design models, with an average V_Exp to V_Pred ratio of 1.0045 ± 0.0596 and a COV of 5.93%, as observed in Fig. 14h and Table 5.
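The mean, STD, and COV of the V_Exp to V_Pred ratio used in this comparison can be computed as in the following sketch. The use of the sample standard deviation (N - 1 in the denominator) is an assumption, since the text does not state which form was used, and the function name and capacities are hypothetical.

```python
from statistics import mean, stdev

def ratio_statistics(v_exp, v_pred):
    """Mean, standard deviation and COV of the experimental/predicted ratio."""
    ratios = [e / p for e, p in zip(v_exp, v_pred)]
    m = mean(ratios)
    s = stdev(ratios)  # sample standard deviation (assumed form)
    return m, s, s / m

# Hypothetical capacities (kN) for a conservative design model.
v_exp = [100.0, 150.0, 200.0, 250.0]
v_pred = [50.0, 100.0, 100.0, 200.0]
avg, std, cov = ratio_statistics(v_exp, v_pred)  # ratios 2.0, 1.5, 2.0, 1.25
```

A mean ratio above 1.0 indicates that a model underestimates the measured capacity on average, and a lower COV indicates more consistent predictions. As a consistency check on the XGBoost figures quoted above, 0.0596 / 1.0045 ≈ 0.0593, which agrees with the stated COV of 5.93%.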

Explainability of XGBoost Model Using SHAP Approach
In this study, the unified SHAP method was used to explain the output of the XGBoost model and to identify the most influential variables, and their interactions, that affect the shear capacity of RCTBs. The most significant factor is determined through the absolute SHAP value. The x-axis position of each point represents the Shapley value for the corresponding factor, indicating its impact on the shear capacity. The y-axis lists the factors in order of importance, as shown in Fig. 15a. The SHAP results indicate that the shear span-to-depth ratio (a/d) has the most significant effect on the shear capacity of RCTBs. This is followed by the ratio of shear reinforcement multiplied by the yield strength of shear reinforcement ( ρ v f yv ), flange thickness ( h f ), and flange width ( b f ). The SHAP summary plot, shown in Fig. 15b, was used to evaluate the influence of input features on prediction results. Each point on the plot represents a prediction instance, with positive and negative SHAP values indicating the direction in which the input feature shifts the output. Furthermore, the color of each point indicates the value of the input feature, varying from low (blue) to high (red). In general, for all input features except a/d, high feature values (red points) correspond to positive Shapley values, suggesting that higher values of these parameters increase the ultimate shear capacity of RCTBs.
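To make the meaning of a Shapley value concrete, the sketch below computes exact Shapley values by enumerating all feature subsets. It is an illustration only: `toy_capacity` is a hypothetical linear stand-in rather than the trained XGBoost model, absent features are replaced by a single baseline point (an assumption), and practical SHAP analyses of tree ensembles use far more efficient algorithms than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x; absent features take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = size of the coalition that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

def toy_capacity(features):
    """Hypothetical stand-in model: capacity falls with a/d, rises with the others."""
    a_d, rho_v_fyv, h_f = features
    return 100.0 - 40.0 * a_d + 30.0 * rho_v_fyv + 0.2 * h_f

x = [3.0, 1.5, 100.0]        # hypothetical specimen: a/d, rho_v*f_yv, h_f
baseline = [2.0, 1.0, 80.0]  # hypothetical reference specimen

phi = shapley_values(toy_capacity, x, baseline)
print(phi)
```

The values sum to the difference between the prediction at `x` and at `baseline`, and a feature ranking like that of Fig. 15a follows from averaging the absolute values over many specimens.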

Sensitivity Analysis
The sensitivity analysis of the XGBoost model involves varying one parameter at a time while keeping the rest ), shear span-to-depth ratio (a/d), longitudinal steel ratio (ρ), and ratio of shear reinforcement multiplied by the yield strength of shear reinforcement ( ρ v f yv ). The result of the sensitivity analysis is presented in Fig. 16. In general, an increase in any input feature results in an increase in the ultimate shear capacity of RCTBs, except for a/d, whose increase reduces the ultimate shear capacity, as shown in Fig. 16f. The analysis also indicates that increasing the flange dimensions enhances the shear capacity of RCTBs, as shown in Fig. 16c, d. Furthermore, the outcomes predicted by the sensitivity analysis of the XGBoost model for RCTBs with diverse geometric and material properties align with the findings reported in the literature (Ayensa et al., 2019; Hawileh et al., 2022; Kadr et al., 2019; Ramadan et al., 2022) concerning the structural behavior of RCTBs.
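A one-parameter-at-a-time sweep of this kind can be sketched as follows. The function `toy_predict`, its coefficients, and the reference specimen are hypothetical placeholders; in practice the trained model's prediction function would be passed in instead.

```python
def one_at_a_time(predict, reference, name, values):
    """Vary one named input over `values`, holding the others at `reference`."""
    results = []
    for v in values:
        sample = dict(reference)  # copy so the reference specimen is not modified
        sample[name] = v
        results.append((v, predict(sample)))
    return results

def toy_predict(s):
    """Hypothetical stand-in for the trained model (capacity in kN)."""
    return 100.0 + 0.05 * s["b_f"] + 0.2 * s["h_f"] - 25.0 * s["a_d"]

# Hypothetical reference specimen: flange width and thickness in mm, a/d dimensionless.
reference = {"b_f": 400.0, "h_f": 80.0, "a_d": 2.5}

sweep = one_at_a_time(toy_predict, reference, "a_d", [1.5, 2.5, 3.5])
for value, capacity in sweep:
    print(f"a/d = {value:.1f} -> predicted capacity = {capacity:.1f} kN")
```

Repeating the sweep for each input gives one curve per parameter, analogous to the panels of Fig. 16.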

Noise sensitivity analysis
Noise, defined as unwanted or irrelevant information in data, can significantly impact the performance of machine learning algorithms. Understanding the effects of noise on model performance is crucial for ensuring the robustness and reliability of predictive models. The impact of noise on XGBoost model performance was analyzed by comparing the performance metrics obtained under different noise levels. Trends and patterns in model behavior were identified to understand how noise influences predictive accuracy and generalization capabilities. The XGBoost model demonstrates robust performance with an R-squared value of 98% in the absence of noise. However, as sample noise increases by 100%, the model exhibits a notable decline in R-squared of 23.50%, resulting in a value of 75%. This decreasing trend persists with each successive 10% increase in sample noise from the noise-free condition, as illustrated in Fig. 17a. Similarly, the XGBoost model achieves a mean absolute error (MAE) of 10.9 kN and a root mean squared error (RMSE) of 15.6 kN in the absence of noise. With a 100% increase in sample noise, both MAE and RMSE rise substantially, increasing by 60%. This increasing trend continues steadily with each 10% increment in sample noise from the noise-free condition, as depicted in Fig. 17b, c. The mean absolute percentage error (MAPE) follows a similar trend: with a 100% increase in noise, the MAPE of the XGBoost model rises by 55%, as presented in Fig. 17d. Overall, the predictive accuracy of the XGBoost model decreased with increasing levels of noise, indicating its sensitivity to noisy inputs.
Fig. 14 Experimental versus predicted shear capacity: c Eurocode EN 1992-1-2 model (2004), d CSA23.3-04 model (2004), e Sarsam et al. model (2018), f Thamrin et al. model (2016), g Zararis et al. model (2006), h XGBoost model
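The noise study can be illustrated with a short sketch. The source does not specify how noise was injected, so the scheme here (zero-mean Gaussian noise on each input with standard deviation equal to the noise level times the input magnitude) is an assumption, and `toy_model` is a hypothetical stand-in for the trained XGBoost model.

```python
import random

def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def add_noise(rows, level, rng):
    """Perturb each value with Gaussian noise, std = level * |value| (assumed scheme)."""
    return [[v + rng.gauss(0.0, level * abs(v)) for v in row] for row in rows]

def toy_model(row):
    """Hypothetical stand-in for the trained model."""
    a_d, rho = row
    return 200.0 - 30.0 * a_d + 40.0 * rho

rng = random.Random(0)  # fixed seed for repeatability
X = [[rng.uniform(1.5, 5.0), rng.uniform(0.5, 3.0)] for _ in range(200)]
y = [toy_model(row) for row in X]  # noise-free targets

scores = {}
for level in (0.0, 0.1, 0.5, 1.0):
    noisy_X = add_noise(X, level, rng)
    scores[level] = r_squared(y, [toy_model(row) for row in noisy_X])

for level, score in scores.items():
    print(f"noise level {level:.0%}: R2 = {score:.3f}")
```

MAE, RMSE, and MAPE can be tracked across noise levels in the same loop to obtain trends comparable to those in Fig. 17.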

Graphical User Interface
The study developed a graphical user interface (GUI) platform based on the most precise model, namely XGBoost, to enhance accessibility for both practicing engineers and the research community. This platform provides an easy-to-use interface to input data and obtain predictions of the shear capacity of RCTBs. The GUI platform eliminates the need for users to have extensive knowledge of machine learning techniques, making the research accessible to a wider audience. A Python library allowed the research team to build an interactive UI for the machine learning model. Users can provide input feature values and quickly obtain the corresponding shear strength value through this interface. The GUI platform is accessible via reference (Yehia, 2024). It is important to note that this GUI is applicable only to RCTBs with geometric and material properties within the ranges outlined in Table 1, as the XGBoost algorithm was trained using these specific ranges.

Reliability Analysis for Shear Capacity of RCTBs
Reliability analysis in structural engineering is a process of evaluating the probability of failure of a structure. It is used to ensure that structures are designed to meet their intended function and performance requirements, and that they are safe under all anticipated loading conditions. To ensure safety, structures are designed such that their capacity (R) exceeds the demand (Q).
The load and resistance factor design (LRFD) method is a common approach for checking the limit state of a structure. In LRFD, the limit state is defined as the point at which the structure fails to meet its intended function. The LRFD equation for the limit state in the resistance factor format is:
$$\phi R_n \ge \sum_i \gamma_i Q_i$$
where the nominal resistance ( R n ) is reduced by the capacity reduction factor (φ) to account for uncertainties in the material properties and loading conditions. The load effect due to each type of load ( Q i ) is multiplied by the load partial safety factor ( γ i ) to account for the variability of the loads. The load combination of 1.2DL + 1.6LL as specified in ACI 318 (ACI Committee 318, 2014) was (5)
The target reliability index is selected based on the consequences of failure (Wight JK, 2016). A higher target reliability index corresponds to a lower probability of failure. According to (Wight JK, 2016), the target reliability index is typically between 3.0 and 3.5. Szerszen and Nowak (2003) assumed a normal distribution for both the dead load and the live load. The bias of the dead load for cast-in-situ concrete is 1.05, and its coefficient of variation is 0.10. The bias of the 50-year live load is 1.0, and its coefficient of variation is 0.18.
The reliability index β was calculated for capacity reduction factors φ ranging from 0.75 to 0.95 and for live-to-dead load ratios α ranging from 0.0 to 1.00 in steps of 0.10. The results are plotted in Fig. 18. As expected, the value of β increased as φ decreased.
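The dependence of β on φ and α can be sketched with a first-order calculation. This is not necessarily the procedure used in the study: it assumes that resistance and load effect are both normally distributed, that the resistance statistics are given solely by the V Exp to V Pred ratio of the XGBoost model (bias 1.0045, COV 5.93%), and that the design sits exactly at the limit φR_n = 1.2DL + 1.6LL. The load statistics are those quoted above.

```python
from math import sqrt

# Statistical parameters quoted in the text.
BIAS_R, COV_R = 1.0045, 0.0593  # V_Exp / V_Pred ratio of the XGBoost model
BIAS_D, COV_D = 1.05, 0.10      # dead load, cast-in-situ concrete
BIAS_L, COV_L = 1.00, 0.18      # 50-year live load

def reliability_index(phi, alpha):
    """First-order beta for phi*Rn = 1.2*Dn + 1.6*Ln, with alpha = Ln / Dn."""
    d_n, l_n = 1.0, alpha                     # nominal loads per unit dead load
    r_n = (1.2 * d_n + 1.6 * l_n) / phi       # nominal resistance at the design limit
    mu_r = BIAS_R * r_n
    sd_r = COV_R * mu_r
    mu_q = BIAS_D * d_n + BIAS_L * l_n
    sd_q = sqrt((COV_D * BIAS_D * d_n) ** 2 + (COV_L * BIAS_L * l_n) ** 2)
    return (mu_r - mu_q) / sqrt(sd_r ** 2 + sd_q ** 2)

# Tabulate beta over a grid of phi and alpha values.
for phi in (0.75, 0.85, 0.95):
    row = ", ".join(f"{reliability_index(phi, a / 10):.2f}" for a in range(0, 11, 5))
    print(f"phi = {phi:.2f}: beta at alpha = 0.0, 0.5, 1.0 -> {row}")
```

Under these assumptions β decreases monotonically as φ increases, consistent with the trend described for Fig. 18; the absolute values depend on the assumed resistance statistics and need not match the figure.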
The resistance reduction factor was calibrated to achieve a target reliability index of 3.50. This was done using the least squares method (LSM) in Eq. (6). A larger safety margin is associated with a smaller resistance reduction factor. Figure 19 shows how the LSM value varies with the resistance reduction factor (φ) for a target reliability index of 3.5. Furthermore, Fig. 19 shows that the minimum LSM corresponds to a capacity reduction factor of 0.89, which is recommended for RCTBs to achieve a target reliability index of 3.5 based on the proposed XGBoost model.
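The calibration step can be sketched as a grid search. Because Eq. (6) is not reproduced here, the objective below is an assumed form: the sum over all load ratios α of the squared deviation of β from the target. The function `beta` is likewise a simplified first-order estimate under assumed normal distributions, so the resulting φ is illustrative and is not expected to reproduce the reported value.

```python
from math import sqrt

BETA_TARGET = 3.5

def beta(phi, alpha):
    """First-order reliability index for phi*Rn = 1.2*Dn + 1.6*Ln, alpha = Ln / Dn."""
    r_n = (1.2 + 1.6 * alpha) / phi   # nominal resistance per unit dead load
    mu_r = 1.0045 * r_n               # resistance bias quoted in the text
    sd_r = 0.0593 * mu_r              # resistance COV quoted in the text
    mu_q = 1.05 + 1.00 * alpha        # dead and live load biases
    sd_q = sqrt((0.10 * 1.05) ** 2 + (0.18 * 1.00 * alpha) ** 2)
    return (mu_r - mu_q) / sqrt(sd_r ** 2 + sd_q ** 2)

def lsm(phi, alphas):
    """Assumed least-squares objective: squared deviation of beta from the target."""
    return sum((beta(phi, a) - BETA_TARGET) ** 2 for a in alphas)

alphas = [round(0.1 * i, 1) for i in range(11)]          # 0.0, 0.1, ..., 1.0
phis = [round(0.75 + 0.01 * i, 2) for i in range(21)]    # 0.75, 0.76, ..., 0.95

best_phi = min(phis, key=lambda p: lsm(p, alphas))
print(f"phi minimizing the assumed LSM objective: {best_phi:.2f}")
```

Plotting `lsm(phi, alphas)` against `phi` gives a curve of the kind shown in Fig. 19, with the recommended factor read off at its minimum.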

Conclusion
This paper has presented five machine learning (ML) models, namely Decision Trees (DT), Random Forest (RF), Gradient Boosted Regression Trees (GBRT), Light Gradient Boosting Machine (LightGBM), and Extreme Gradient Boosting (XGBoost), for predicting the shear capacity of RCTBs. The ML models were trained on an extensive dataset of experimental results to capture the intricate relationships between the input parameters and the corresponding shear capacity. The accuracy of shear design models in predicting the shear capacity of RCTBs was

Fig. 1 Histogram with a normal distribution fit for both input and output features

Fig. 2 Pearson correlation analysis of nine parameters

illustrates that the training of the next tree considers the residual error from the previous tree. After the initial weak learner, each subsequent tree is trained. The final model can capture the residual errors of the weak learners and improve the accuracy of predictions. GBRT also offers flexibility by enabling the tuning of hyperparameters, including the number and depth of trees, and the learning rate to control the model's convergence and training speed.
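The residual-fitting idea behind GBRT can be sketched in a few lines. This is a didactic single-feature version with depth-one trees (stumps) and squared-error loss, not the implementation used in the study; the data are hypothetical, and `n_trees` and `learning_rate` stand for the hyperparameters mentioned above.

```python
def fit_stump(x, residuals):
    """Fit the best single-split regression stump to the residuals (squared error)."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # both sides of every split stay non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    _, threshold, lm, rm = best
    return lambda xi: lm if xi <= threshold else rm

def fit_gbrt(x, y, n_trees=50, learning_rate=0.1):
    """Each new stump is trained on the residual error left by the previous trees."""
    base = sum(y) / len(y)  # initial weak learner: the mean of the targets
    trees = []
    pred = [base] * len(y)
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_stump(x, residuals)
        trees.append(tree)
        pred = [pi + learning_rate * tree(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + learning_rate * sum(t(xi) for t in trees)

# Hypothetical data: shear capacity (kN) decreasing with a/d.
x = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [300.0, 260.0, 220.0, 190.0, 170.0, 155.0, 150.0]

model = fit_gbrt(x, y)
print([round(model(xi), 1) for xi in x])
```

A smaller learning rate slows convergence but usually needs more trees, which is the trade-off controlled by the hyperparameters named in the text.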

Fig. 6 Illustration of the GBRT model

of [−1, 1]. Four common statistical metrics, specifically mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination ( R 2 ), are frequently used to assess the efficacy of prediction models. Lower values of MAE, RMSE, and MAPE correspond to improved model performance, whereas R 2 values within the range of 0-1 quantify the agreement between predicted and actual values. Consequently, higher R 2 values signify better model performance. The formulations for these four parameters are as follows (Wakjira et al., 2022):
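For reference, the four metrics can be computed as in the sketch below. These are the standard textbook definitions, with R 2 taken as one minus the ratio of residual to total sum of squares; they are assumed to match the formulations of the cited work, which are not reproduced here. The sample values are hypothetical.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE (%), and R2 using the standard definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_t = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": 1.0 - ss_res / ss_tot}

# Hypothetical experimental and predicted shear capacities in kN.
metrics = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
print(metrics)
```

MAE and RMSE carry the unit of the target (kN here), while MAPE and R 2 are dimensionless, which is why the two groups are reported separately in the noise study.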

Fig. 7 Outline representation of LightGBM and another Bas

Fig. 11 Residual of the predicted shear strength in both the training and test datasets for a DT, b RF, c GBRT, d LightGBM, and e XGBoost models

Fig. 15 a Overall importance of the input features, and b summary plot of the input feature effects

Table 1
Geometrical dimensions and material properties of the experimental tests

Table 4
Performance metrics of ML models

Table 5
Existing design models for estimating the shear capacity of RCTBs

Table 6
Assessment of shear design models and XGBoost model based on V Exp to V Pred ratio
STD standard deviation, COV coefficient of variation