Application of Ensemble Learning Using Weight Voting Protocol in the Prediction of Pile Bearing Capacity

Accurate prediction of pile bearing capacity is an important part of foundation engineering. Determining pile bearing capacity through an in situ load test is costly and time-consuming. Therefore, this study developed a machine learning algorithm, namely, Ensemble Learning (EL), that uses a weight voting protocol over three base machine learning algorithms, gradient boosting (GB), random forest (RF), and classic linear regression (LR), to predict the bearing capacity of piles. The data comprise 108 pile load tests under different conditions, used for model training and testing. Performance indicators, namely, the coefficient of determination ($R^2$), root mean square error (RMSE), and mean absolute error (MAE), were used to evaluate the models and showed that the EL model predicts pile bearing capacity more effectively than the base models. The results also showed that the EL model with a weight combination of $w_1 = 0.482$, $w_2 = 0.338$, and $w_3 = 0.18$, corresponding to the models GB, RF, and LR, gave the best performance and achieved the best balance on all data sets. In addition, a global sensitivity analysis technique was used to detect the most important input features in determining the bearing capacity of the pile. This study provides an effective tool to predict pile load capacity.


Introduction
The cost of the foundation usually accounts for a very large proportion of the total construction cost [1]. Therefore, choosing the right foundation solution and determining the load capacity of the foundation play an important role in reducing construction costs [2]. Among the foundation solutions in common use today, the pile foundation is one of the most popular deep foundations. In particular, the pile foundation is capable of transmitting loads into the deep soil layers [3,4]. Determining the bearing capacity of the pile accurately helps determine the correct foundation size and pile depth, and thereby the most appropriate foundation solution.
Different methods can be used to determine the bearing capacity of a pile [5][6][7][8][9][10], for example, dynamic load testing (PDA) and the static pile load test. In addition, many traditional formulas have been proposed, mainly based on in situ soil test results such as the cone penetration test (CPT) and the standard penetration test (SPT) [5][6][7][8][9]. In some other studies, the authors used the finite element method to analyze the relationship between pile load and pile displacement [10][11][12].
Under specific conditions, each of the above methods has advantages, but several problems still need to be considered carefully before they are widely applied in construction practice. Specifically, Jesswein et al. [13] showed that calculating pile load capacity from the SPT is not completely reliable because it lacks precision, although it is cheap and easy to perform. Similarly, the analytical method is considered impractical, since the theoretical approach is based on too many assumptions and simplifications [3]. Abu-Farsakh and Titi [14] argued that empirical and static analysis methods for piles are expensive and have low accuracy because of the large safety factors selected. On the other hand, the pile load test method is highly reliable, but it is time-consuming and costly, and the equipment is often very cumbersome [15]. The dynamic method predicts the bearing capacity from the characteristics of the pile, the hammer, and the position of the pile, without considering soil effects [11,16]. Finally, numerical simulation based on the finite element method is still an approximate method, and its results depend heavily on the modeling process [17].
In recent studies, many scientists have used a new approach to problems related to building foundations. That approach is artificial intelligence (AI). Building on the development of computer science, AI has gradually proven its effectiveness in numerous fields such as construction [18], transportation [19], security [20], and medicine [21]. The essence of AI algorithms is a combination of mathematics, algorithms, and creativity. AI can handle complex problems with many unknowns, which makes it suitable for problems in geotechnical engineering [3]. Scientists have used many different AI algorithms to predict pile bearing capacity, including the artificial neural network (ANN) [3,22][23][24][25], deep neural network (DNN) [26][27][28], adaptive neurofuzzy inference system (ANFIS) [29,30], and random forest (RF) [23]. These studies all showed good predictive efficiency and were expected to yield highly generalizable tools for the prediction of pile load capacity. However, the published literature mainly focuses on a single algorithm, making it difficult to confirm that the model is optimal for the data. Consequently, the chosen model may suffer from an imbalance between bias and variance, that is, overfitting or underfitting, which is difficult to resolve fully within a single algorithm [31,32]. Therefore, in this study, an Ensemble Learning model [33][34][35] that combines several base models according to a certain rule is expected to balance bias and variance and to predict the bearing capacity of piles efficiently. In addition, published studies consider only input parameters related to soil characteristics and pile geometry, but not other important factors such as pile type, method of pile testing, pile construction method, and pile tip type. These additional parameters, which greatly influence the prediction performance, were carefully considered during the modeling process in this study.
In this study, three different machine learning models were developed to independently predict the pile load capacity on the same data set, namely, gradient boosting (GB), random forest (RF), and classic linear regression (LR). Then, an Ensemble Learning (EL) algorithm was developed, using the voting protocol of the base models to make the final prediction. A total of 108 samples from the published literature were used to train and test the model. Since the data set is aggregated from projects around the world, the resulting model is expected to be able to predict the bearing capacity of different types of piles in different regions and to be useful internationally. The remainder of the paper is structured as follows: Section 2 presents the methods used in the research, including an introduction to the base machine learning methods RF, GB, and LR, as well as the algorithm that combines them into a voting EL model. Section 3 presents the database used to develop the machine learning models, as well as the statistical information of the input and output variables. Section 4 presents the results of the machine learning models and compares them with each other to assess the performance of the EL model. Finally, Section 5 offers some conclusions and opinions.

Random Forest.
Breiman [36] first introduced the random forest algorithm, an ensemble (group) learning method. Later, this algorithm was exploited and developed by many other scientists; for example, Kuhn and Johnson [37] used decision trees of the same distribution to build forests and to train on and predict sample data. Decision trees are capable of solving classification and regression problems based on a supervised learning approach [23,38]. The RF algorithm uses the bootstrap technique (also known as the bagging method), which brings together base learners that are each built randomly from a different resample of the data.
The biggest advantage of RF is that it can build a set of base learners while controlling variance.
The RF algorithm is developed through the following steps [39]:
Step one: create random vectors. A random sample is selected from the input data set.
Step two: for each sample, a decision tree is built, and one prediction result is obtained from it.
Step three: vote over the prediction results and choose the most voted prediction as the final prediction (for regression, the tree predictions are averaged).
The schematic diagram of the random forest model is shown in Figure 1.
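The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the model used in the study: each "tree" is reduced to a one-split stump on a randomly chosen feature, the names `fit_stump` and `fit_forest` are hypothetical, and the tree outputs are averaged because the target is continuous.

```python
import random

def fit_stump(rows, targets, feature):
    """Least-squares regression stump that splits on one feature index."""
    values = sorted({row[feature] for row in rows})
    mean_all = sum(targets) / len(targets)
    if len(values) < 2:              # nothing to split on: predict the mean
        return lambda row: mean_all
    best = None
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for row, t in zip(rows, targets) if row[feature] <= thr]
        right = [t for row, t in zip(rows, targets) if row[feature] > thr]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    _, thr, ml, mr = best
    return lambda row: ml if row[feature] <= thr else mr

def fit_forest(rows, targets, n_trees=25, seed=0):
    """Bagging: one stump per bootstrap sample, predictions averaged."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # step one: bootstrap sample
        feature = rng.randrange(d)                    # random feature for this tree
        trees.append(fit_stump([rows[i] for i in idx],
                               [targets[i] for i in idx], feature))  # step two
    return lambda row: sum(t(row) for t in trees) / len(trees)       # step three
```

Because every stump outputs a mean of observed targets, the averaged prediction always stays within the range of the training targets, which is one way to see why bagging reduces variance rather than extrapolating.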

Gradient Boosting.
The boosting algorithm was first introduced in 1990 in the machine learning community. By repeatedly combining simple models, weak learners are improved into strong learners [40]. Initially, boosting was designed as a machine learning procedure to improve the prediction of binary outcomes by means of base learners (weighted sets of decision trees) [41]. Boosting was later interpreted as gradient descent in function space and can be used to fit statistical regression models [42].
Gradient boosting (GB) has advantages over other statistical learning methods in that it can interpret results and simplify complex interactions, and it also requires data preprocessing and tuning of the number of parameters [43]. In addition, GB improves model accuracy and performs variable selection [44]. In this study, gradient boosting used decision trees of a fixed size as base learners, together with the boosting algorithm developed by Friedman [45,46].
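The idea of boosting as gradient descent in function space can be illustrated with a short sketch. It assumes a squared-error loss, one-dimensional inputs, and one-split stumps as the fixed-size base learners; the names `fit_stump` and `gradient_boost` and the default settings are hypothetical and are not taken from the study.

```python
def fit_stump(x, r):
    """Best single-split least-squares stump for 1-D inputs x and targets r."""
    values = sorted(set(x))
    best = None
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - ml) ** 2 for ri in left) + sum((ri - mr) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    _, thr, ml, mr = best
    return lambda xi: ml if xi <= thr else mr

def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Additive model: start from the mean, then repeatedly fit stumps to residuals."""
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        resid = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_stump(x, resid)
        stumps.append(h)
        pred = [pi + learning_rate * h(xi) for pi, xi in zip(pred, x)]
    return lambda xi: f0 + learning_rate * sum(h(xi) for h in stumps)
```

Each round moves the current prediction a small step along the direction that most reduces the training error, which is why many weak stumps together can fit a curved relationship.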

Linear Regression.
Linear regression (LR) models have been widely used to solve problems in statistics [47][48][49]. In essence, linear regression models the relationship between input and output variables by fitting a linear equation.
In this study, since there are 9 input variables and one output variable, the linear regression equation has the following form:

$$\hat{y} = \beta_0 + \sum_{j=1}^{9} \beta_j x_j,$$

where $\beta_j$ is the weight of the $j$th input variable $x_j$ and $\beta_0$ is the bias (or intercept). The least-squares method [50,51] was used to find the values of the optimal weights by minimizing the sum $S$ of squared residuals:

$$S = \sum_{i=1}^{k} \left( y_i - \hat{y}_i \right)^2,$$

where $y_i$ and $\hat{y}_i$ are the actual and predicted outputs of the $i$th sample and $k$ is the number of samples.

Ensemble Learning.
Ensemble Learning (EL) is a machine learning technique that combines several base models to create one optimal predictive model. The combined model is expected to provide better predictive performance than its member models and can be applied to a wide range of technical problems [52,53]. The random forest and gradient boosting models are themselves ensemble methods (bagging and boosting, respectively) that rely on multiple decision trees to make final predictions. Therefore, a combined model of random forest, gradient boosting, and classical linear regression based on the average voting protocol was expected to give the best prediction of pile load capacity. There are two ways to use the voting protocol in the EL method, namely, average voting (AV) and weight voting (WV). In AV-EL, the base models' weights are the same and equal to 1 (the predictions are simply averaged), while in WV-EL the weights lie in the range [0, 1] and sum to 1. The flow chart of the EL model used in this study is presented in Figure 2, in which weight voting of three base models was used to make the final prediction. The final prediction equation has the following form:

$$P_i = \sum_{j=1}^{n} w_j P_{ji},$$

where $w_j$ is the weight of the $j$th learner; $P_{ji}$ is the prediction of the $j$th learner for the $i$th sample; and $n$ is the number of learners in the EL model. In this study, the optimum weights of WV-EL were determined by the random search (RS) technique. In this technique, the weights are randomly selected within the range [0, 1] such that the sum of all weights is 1. The performances of the WV-EL models using different weight combinations are compared, and the model that gives the best performance is selected as the final EL model.
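The weight voting rule and the random search over weights described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the weights are made to sum to 1 by normalising uniform draws (the source does not say how the constraint is enforced), the search starts from equal weights, and the scoring function is passed in by the caller.

```python
import random

def weighted_vote(weights, preds):
    """Weighted sum of base-model predictions; preds[j][i] is learner j on sample i."""
    n_samples = len(preds[0])
    return [sum(w * p[i] for w, p in zip(weights, preds)) for i in range(n_samples)]

def random_weights(n_learners, rng):
    """Draw weights in [0, 1] that sum to 1 (assumption: normalised uniform draws)."""
    raw = [rng.random() for _ in range(n_learners)]
    total = sum(raw)
    if total == 0.0:                      # practically never happens; fall back to equal weights
        return [1.0 / n_learners] * n_learners
    return [r / total for r in raw]

def random_search(preds, y_true, score_fn, n_iter=1000, seed=0):
    """Keep the weight vector with the highest score (higher is better)."""
    rng = random.Random(seed)
    n = len(preds)
    best_w = [1.0 / n] * n                # start from equal weights (average voting)
    best_score = score_fn(y_true, weighted_vote(best_w, preds))
    for _ in range(n_iter):
        w = random_weights(n, rng)
        score = score_fn(y_true, weighted_vote(w, preds))
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score
```

As a usage example with the weights reported in this paper, `weighted_vote([0.482, 0.338, 0.18], [[100.0], [200.0], [300.0]])` combines three hypothetical base predictions of 100, 200, and 300 into a single value close to 169.8.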
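The RS technique used in this study can be summarized in the following steps:
Step one: randomly select the weights of the base models within the range [0, 1] such that their sum is 1.
Step two: build the WV-EL model with these weights and evaluate its performance.
Step three: repeat steps one and two for the chosen number of simulations and select the weight combination that gives the best performance.

To evaluate the consistency between the predicted results and the actual results, three performance indicators were used: RMSE (root mean square error) [54,55], $R^2$ (coefficient of determination) [56,57], and MAE (mean absolute error) [58,59]. $R^2$ measures the statistical relationship between the predicted values and the experimental results. $R^2$ ranges over $(-\infty, 1]$, and the model is more accurate as $R^2$ approaches 1. Conversely, the lower the values of RMSE and MAE are, the more accurate the prediction is. The values of $R^2$, RMSE, and MAE were calculated through the following formulas:

$$R^2 = 1 - \frac{\sum_{i=1}^{k} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{k} \left( y_i - \bar{y} \right)^2},$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{k} \sum_{i=1}^{k} \left( y_i - \hat{y}_i \right)^2},$$

$$\mathrm{MAE} = \frac{1}{k} \sum_{i=1}^{k} \left| y_i - \hat{y}_i \right|,$$

where $k$ denotes the number of samples, $y_i$ and $\hat{y}_i$ are the actual and predicted outputs, respectively, and $\bar{y}$ is the average value of $y_i$.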
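The three indicators can be computed directly from their standard definitions. The sketch below uses plain Python and hypothetical function names; it is meant only to make the definitions concrete, including the fact that $R^2$ becomes negative when a model predicts worse than the mean of the observations.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

y = [1.0, 2.0, 3.0]
print(r2_score(y, [1.0, 2.0, 3.0]))   # perfect prediction
print(r2_score(y, [2.0, 2.0, 2.0]))   # predicting the mean everywhere
print(r2_score(y, [3.0, 2.0, 1.0]))   # worse than the mean: negative value
```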

Data Used
In this study, a database of 108 pile load test reports was collected from the published literature [30]. All input parameters that might affect the determination of the pile load were considered, including the type of test ($T$), the type of pile ($P$), the installation method ($I$), the end of pile type ($E_P$), the pile tip cross-sectional area ($A_t$), and the shaft area ($A_f$). The soil properties were represented by parameters obtained from cone penetration test (CPT) results, including the average cone tip resistance along the embedded length of the pile ($q_{ca}$), the average cone tip resistance over the influence zone ($q_{ct}$), and the average sleeve friction along the embedded length of the pile ($f_{sa}$). The considered output is the ultimate bearing capacity of the pile ($P_u$). The summary of the database statistics is presented in Table 1, which includes the min, mean, max, median, and standard deviation (SD) values of all parameters used in this study. In addition, histograms of all the input and output variables are shown in Figure 3.

Effect of Training Data Size.
Since the size of the training set affects the accuracy of the model, it was considered carefully [60]. In this section, a suitable training set size was found by varying it from 20% to 80% of the original data set, while the size of the testing set decreased correspondingly from 80% to 20%. To prevent data leakage, the testing set was hidden from the training process, and the performance of the model was assessed with the 5-fold cross-validation (CV) technique [61,62]. The random forest model was selected for training and verification in this step. It is important to note that 300 samplings were performed by shuffling the database indexes to make the results more objective. The initialization of the RF model is shown in Table 2. Figure 4 shows the results of the RF model using different training set sizes. It can be seen that training performance increased slightly with the increase in training set size. To be more specific, the average $R^2$ value increased from 0.847 to 0.918 as the training set size increased from 20% to 80%. At the same time, the average performance of the RF model on the validation set changed much more strongly: the average $R^2$ increased from a negative value to 0.584 when the training data size was increased from 20% to 80%. Moreover, the standard deviation (SD) decreased from 0.998 to 0.1 over the same range. The decrease of SD indicated that the model was most stable with the largest training set. Aiming for the best accuracy and stability of the model, a training set size of 80% was selected in this study.
Table 1: Summary of the statistical information of the database used in the present study.
Mathematical Problems in Engineering
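The splitting scheme described above (a shuffled train/test split followed by 5-fold CV inside the training part only) can be sketched with index bookkeeping alone. The function names `shuffle_split` and `kfold_indices`, the rounding rule for the split size, and the seeds are assumptions made for illustration.

```python
import random

def shuffle_split(n_samples, train_fraction, seed=0):
    """Shuffle sample indexes and split them into training and testing parts."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * n_samples))
    return idx[:cut], idx[cut:]

def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) position lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # k nearly equal folds
    for i in range(k):
        val = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, val

# 108 samples with an 80% training part; the testing part is never used in CV.
train_idx, test_idx = shuffle_split(108, 0.8, seed=0)
for fold_train, fold_val in kfold_indices(len(train_idx), k=5, seed=0):
    # Positions refer to train_idx, so the held-out test samples stay hidden.
    cv_train = [train_idx[j] for j in fold_train]
    cv_val = [train_idx[j] for j in fold_val]
```

Repeating this with different seeds gives the kind of repeated shuffling described in the text, from which a mean and standard deviation of the validation score can be computed.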

Predictive Capability of the Base Models.
In this section, the predictive capability of the base ML models and the AV-EL model was examined. To keep the test results consistent across all models, a single split of training and testing data was used for all ML models. The initialization parameters of the base models are presented in Table 3, while the Ensemble Learning method used the average voting protocol. It is important to note that the testing data set is used to evaluate the performance of the base models, while the training set is used to build the models. In addition, the percentage relative error (RE) in the bearing capacity of the pile was also considered for the testing data set to evaluate the performance of the three base models and the combination model. This criterion is computed from $b_i$ and $\hat{b}_i$, the actual and predicted outputs, respectively. The comparison of RE for the testing set is shown in Table 4. It can be seen that all three base models give quite good predictive performance. However, no base model outperformed the EL model over the entire testing data set. The predictive performance of a single model may be satisfactory for some data samples but poor for the rest. On the other hand, the EL model seems to balance the predictive performance of the base models, leading to the best overall prediction result for the entire test data set. Figure 5 presents a visual comparison of the experimental ultimate bearing capacity of the piles and the results predicted by the ML models. The performance indicator results are summarized in Table 5. It can be seen that the GB and RF models seem to suffer from overfitting, as the performance indicators for the training set are significantly better than those for the testing set. In contrast, the LR model suffered from underfitting, with performance on the testing set better than that on the training set. At the same time, the combination model seems to have resolved the overfitting and underfitting problems of the three base models and gave the best balance in performance on the training and testing sets. To be more specific, the performance index values of the combined model were $R^2 = 0.928$, MAE = 292.897, and RMSE = 407.041 on the training set and $R^2 = 0.955$, MAE = 263.49, and RMSE = 372.338 on the testing set.
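For readers who want to reproduce a per-sample error of this kind, a common form of the percentage relative error is sketched below. The exact formula used in the study is not available in this text, so the signed convention (actual minus predicted, divided by actual) and the function name are assumptions.

```python
def relative_error_percent(actual, predicted):
    """Signed percentage relative error per sample (assumed convention)."""
    return [100.0 * (a - p) / a for a, p in zip(actual, predicted)]

# Hypothetical capacities: a positive value means the model underpredicts.
errors = relative_error_percent([1000.0, 2000.0], [900.0, 2100.0])
```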

Improved Performance of EL Model Using Weight Voting Optimization.
In the previous section, the performances of the base models were shown to be dissimilar. Models such as GB and RF have good predictive performance, while LR has lower predictive performance. Moreover, the AV-EL model seems to exhibit a slight imbalance between bias and variance, since its performance on the testing set was still better than that on the training set. Therefore, to create an even better EL model, the member models should play different roles in the final prediction. To that end, the RS technique was used to optimize the weights of the base models. In this technique, 1000 simulations were performed to find the best weight combination for the WV-EL model. It is important to note that, to avoid data leakage, the 5-fold CV technique was used to evaluate the WV-EL model instead of the testing set. In addition, the performance indicator $R^2$ was selected as the fitness value of the optimization. The results of the optimization process are illustrated in Figure 6. It can be seen that, of the three base models, GB played the most important role in the EL model's prediction of the pile bearing capacity. To be more specific, the higher the first weight was, the better the WV-EL model performed. In addition, the second weight should be in the range [0.2, 0.7] and the third weight should be less than 0.4. The best fitness value of the optimization was $R^2 = 0.932$, and the optimum weights are summarized in Table 6. The regression graph for the WV-EL model is shown in Figure 7. The visual comparison between the $P_u$ results of the WV-EL model and the experimental data is illustrated in Figure 8 (for the training data set) and in Figure 9 (for the testing data set). Overall, the WV-EL model exhibited a more balanced performance and was a better choice than the AV-EL model.

Comparison with the Previous Study.
Table 7 presents some results on ML applications in pile foundation. The results of the present study, as well as those of the previous studies, show that ML techniques can predict pile bearing capacity well. A direct comparison between the study results is not very meaningful because the data sets and main inputs differ. However, among those models, the WV-EL model showed high predictive performance and provided a balanced prediction across the entire data set. In addition, the data set in this study is limited in size, so further studies on other data sets are needed to provide a more general model for the prediction of pile bearing capacity.
The abbreviations are as follows: SPT is the standard penetration test value, CPT is the cone penetration test value, $G$ denotes pile geometrical properties, $\phi$ denotes the internal friction angle of the soil, $\sigma'$ denotes the effective stress of the soil, HW is the hammer weight, PS is the pile set, DH is the drop height of the hammer, $T$ is the type of test, $P$ is the type of pile, $I$ denotes the installation method, and $E_P$ denotes the end of pile type.

Sensitivity Analysis.
In this section, a global sensitivity analysis is conducted with the Sobol method [64] to evaluate the importance of the input parameters for the model. The sample set is drawn from the admissible input ranges using Saltelli's sampling scheme [65]. To make the results more objective, 300 simulations, each with a different shuffling of the training data, were performed to construct the EL model. The analysis result is shown in Figure 10. It can be seen that, among the 9 input variables used to predict the pile bearing capacity, the shaft area ($A_f$) was the most important feature, with an average sensitivity index of 0.537. From a soil mechanics point of view, $A_f$ determines the contact area between the soil and the pile; in other words, it affects the total positive friction force on the pile and so plays an important role in predicting the bearing capacity of a friction pile. The pile tip cross-sectional area ($A_t$), which relates to the pile tip resistance, was the next most important variable, with a sensitivity index of 0.363. The variables $q_{ct}$, $T$, and $f_{sa}$ were ranked as the third to fifth most important predictors, with average sensitivity indexes ranging from 0.039 to 0.031. The remaining variables, $E_P$, $q_{ca}$, $I$, and $P$, had sensitivity indexes below 0.012, indicating that they did not affect the result much.
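To make the notion of a first-order Sobol index concrete, the sketch below estimates the indexes for a toy linear function rather than for the EL model. It is a simplified illustration under stated assumptions: inputs are independent and uniform on [0, 1], plain pseudo-random sampling replaces Saltelli's quasi-random scheme, the "pick and freeze" estimator is used, and the name `first_order_sobol` is hypothetical.

```python
import random

def first_order_sobol(model, n_inputs, n_samples=20000, seed=0):
    """Monte Carlo estimate of first-order Sobol indexes (pick-and-freeze)."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    fA = [model(row) for row in A]
    fB = [model(row) for row in B]
    both = fA + fB
    mean = sum(both) / len(both)
    var = sum((v - mean) ** 2 for v in both) / len(both)
    # Centring the outputs does not change the indexes but reduces estimator noise.
    fA = [v - mean for v in fA]
    fB = [v - mean for v in fB]
    indices = []
    for i in range(n_inputs):
        acc = 0.0
        for a_row, b_row, fa, fb in zip(A, B, fA, fB):
            mixed = a_row[:i] + [b_row[i]] + a_row[i + 1:]   # A with column i taken from B
            acc += fb * ((model(mixed) - mean) - fa)
        indices.append(acc / n_samples / var)
    return indices

# Toy model: for y = 4*x1 + 2*x2 the exact indexes are 0.8, 0.2, and 0 for the unused x3.
toy = lambda x: 4.0 * x[0] + 2.0 * x[1] + 0.0 * x[2]
sobol = first_order_sobol(toy, 3)
```

A large index means that fixing that input alone would remove a large share of the output variance, which is the sense in which the shaft area dominates in the analysis above.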

Conclusions
In this study, an EL model was developed to predict the pile bearing capacity. The combination model was made up of three base models: GB, RF, and LR. A database containing 108 samples was used to develop and evaluate the ML models. An optimization technique, namely, RS, was used to choose the optimal weights for the WV-EL model. The results showed that the two EL models used in this study, namely, AV-EL and WV-EL, exhibited superior performance compared to the base models. In addition, the EL models seem to resolve the overfitting and underfitting problems of the base models, performing well on both the training and testing data sets. In particular, the WV-EL model with weights of $w_1 = 0.482$, $w_2 = 0.338$, and $w_3 = 0.18$ for the GB, RF, and LR models, respectively, showed the best balance in the ability to predict pile bearing capacity on both the training and testing data sets.
In addition, a global sensitivity analysis technique was used to detect the most important input variables. The results showed that, of all the input parameters, the shaft area ($A_f$) and the pile tip cross-sectional area ($A_t$) were the most important features for modeling the pile bearing capacity. The research results provide an effective tool for predicting the pile capacity from CPT results and also show the potential of the weight voting Ensemble Learning model for solving foundation engineering problems.
Data Availability
The processed data are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.