Use of machine learning techniques for predicting the bearing capacity of piles

Designers need to estimate the load bearing capacity of piles and the most precise way is through static pile load tests. The Brazilian Standard (ABNT 2019) define procedures for this type of test, which basically consists on applying an increasing load to an executed pile and measuring its displacement. Designers can obtain the load bearing capacity by examining the load-displacement graph, using criteria defined by the standard. Nonetheless, they cannot rely only on static pile load tests because they are expensive, time consuming and usually executed when part of the piles of the project are already in place. The most popular approach to estimate pile bearing capacity beforehand is to use semiempirical methods, like those proposed by Aoki & Velloso (1975), Décourt & Quaresma (1978, 1998) and Meyerhof (1976). Most semi-empirical methods propose two separate estimates: one for the shaft resistance and another for the tip resistance. The total pile bearing capacity given by the sum of them. These methods usually estimate the bearing capacity through results of in situ tests and pile geometric features. In several countries (including Brazil), contractors usually only make available the standard penetration test (SPT). The main reasons are cost and simplicity when compared to methods like the cone penetration test, making the SPT popular in those countries. Even when designers do have access to other in situ tests, they sometimes rely on correlations to convert data into SPT values. In recent years, machine learning techniques are increasingly gaining space within a wide variety of engineering applications. Their advantages include the capability to deal with large amounts of data and to find complex and highly nonlinear relationships among different parameters. In geotechnics many works have been using these algorithms to solve different kinds of problems with good results over traditional methods. Some of these problems are: soil classification (Bhattacharya & Solomatine, 2006; Kovacevic et al., 2010; Bonini et al., 2017; Carvalho & Ribeiro 2019), soil profiling (Arel, 2012), soil liquefaction (Juang & Chen, 1999; Hanna et al., 2007; Goh & Goh, Abstract Geotechnical engineers frequently rely on semi-empirical methods like Décourt-Quaresma and Meyehof’s to estimate the bearing capacity of piles. This paper proposes alternatives to these methods, presenting an approach using machine learning models for predicting the bearing capacity of precast concrete piles. It uses data samples including 165 load tests, each one accompanied with a SPT sounding. This study proposes two types of analysis using two separated datasets, one based on the Décourt-Quaresma method and the other based on the Meyerhof method. Six machine learning algorithms of distinct biases are trained and tested with a leave-one-out cross validation procedure and the models’ predictive performance is assessed through two metrics: root mean squared error (RMSE) and coefficient of determination (R2). The best performing technique was random forest (RF) using Décourt-Quaresma dataset, with an RMSE of 642.38. All other machine learning techniques obtained a RMSE below 710, overcoming Meyerhof’s and Décourt-Quaresma’s semi-empirical methods, which both obtained RMSE values close to 900. This study proposes 95% and 90% confidence intervals for the best technique employing a graphical interpretation, so that geotechnical engineers can choose which level of safety they wish to work with. Finally, the study presents a case study showing that the best performing models achieve a reasonable accuracy, surpassing the semi-empirical methods in two of the three piles considered. The representativity of the new examples within the used datasets explain the accuracy of the techniques.


Introduction
Designers need to estimate the load bearing capacity of piles and the most precise way is through static pile load tests. The Brazilian Standard (ABNT 2019) define procedures for this type of test, which basically consists on applying an increasing load to an executed pile and measuring its displacement. Designers can obtain the load bearing capacity by examining the load-displacement graph, using criteria defined by the standard. Nonetheless, they cannot rely only on static pile load tests because they are expensive, time consuming and usually executed when part of the piles of the project are already in place. The most popular approach to estimate pile bearing capacity beforehand is to use semiempirical methods, like those proposed by Aoki & Velloso (1975), Décourt & Quaresma (1978, 1998 and Meyerhof (1976). Most semi-empirical methods propose two separate estimates: one for the shaft resistance and another for the tip resistance. The total pile bearing capacity given by the sum of them. These methods usually estimate the bearing capacity through results of in situ tests and pile geometric features. In several countries (including Brazil), contractors usually only make available the standard penetration test (SPT). The main reasons are cost and simplicity when compared to methods like the cone penetration test, making the SPT popular in those countries. Even when designers do have access to other in situ tests, they sometimes rely on correlations to convert data into SPT values.
The main objective of this paper is to propose a new approach for the use of machine learning techniques, using classical semi-empirical methods as a basis for estimating the bearing capacity of piles. It is better than previous machine learning models from the literature concerning generality for tropical soils and ease of use. The used datasets include only static load tests (slow maintained load) of pre-cast concrete piles executed accordingly to the Brazilian standard (see ABNT 2019) and accompanied with SPT soundings. The investigation starts with the training of six machine learning techniques, producing two models for each one: the first using the inputs from the Décourt-Quaresma method and the second the inputs from the Meyerhof method. A multiple linear regression (LR) is also included as a baseline for performance. The authors selected the Décourt-Quaresma method because it is commonly used in Brazilian foundation projects and Meyerhof for being widely used worldwide. Both sets of inputs include pile diameter and length, the mean SPT along the shaft and the mean SPT at the pile tip. The main difference of the two sets is how mean SPT values are calculated.
It is shown in a general application that the precision of all machine learning techniques surpassed both Meyerhof (1976) and Décourt & Quaresma (1978, 1998 semi-empirical methods with respect to RMSE. This work proposes a graphical method to provide 90% and 95% confidence intervals for the results of the best technique. A case study applies the top two machine learning models and the two semi-empirical methods to three new examples, from one site that was not included in the training dataset. The machine learning techniques presented reasonable performance, and were better than the semi-empirical methods in two of the three piles.

Semi-empirical methods
Semi-empirical methods work based on empirical correlations of in situ tests data and adjustments with load test results. Results can vary for these methods due to their implicit subjectivity. For the Meyerhof method, little subjectivity was included because it uses N SPT and pile geometry as inputs, which are not sensitive to interpretation. On the other hand, the Décourt-Quaresma method relies on soil types as presented in Table 1, which are sensitive to interpretation.
In these methods, the pile load capacity t R is usually divided into two parts: lateral friction l R and tip resistance p R , as shown in Equation 1. Different expressions are proposed for l R and p R in the literature, using information such as soil type, pile type, pile geometry and in situ test results.
The authors selected two methods for this study: the Décourt-Quaresma method for being popular in Brazil and the Meyerhof method for being widely used around the world. The next sections describe these methods.

Décourt-Quaresma
This method obtains the tip resistance using a factor related to the soil type, as presented in Table 1. It also uses the tip area p A and the mean N SPT index around the pile tip p SPT , considering the value at the tip, the one above and the one below. It obtains the lateral resistance using pile geometry and the mean N SPT index along the pile shaft where α and β refer to soil and pile type, respectively. U is the pile perimeter and L is the pile length.

Meyerhof
This method uses the N SPT index, pile length L and pile diameter D to estimate the pile bearing capacity. It calculates l SPT as the mean of the whole pile shaft and p SPT as the mean including 8D above the tip and 3D below it (Meyerhof, 1976). The expression proposed by Meyerhof is Table 1. Values for K (Décourt & Quaresma, 1978, 1998

Dataset
The information used in this work includes 165 precast concrete pile load tests and their respective SPT measures collected from many different construction sites in Brazil. It was obtained from the works of Lobo (2005), Vianna (2000) and Santos (1988) and all load tests were performed according to the Brazilian Standard (ABNT, 2019). When the maximum applied load was not achieved, the loadsettlement curve was extrapolated using the Van der Veen method (Van Der Veen, 1953). Interested readers can find further detail about these load tests in Lobo (2005), Vianna (2000) and Santos (1988). In specific cases, information about pile rupture and comparisons between applied and ultimate loads is available. Figure 1 presents the location of the soundings, most of them from the south and southeast regions of Brazil. The country presents a predominant tropical climate and high temperatures, with 65% of its territory covered by non-homogeneous lateritic soils. The clay-ferruginous soil is the most common type (Morais et al., 2020). The authors had access to some details about the set provided by Vianna (2000), which is composed by soundings taken from the city of Curitiba, in Paraná state. The geology of this region can be divided into three groups: a metamorphic rock complex from the Precambian; sedimentary deposits from Tertiary; and a more recent sedimentary deposit (Holocene), as a result of a partial removal of older sediments (Cenozoic). This entire sequence of Cenozoic sediments in the Curitiba Basin is named Guabirotuba Formation in the literature (Bigarella & Salamuni, 1962).
After assembly, raw data was preprocessed into two datasets. The first, named Décourt dataset, uses p SPT , l SPT , D (in cm) and L (in m) as calculated in the Décourt-Quaresma method. The second uses the same inputs, but defined accordingly to the Meyerhof method. Notice that the difference is how each method calculates p SPT and l SPT , as presented in previous sections. The authors did not include soil type among the inputs because, based on their previous experience, these variables do not contribute to improve accuracy and include too much human error. Thus, although the authors consider the position of the water table relevant for the problem, they decided not to include it because many of the used soundings did not include this information. Figure  in kN, obtained from the load test . Tables 2 and 3 present a sample of each dataset. Unity b refers to the number of blows needed for the sampler to penetrate 30 cm into the soil (Salgado, 2008). Tables 4 and 5 present correlation matrices generated for each dataset. Inputs are not severely correlated, with all values within the interval [ ] 0.7, 0.7 − . This indicates that they can be all considered informative, occurring few redundancies between them. Notice that the correlation between l SPT and p SPT is only 0.35 for the Décourt dataset, while it rises to 0.7 for the Meyerhof dataset. This can be explained by the way each method obtains these variables, with completely separated soil layers considered for the Décourt dataset and an intersection of common soil layers considered for the Meyerhof dataset (see Figure 2). D and L are the ones with stronger correlation to the output u Q , which was expected.

Model description
After pre-processing, this study uses both datasets to train a set of selected machine learning algorithms. First step is organizing each dataset as a matrix, where each column represents an input or the output and each line represents an example. In other words, each dataset becomes a 165 5 × matrix. Next, it divides the examples (lines) of each dataset into two portions: the training set and the test set. This work uses the leave-one-out cross validation approach, using the full dataset for training except for one example kept apart for test. The procedure tests all examples and the final accuracy is given by the mean (Wong, 2015).
is one metric used in this work to evaluate the performance of the algorithms. It is obtained using Equation 6, where ˆi y is a predicted value obtained from the model, i y is an observed value from dataset, y the mean of all observed values and ne is the number of examples. In this work, i y is the pile bearing load capacity ( u Q ) of a specific pile i. 2 R values close to 1 imply that the target variable is completely explained by the used model. 0 means no connection between predicted and observed values. The literature considers this metric a meaningful indicator of accuracy (Draper & Smith, 1998).
Other performance metric adopted in this work is the root mean square error (RMSE). It is calculated for all machine learning models and is given by Equation 7.
The machine learning techniques used in this work are k-nearest neighbor (KNN), kernel-KNN (KKNN), decision tree (DT), random forest (RF), artificial neural networks (ANN) and support vector machines (SVM). The following subsections present them, with a brief overview of its functionality. They were chosen considering their popularity within machine learning applications, their different biases and their reasonable results towards this work dataset. Multiple linear regression (LR) is also included as a baseline for the performance of the techniques.

KNN and KKNN
The KNN technique understands each example as a point whose coordinates are the inputs. It expects that a new example would have an output similar to those that are close in this input space. The regression problem can use Equation 8, which defines the output of the new example as the average value of its k nearest neighbors.
This work weights the output of each neighbor with respect to its distance to the new example, giving more weight to closer ones to improve accuracy (Dudani, 1976). It calculates the distance using the Minkowski metric, as presented in Equation 9. In this work 2 = p , which leads to the Euclidian metrics. Equation 9 gives the distance between arbitrary points represented by vectors a and b, with components ( ) 1 , ,  n a a and ( ) 1 , ,  n b b , considering an n-dimensional space.

( )
KNN has the disadvantage of poor performance for some type of complex problems (Kuo et al., 2008). The KKNN technique solves this problem by changing the distribution of samples, mapping them into a higher dimensional input space. The objective is to obtain a linear problem in this new space, equivalent to the nonlinear problem of the original space. Equation 10 presents an example of mapping a n-dimensional input space into a m-dimensional space:   S . ψ defines the mapping from 1 S to 2 S and , 1, , ϕ =  i i m , are input mapping functions. One problem in this approach is that finding ψ is usually impracticable. Nevertheless, the mapping does not require ψ if the internal product ( ) ( ) . ψ ψ a b is known for arbitrary vectors a and b. This inner product is called kernel (Yu et al., 2002).
The most commonly used kernel functions are: polynomial, radial basis and sigmoid, as shown in Equations 11, 12 and 13, respectively: where ρ , σ , γ and ω are adjustable parameters and .
a b is the inner product between vectors a and b. This work uses the radial basis kernel based on preliminary tests.

DT and RF
A DT model is a flow-chart-like structure, with nodes that create ramifications dividing the dataset. It starts with a single root node that receives the complete dataset and distributes it to other nodes using a rule, which is usually an inequality applied to one of the inputs. New nodes receive portions of the dataset, subjects them to their rules and distributes them to other nodes, forming the branches of the tree. The last nodes, called leafs, assign outputs to the examples. Figure 3 presents a scheme of a DT.
One disadvantage of DTs is that they tend to become overspecialized in the dataset used for training, which prejudices performance for new examples. This behavior is called overfitting. RF is a technique based on DTs that minimizes this problem by using a collection of different DTs built randomly. The algorithm selects a subset of examples for each tree and node, ensuring that they are different. After RF creates the trees, each one make a separate prediction and the mean gives the final value (Ho, 1995).

ANN
The interaction of neurons in the human brain inspires the ANN algorithm. Its structure consists of a number of processing elements or nodes that are arranged in layers: an input layer, an output layer and one or more hidden layers. Each node from the first layer receives an input i x , which is multiplied by an adjustable connection weight ij w . These values are inputs for the neurons of the next layer, that sum them and add a threshold value θ j to obtain a combined input This work uses a sigmoid function for activation, which is expressed as:  where λ is a calibration parameter.

SVM
SVMs use statistical learning principles as a basis. Their main objectives are minimizing errors associated with the training dataset and maximizing the generalization of the model (Vapnik, 1999). The algorithm uses a set of functions for its regression model that can have a solution as given in Equation 17: is the input of l samples and m dimensions, ∈ m R y is the output, w is the weight vector and ι is the bias. The margin is a distance from the hyperplane which is set to contain all points, as illustrated in Figure 4. This distance is the error ∈ to be minimized, included in Equation 18 as follows: Equation 19 presents the function to be minimized.
ξ i and * ξ i are parameters introduced to penalize points outside the margins and parameter C controls these penalties (Smola & Scholkopf, 2004). The algorithm solves this optimization problem using Lagrange multipliers (Vapnik, 1998).
This procedure is valid for linear problems. One can extend it to nonlinear problems using kernels to map the input data into a higher dimensional space. It is the same approach described for the KKNN. The authors chose radial basis functions after observing better accuracy in preliminary tests.

LR
A LR seeks a linear relationship between the input variables and the output. Equation 20 represents the model generated by this kind of regression: where ˆi y is the predicted variable, β j are the coefficients determined by the model and j x are the input values for the problem.
This technique has the advantage of being simple and widely used in geotechnical engineering practice, but it cannot reproduce non-linear behavior. Although it is not expected to obtain good results from this technique, it is included in this work as one of the baselines for the performance of the machine learning techniques.

General application
The objective of this example is to apply the six machine learning techniques to Décourt and Meyerhof datasets,using RMSE and 2 R metrics to evaluate performance. The baselines for performance are the original semi-empirical methods and LR.
Tables 6 and 7 present the performance obtained using the Décourt and Meyerhof datasets, respectively. RF was the technique with best accuracy, presenting the lowest RMSE in both tables. The second best was KNN for Décourt, followed  by ANN and KKNN. For Meyerhof, the second best was ANN followed by KKNN and KNN. DT presented the worst performance, which can be explained by the tendency of this technique to overfit. SVM presented poor performance as well, worse than LR which is the baseline. Table 8 presents a comparison between the performance of the semi-empirical methods (Décourt-Quaresma and Meyerhof), the LR and the RF algorithms. The subscript Dq indicates the use of the Décourt dataset for training. One can observe that even LR surpass the semi-empirical method of Meyerhof, encouraging the use of machine learning techniques for this type of problem. This conclusion is corroborated by other studies (Lee & Lee, 1996;Pham et al., 2020). Figure 5 complements the comparisons presented in Table 8, with abscissas representing measured values and ordinates representing predicted values. This study uses logarithm scale to better represent the wide range of values. A predicted value equal to the observed one produces a point at the black line, while poor predictions tend to produce points far from it. Note that the cloud of white circles, that represents RF Mey , is clearly more concentrated around the black line than black squares and triangles, which represent semi-empirical methods. It is also shown that the semi-empirical methods tend to underestimate the load bearing capacity, while points from RF Mey tend to be split in half by the black line.
In order to present a complementary analysis, Table 9 presents predicted values using RF Dq , the corresponding measured values Q u and the ratio between them. This ratio is organized in ascending order including all 165 load tests, with a range from 0.340 to 4.62. The objective is to produce confidence intervals for the RF Dq /Q u values, allowing a better understanding of the accuracy of the algorithm.
The authors first verify whether the RF Dq /Q u results follow a normal distribution using the Shapiro-Wilk test.  The procedure uses a significance level of 5% and a starting null hypothesis H 0 that data follows a normal distribution. However, when calculating the p-value, the result was far below 5%, indicating that the data does not have a normal behavior. This means that it is not possible to apply the confidence level theory for this distribution.
To solve this problem, this study proposes a less rigorous approach using the concept of percentiles. The n th percentile is a value greater than n percent of all values in the list. The authors use the RF Dq /Q u ordered list from Table 9 to estimate the confidence interval, considering that it must be centralized in the list with respect to the percentiles. The analysis proposes two confidence intervals: one of 90%, that must be limited by the 5 th and 95 th percentiles, and one of 95%, that must be limited by the 2.5 th and 97.  Figure 6 illustrates these results. Abscissa axis represents measured values, the ordinate axis represents predicted values and each point represents a RF Dq /Q u value. The continuous line is the locus of points with RF Dq = Q u , while the other lines represent the limits of the confidence intervals. Considering engineering practice, this graph can give to geotechnical engineers a sense of which confidence interval would suit better their specific case.

Case study
This section presents a case study with new examples to validate the generated models. The analysis uses results taken from three SPT soundings and load tests of precast concrete piles located in a construction site in Monte Largo, Paraná state, Brazil. These examples came from Lobo (2005) and were not used to train the machine learning techniques. The objective is to evaluate what would be the accuracy of the models if applied in the future to a completely new site. Figure 7 presents the SPT values and load test information.
The study starts calculating the results obtained with the original semi-empirical methods of Décourt-Quaresma and Meyerhof, as well as for the best performing techniques for each dataset. To facilitate comparisons, Table 10 presents all relative errors. For a predicted value ˆi y and an observed value i y , the relative error is ˆ/ = − i i i i RE y y y . Note that the first example seems to be more difficult than the other two. One possible explanation for this disparity is the soil of this examples, with the first underrepresented within the training datasets. This issue is investigated incorporating  these piles to the datasets, to verify if performance changes. The objective is verifying if the inclusion of two of the load tests of this construction site helps predicting the third one, as performed in the leave-one-out methodology. Table 11 presents the result. One can observe that most machine learning techniques presented some improvement for the first example, which is still the most difficult. For the other two, although some specific values increased, the overall performance of the techniques can be considered better. This allows concluding that the inclusion of information from the same construction site helped improving performance. In other words, the performance of the techniques for new examples depends on its representativity within the training dataset.

Conclusions
This work applies machine learning techniques to predict the bearing capacity of concrete precast piles. It presents two examples, the first with a general application and the second with a case study. The results obtained in the first example, considering all techniques applied to both datasets, allows concluding that RF is the best algorithm for this problem, with lower RMSE values. KNN and ANN also detached from the others, presenting the second best performance for Décourt-Quaresma and Meyerhof datasets, respectively. The semi-empirical methods of Décourt-Quaresma and Meyerhof presented a relatively poor performance in this example with an RMSE close to 900, being surpassed by all other techniques including LR. These results demonstrate that machine learning algorithms are a good alternative for predicting the ultimate bearing capacity of piles. The analysis proposed an approximation of the confidence intervals using the concept of percentile. A graph presented two intervals, 90% and 95%, to give engineers choices for the desired accuracy.
The second example presented a study to evaluate the effect of the representativity of the dataset. Results confirm that performance depends on representativity and also reveal the limits of these models, which tend to present poor accuracy for examples very different from the ones contained in the used datasets.

List of symbols a,b:
Arbitrary vectors a i ,b i : Vector components b: Number of blows needed for the sampler to penetrate 30 cm into the soil A p : Area of the pile tip f: Activation function I j : Combined input of a neuron k: Number of nearest neighbors K: Soil type factor K(a,b): Kernel l: Number of samples for a SVM L: Pile length n,m: Space dimensions ne: Number of examples o j : Output of a neuron p: Exponent of Minowsky equation q p ,q s : Parameters of Meyerhof´s method Q u : Pile bearing load capacity R 2 : Coefficient of determination RE i : Relative error R l : Lateral friction RMSE: Root mean square error R p : Tip resistance R t : Pile load capacity S 1 ,S 2 : Spaces of dimension n and m, respectfully SPT l : Mean N SPT index along the pile shaft Weight vector w ij : Adjustable connection weight x: Input of a SVM x i : Input of a neuron x i ,y i : Components of x and y, respectfully y: Output of a SVM y i : Observed valuê i y : Predicted value y : Mean of all observed values ξ i , * ξ i ,C: Parameters of a SVM α, β: Parameters of Décourt-Quaresma method θ j : Threshold value ρ,σ,γ,ω: Kernel parameters φ i : Component of j : Coefficients to be determined for a LR : Bias of a SVM : Calibration parameter : Mapping vector : Error to be minimized