Soils and Rocks

The most popular methods for soil classification from cone penetration test (CPT) data are based on examining two-dimensional charts. In recent years, several authors have dedicated efforts to replicating and discussing these methods using machine learning techniques. Nonetheless, most of them apply few techniques, include only one dataset and do not explore more than three input features. This work circumvents these issues by: (i) comparing five different machine learning techniques, which are also combined in an ensemble; (ii) using three distinct CPT datasets, one composed of 111 soundings from different countries, one composed of 38 soundings with soil age information and the third composed of 64 soundings taken from the city of São Paulo, Brazil; and (iii) testing combinations of five input features. Results show that, in most cases, the ensemble of multiple models achieves better predictive performance than any single technique. Accuracies close to the maximum were obtained in some cases without the need for pore pressure information, which is costly to measure in geotechnical practice.


Introduction
The classical approach for soil classification from CPT data is based on examining two-dimensional charts, with pioneering studies seeking to predict the soil granulometric distribution from two raw CPT measurements (Begemann, 1965). Later work stated that predicting soil behavior would be more useful for real engineering projects than predicting soil granulometry (Douglas & Olsen, 1981). As a result, the well-known Robertson classification methods were proposed, using two charts obtained from three raw CPT measurements (Robertson et al., 1986; Robertson, 1990). These charts became particularly popular due to the proposed input transformations, which separate soil classes better. Nonetheless, further investigations exposed limitations in those methods (Jefferies & Davies, 1991), associated with overconsolidated clays with dilative behavior. Although these methods evolved to minimize these problems (Robertson, 1991), other studies have shown that similar limitations remained (Ramsey, 2002; Schneider et al., 2008). To overcome these limitations, two new charts were proposed (Schneider et al., 2008, 2012). In recent work, these charts were modified to create a full behavior-based classification method (Robertson, 2016).
Many recent works have also applied machine learning (ML) techniques to different geotechnical problems, and most of them use artificial neural networks (ANN) to predict soil characteristics (Goh, 1995, 1996; Schaap et al., 1998; Juang & Chen, 1999; Kumar et al., 2000; Juang et al., 2002; Juang et al., 2003; Hanna et al., 2007). On the other hand, Livingston et al. (2008) used decision tree (DT) models, Kohestani et al. (2015) employed random forests (RF), whilst Goh & Goh (2007) induced support vector machine (SVM) models. In addition, most studies dedicated to soil classification from CPT data seek new soil classes using data clustering (Hegazy & Mayne, 2002; Facciorusso & Uzielli, 2004; Liao & Mayne, 2007; Das & Basudhar, 2009; Rogiers et al., 2017; Carvalho & Ribeiro, 2020). Another possible approach, which is relatively unexplored in the literature, is using ML techniques to replicate predefined soil classification systems, like classical soil classification methods based on charts (Arel, 2012). Most work adopting this approach uses only ANN models (Kurup & Griffin, 2006; Arel, 2012; Reale et al., 2018) and, when more ML techniques are used, applications are restricted to small CPT datasets, with all soundings taken at the same location (Bhattacharya & Solomatine, 2006). Recent work has explored the additional potential of ML techniques to investigate and discuss alternative geotechnical aspects of soil classification, using the k-nearest neighbor ML technique. Expanding this study with a larger and more diverse dataset, comparing more ML techniques and investigating different combinations of input features are the main objectives of this work.
Herein, several ML techniques are trained to classify soil from CPT data, aiming to replicate classification systems generated with a student version of the CPeT-IT v2.0.2.5 software. First, CPeT-IT is used to classify all examples from three datasets: one composed of 111 CPT soundings taken from different countries; one composed of 38 soundings including soil age information; and the third composed of 64 CPT soundings taken from the city of São Paulo, Brazil. The authors believe that using more diverse data samples is important to reveal general properties of the problem and to assess the competence of the ML models more properly. Next, the collected soil samples are used to train the following ML techniques: distance-weighted nearest neighbors (DWNN), boosted DT, RF, ANN, SVM and a multiple model predictor (MMP), which is a combination of the previous models, that is, a heterogeneous ensemble of classification techniques. In addition, the combination of different input features is tested, including the original inputs required by CPeT-IT. This allows investigating and discussing novel geotechnical aspects related to soil classification. As a result, this work makes the following original contributions:
-It is a first attempt to apply and compare multiple ML techniques of distinct biases (namely DWNN, DT, RF, ANN and SVM) in a geotechnical application. In addition, their outputs are combined in an ensemble (MMP), resulting in higher predictive accuracies for soil classification;
-It discusses the utility and application of Robertson charts for classifying tropical soil, as their usage is more common in the analysis of soil data from temperate countries;
-It makes it possible to approximate Robertson soil classes without pore pressure information, which is costly to measure in geotechnical practice. This is particularly important for the analysis of data from developing countries, which usually have severe budget constraints imposed on engineering practice.
Although the results that support the last contribution, presented in Section 5.4, are not enough to dismiss measuring pore pressure in real engineering projects, they are important to motivate discussions concerning novel methods for soil classification that may be especially appealing for underdeveloped and developing countries.

Classification methods used in CPeT-IT
This section describes the two soil classification methods replicated in this work using ML techniques. For both cases, class 0 denotes a misclassified soil.

Method influenced by soil granulometry (ISG)
One of the chart-based classification methods replicated in this work was proposed by Robertson (1991) and is referred to as ISG throughout this text. In this reference, the author intended to include soil behavior within the classification system; nonetheless, the defined classes refer to granulometric soil composition only. Furthermore, borehole samples were used to make soil classes compatible with real soil types. The ISG soil classes are:
-Sensitive, fine grained.
-Sand mixtures - silty sand to sandy silt.
-Sands - clean sand to silty sand.
-Gravelly sand to sand.
-Very stiff sand to clayey sand.
-Very stiff, fine grained.
The four basic parameters measured in CPT are depth ($z$), uncorrected cone resistance ($q_c$), lateral friction ($f_s$) and pore pressure in a disturbed state ($u_2$), usually measured behind the cone tip. In the method proposed by Robertson (1991), these parameters are combined to obtain normalized versions.
First, $q_c$ is corrected to discount the water pressure aiding cone penetration, resulting in the total cone resistance $q_t$. Next, the equilibrium pore pressure $u_0$ is needed to calculate the excess pore pressure $u_2 - u_0$. The $u_0$ value can be obtained by drawing a straight line through the $u_2$ value in the graph. In order to eliminate correlations, Robertson (1990) proposed that the net cone resistance $q_n = q_t - \sigma_{v0}$ should be divided by $\sigma'_{v0}$ to discount overburden and that $f_s$ and $u_2 - u_0$ should be divided by $q_n$, where $\sigma_{v0}$ and $\sigma'_{v0}$ are the total and effective vertical stresses. This results in the normalizations presented in Equations 1 to 3:

$$Q_{t1} = \frac{q_t - \sigma_{v0}}{\sigma'_{v0}} \tag{1}$$

$$F_r = \frac{f_s}{q_t - \sigma_{v0}} \times 100\% \tag{2}$$

$$B_q = \frac{u_2 - u_0}{q_t - \sigma_{v0}} \tag{3}$$

Later work (Robertson & Wride, 1998) found that the exponent $n$ of $\sigma'_{v0}$ in the $Q_{t1}$ expression should be 1 only for pure clays, 0.5 only for pure sands and intermediate for mixtures of them. The result is presented in Equation 4:

$$Q_{tn} = \left(\frac{q_t - \sigma_{v0}}{p_a}\right)\left(\frac{p_a}{\sigma'_{v0}}\right)^{n} \tag{4}$$

where $p_a$ is a reference pressure of 0.1 MPa. The exponent can be obtained with Equation 5:

$$n = 0.381\, I_c + 0.05\left(\frac{\sigma'_{v0}}{p_a}\right) - 0.15, \quad n \le 1 \tag{5}$$

The parameter $I_c$ can be calculated as presented in Equation 6 (Robertson, 2009):

$$I_c = \sqrt{\left(3.47 - \log Q_{tn}\right)^2 + \left(\log F_r + 1.22\right)^2} \tag{6}$$

Based on the previous equations, two charts are proposed by Robertson (1991) for soil classification. After obtaining raw CPT values and performing all procedures defined previously, a point can be placed in these charts, resulting in a class attribution for each soil example. That is, the area to which the point belongs gives the class of the corresponding collected soil. If the obtained point is located outside the ranges defined within these charts, the soil is considered misclassified, receiving class 0.
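The normalization chain of Equations 1 to 6 can be sketched in a few lines of Python. This is only an illustration, assuming the commonly published forms of the Robertson normalizations and a simple fixed-point iteration for the exponent $n$; the function name `normalize_cpt`, the starting guess `n = 1.0`, the tolerance and the example values are hypothetical and do not come from CPeT-IT.

```python
import math

P_A = 0.1  # reference pressure in MPa, as stated in the text


def normalize_cpt(q_t, f_s, u_2, u_0, sigma_v0, sigma_v0_eff,
                  tol=1e-3, max_iter=100):
    """Return normalized CPT parameters for one reading (all stresses in MPa).

    Assumes q_t > sigma_v0 and f_s > 0 so that the logarithms are defined.
    """
    q_n = q_t - sigma_v0              # net cone resistance
    q_t1 = q_n / sigma_v0_eff         # Equation 1
    f_r = 100.0 * f_s / q_n           # Equation 2, in percent
    b_q = (u_2 - u_0) / q_n           # Equation 3

    n = 1.0                           # starting guess (assumption)
    for _ in range(max_iter):
        q_tn = (q_n / P_A) * (P_A / sigma_v0_eff) ** n        # Equation 4
        i_c = math.sqrt((3.47 - math.log10(q_tn)) ** 2
                        + (math.log10(f_r) + 1.22) ** 2)      # Equation 6
        n_new = min(1.0, 0.381 * i_c
                    + 0.05 * (sigma_v0_eff / P_A) - 0.15)     # Equation 5
        if abs(n_new - n) < tol:
            break
        n = n_new

    return {"Q_t1": q_t1, "F_r": f_r, "B_q": b_q,
            "Q_tn": q_tn, "n": n, "I_c": i_c}


# Hypothetical reading, used only to exercise the function
example = normalize_cpt(q_t=1.0, f_s=0.02, u_2=0.3, u_0=0.05,
                        sigma_v0=0.1, sigma_v0_eff=0.05)
```

Because $n$ depends on $I_c$ and $I_c$ depends on $Q_{tn}$, the three quantities have to be solved together, which is why the sketch iterates instead of evaluating the equations once.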

Method focused on soil behavior (FSB)
The second soil classification method replicated in this work was proposed by Robertson (2016) and is referred to as FSB throughout this text. It includes, as a new application, a method to identify whether the soil contains microstructure. In this method, considered fully behavioral in the literature, soil classes are divided into three main blocks: clay-like, sand-like and transitional. One advantage of this division is that the behavior of sands and clays is clearly separable. Sands usually present high strength, low compressibility and high permeability, while clays usually present low strength, high compressibility and low permeability. Each soil group is subdivided according to whether it presents dilative or contractive behavior, depending on the consolidation state. A separate class was created for contractive clays that are sensitive to disturbance. The FSB classes are:
-CCS: Clay-like - Contractive - Sensitive.
-SD: Sand-like - Dilative.
One problem of the ISG method, described in the previous section, is that $B_q$ has a strong negative correlation with $Q_{tn}$, which makes highly overconsolidated clays indistinguishable from very dense sands (Schneider et al., 2008). To solve this problem, a new normalized excess pore pressure was proposed (Robertson, 2016) as:

$$U_2 = \frac{u_2 - u_0}{\sigma'_{v0}} \tag{7}$$

The FSB method then employs two charts, one using $F_r$ and $Q_{tn}$ and the other using $U_2$ and $Q_{tn}$. The first is similar to the chart proposed in Schneider et al. (2008), while the second uses the hyperbolic curves presented in Schneider et al. (2012). New curves are also added to the $F_r \times Q_{tn}$ chart to separate dilative and contractive behaviors, as well as to separate the contractive sensitive behavior.
The values obtained for $Q_{tn}$, $F_r$ and $U_2$ give one point in each of the charts. If the classes given in both charts do not agree, the soil is considered misclassified (class 0). In addition to that, a soil sample is attributed to class 0 if the point is located outside the ranges of $Q_{tn}$, $F_r$ and $U_2$ of the charts and if a modified normalized small-strain rigidity index is greater than 330. Robertson (2016) highlights that the FSB method is inaccurate for aged or cemented soils, which contain microstructure.
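The agreement rule between the two FSB charts can be illustrated with a short sketch. It assumes that the normalized excess pore pressure is the excess pore pressure divided by the effective vertical stress, and it leaves out the chart lookups themselves, the range check and the rigidity index check; both function names are hypothetical.

```python
import math


def normalized_excess_pore_pressure(u_2, u_0, sigma_v0_eff):
    """U_2 as excess pore pressure over effective vertical stress (assumed form)."""
    return (u_2 - u_0) / sigma_v0_eff


def combine_chart_classes(class_from_fr_chart, class_from_u2_chart):
    """Return the shared class, or 0 when the two charts disagree."""
    if class_from_fr_chart == class_from_u2_chart:
        return class_from_fr_chart
    return 0


# Hypothetical values, only to show the two outcomes of the rule
u2_norm = normalized_excess_pore_pressure(u_2=0.3, u_0=0.05, sigma_v0_eff=0.05)
agreed = combine_chart_classes("SD", "SD")      # both charts give the same class
disputed = combine_chart_classes("SD", "CCS")   # charts disagree, so class 0
```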

Machine learning (ML) techniques employed
In this work, six ML techniques of distinct biases are used to replicate the soil classification methods described in Section 2. In this section, a brief theoretical description is given for DWNN, DT, RF, ANN and SVM. In the MMP model, the outputs of the five previous ML models are combined by a majority voting strategy to classify new samples. Table 1 presents the main advantages and disadvantages experienced by the authors when applying these ML techniques to soil classification problems.
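The majority voting used by the MMP can be sketched as follows. The tie-breaking rule (the earliest model in the list wins) is an assumption made for this illustration, since the text does not state how ties are resolved, and `majority_vote` is a hypothetical name.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the class predicted by most models.

    Assumption: ties are resolved in favor of the earliest model in the list.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# One prediction per model, e.g. in the order DWNN, DT, RF, ANN, SVM
clear_winner = majority_vote([3, 3, 5, 3, 6])
tie_case = majority_vote([1, 2, 2, 1, 4])
```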

Distance-weighted nearest neighbors (DWNN)
The DWNN technique (Dudani, 1976) is a distance-based technique, meaning that it uses distances to evaluate whether two objects $x$ and $y$ are similar. In this work, the Euclidean distance is used, which can be written as:

$$d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2} \tag{8}$$

where $m$ is the number of input features. In DWNN, all known examples (composing the training dataset) can be regarded as a cloud of points within the input space. A new point can be classified according to its proximity to the known examples. For instance, it can be assigned the class of its nearest neighbor, or a majority vote among the classes of the $k$ nearest neighbors can be employed instead. Weights can also be assigned to the votes of the nearest neighbors, proportional to the inverse of their distance to the new data point. This results in the DWNN technique. A Gaussian DWNN weighting is used in this work, computed from $d(x, y)$, the Euclidean distance between two data items expressed in Equation 8. Recent work has shown that Gaussian weighting leads to better predictive performance in soil classification than attributing the same weight to all nearest neighbors (Carvalho & Ribeiro, 2019).
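A minimal sketch of distance-weighted voting is given below. The Gaussian weight is assumed to take the common form exp(-d² / (2σ²)); the width `sigma`, the number of neighbors `k` and the function names are hypothetical, and the exact weighting used by the authors may differ.

```python
import math
from collections import defaultdict


def euclidean(x, y):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dwnn_predict(train_x, train_y, query, k=3, sigma=1.0):
    """Classify `query` by Gaussian-weighted votes of its k nearest neighbors."""
    neighbors = sorted(zip(train_x, train_y),
                       key=lambda item: euclidean(item[0], query))[:k]
    votes = defaultdict(float)
    for point, label in neighbors:
        d = euclidean(point, query)
        votes[label] += math.exp(-d ** 2 / (2.0 * sigma ** 2))  # assumed weight
    return max(votes, key=votes.get)


# Two distant class-2 points outnumber one close class-1 point,
# but the weighted vote still favors the close neighbor.
prediction = dwnn_predict([(0.0, 0.0), (3.0, 0.0), (3.1, 0.0)],
                          [1, 2, 2], query=(0.5, 0.0), k=3, sigma=1.0)
```

The example is chosen so that an unweighted majority vote would pick class 2, which shows what the weighting changes.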

Decision trees (DT) and random forest (RF)
A DT can be defined as a graph with a tree structure, containing decision and leaf nodes (Quinlan, 1986). The decision nodes perform tests on the feature values of the data points, whilst leaf nodes output a class. Starting from the root node, the feature values of an example are used to decide to which branch of the tree the example will proceed until a leaf node is reached, giving the final classification of the object. Figure 1 illustrates a DT with six decision nodes (tests) and seven leaf nodes (classes).
The test performed by each decision node is usually chosen to maximize a goodness-of-split criterion, that is, the ability to distinguish the classes. One problem of DTs is that they tend to overfit if they are induced to classify all training points correctly, meaning that the obtained solution achieves good results only when applied to the same dataset that was used for its training. Overfitting can be avoided in DTs in multiple ways. One of them is pruning branches of the DT. Another strategy, employed in this work, is to join multiple trees trained using bootstrap samples from the original dataset. From this point on, DT associated with the bootstrapping method is referred to simply as DT. RF is another ensemble of tree-based models (Ho, 1995), which also randomly chooses subsets of input features from the original dataset in the bootstrapping procedure.
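The two sources of randomness mentioned above, bootstrap samples for the tree ensemble and random feature subsets for RF, can be sketched without any tree induction code. The sketch assumes one feature subset per tree; all function names, the subset size and the seed are hypothetical.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement from the training data."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, subset_size, rng):
    """Pick distinct feature indices for one tree (random forest only)."""
    return sorted(rng.sample(range(n_features), subset_size))


def plan_forest(rows, n_trees, n_features, subset_size, seed=0):
    """Return, for each tree, its bootstrap sample and its feature subset."""
    rng = random.Random(seed)
    return [(bootstrap_sample(rows, rng),
             random_feature_subset(n_features, subset_size, rng))
            for _ in range(n_trees)]


# Ten training rows (represented by their indices), five trees, four features
plan = plan_forest(list(range(10)), n_trees=5, n_features=4, subset_size=2)
```

Each tree would then be induced on its own sample, and the ensemble prediction would combine the outputs of all trees.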

Artificial neural networks (ANN)
ANN are based on the structure and processing of the brain. Their fundamental units, the neurons, communicate with each other using weighted signals that usually belong to the [0,1] interval. The output of a neuron can be an input of another neuron, so that multiple layers of neurons can be combined. The neuron model presented in Figure 2 is called the McCulloch-Pitts (MCP) neuron. The MCP neuron receives input signals $x_i$, which are multiplied by weights $w_i$ and summed up. After an excitation threshold $\theta$ is discounted, a signal is produced. This signal is input to an activation function $g$, generating an output signal $y$. In the original MCP model, the activation function is a step or sign function. Alternative functions, including non-linear functions, can provide more representative power to the ANN models. If many artificial neurons are combined in layers, the model is called a multi-layer perceptron neural network (Rumelhart et al., 1986). In this work, ANN architectures using up to two hidden layers were tested. The output layer has one neuron representing each class. The neuron outputting the highest value defines the final classification.
One can demonstrate that a network with a single hidden layer of neurons with non-linear activation functions can approximate any continuous function, and that a network with two hidden layers of such neurons can approximate any function (Hornik et al., 1989). Considering that a limit must be imposed to select among infinitely many possible architectures, networks with three or more hidden layers are not tested in this work.
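The computation performed by a single MCP neuron and the rule for reading the output layer can be written compactly. This is a sketch only: the step activation, the weights and the threshold are illustrative values, and the function names are hypothetical.

```python
def step(value):
    """Step activation: fires (1.0) when the net signal is non-negative."""
    return 1.0 if value >= 0.0 else 0.0


def mcp_neuron(inputs, weights, theta, activation=step):
    """Weighted sum of the inputs minus the threshold, passed through g."""
    net = sum(w * x for w, x in zip(weights, inputs)) - theta
    return activation(net)


def predict_class(output_values):
    """Index of the output neuron with the highest value."""
    return max(range(len(output_values)), key=lambda i: output_values[i])


# With unit weights and a threshold of 1.5 the neuron behaves as a logical AND
both_on = mcp_neuron([1, 1], [1.0, 1.0], theta=1.5)
one_on = mcp_neuron([1, 0], [1.0, 1.0], theta=1.5)
winner = predict_class([0.1, 0.7, 0.2])
```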

Support vector machines (SVM)
In its simplest version, the SVM technique divides the input space with a hyperplane and assigns one class to each side. The optimal hyperplane seeks to maximize the margin of separation between both classes, as illustrated in Figure 3.
The support vectors correspond to examples that are placed over the margin limits after the hyperplane is defined. In Figure 3, for example, four support vectors are represented, two white circles and two white squares. In this work a soft-margin version of SVM is used, so that points may remain within the margins or even on the wrong side of the decision border.
One limitation of this version of SVM is that it admits only linear separations between the classes. One way of extending the SVM to solve non-linear classification problems is by mapping the original input space into a higher-dimensional space, using a function called kernel. After preliminary tests, the polynomial kernel was chosen here due to its better predictive performance compared to other types of kernel functions. Considering $x$ and $y$ two points in input space, the polynomial kernel can be written as:

$$K(x, y) = \left(\delta \, x \cdot y + \kappa\right)^{\alpha}$$

where $\delta$, $\kappa$ and $\alpha$ are calibration parameters.
Although the described version of SVM is defined only for separating two classes, it is possible to extend it to multi-class problems by simply combining two or more binary classifiers. In this procedure, all classes must be evaluated in pairs, generating $\binom{c}{2} = c(c-1)/2$ classifiers for $c$ classes.
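The polynomial kernel and the pairwise decomposition can be illustrated as below. The kernel is assumed to have the usual scale, offset and degree form, with $\delta$ as scale, $\kappa$ as offset and $\alpha$ as degree; this assignment of roles, the default values and the function names are assumptions made for the sketch.

```python
from itertools import combinations


def polynomial_kernel(x, y, delta=1.0, kappa=1.0, alpha=2):
    """(delta * <x, y> + kappa) ** alpha, an assumed form of the kernel."""
    dot = sum(a * b for a, b in zip(x, y))
    return (delta * dot + kappa) ** alpha


def one_vs_one_pairs(classes):
    """All unordered class pairs, one binary SVM per pair."""
    return list(combinations(classes, 2))


kernel_value = polynomial_kernel((1.0, 2.0), (3.0, 4.0))  # (11 + 1) ** 2
pairs = one_vs_one_pairs(range(7))                        # 7 classes as an example
```

For seven classes the decomposition needs 7 × 6 / 2 = 21 binary classifiers, which matches the length of `pairs`.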

Data analysis
The analyses performed in this paper use the following parameters from CPT soundings:
-$B_q$: Dimensionless pore pressure normalization used by Robertson (1991).
-$F_r$: Dimensionless lateral friction normalization used by Robertson (2016).
-$Q_{t1}$: Dimensionless cone resistance normalization used by Robertson (1990).

Description of the used datasets
Professor P. K. Robertson provided the 38 soundings described in Table 2 and Professor P. W. Mayne provided the 73 soundings described in Table 3. The information given by these 111 soundings composes the dataset used in the main studies of this work; therefore, it is hereafter named the Main dataset.
A second dataset, here named Geological dataset, is gathered to investigate the influence of soil age within soil classification. The motivation for its usage is the difficulty reported in the literature for classifying aged soil (Robertson, 2016). A variable called soil age (SA) is then proposed, which is represented by a number related to the geological age when the soil was deposited. The Geological dataset, which is described in Table 4, uses information only from the 38 soundings provided by Robertson because no information about soil age was available for the other soundings.
The third dataset used in this work is composed of 64 CPT soundings from the metropolitan area of São Paulo, Brazil, and is here named the Tropical dataset. Measurements were taken every 2 cm of depth and include more than forty thousand soil examples. These soundings were provided by the São Paulo Metropolitan Company under a confidentiality agreement, so most information about them cannot be disclosed here.
Robertson charts were produced using samples taken from temperate regions, which can lead to uncertainty when they are applied to tropical soil. To discuss this issue, in Section 5.2 the Tropical dataset is used to test whether the performance of the ML techniques remains accurate. The study is divided into two parts. In the first, the Main dataset is used for training the ML techniques and the Tropical dataset is used for testing. The objective of this first part is to discuss whether Brazilian soil can be accurately classified using soil information from other countries. In the second part, the Tropical dataset is used for both training and testing, aiming to observe whether accuracy rises when compared to the first part. Figure 4 presents data from one of the CPT soundings to illustrate the data used.

Data preprocessing
Since ML techniques are data-driven, ensuring data quality is important. The identification and treatment of outliers, which are inputs with discrepant values, is one of the important steps of proper data cleansing. One way of automatically detecting potential outliers is by the use of boxplots. Nonetheless, preliminary tests have shown that removing all potential outliers severely reduces accuracy. In this work, this problem is avoided by applying the Edited Nearest Neighbor technique (Wilson, 1972). It compares the class of the potential outlier with the classes of its nearest neighbors, removing the outlier only if the labels do not match.
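The selective removal of potential outliers can be sketched as follows. The sketch assumes that the candidates have already been flagged (for example by a boxplot rule) and that a candidate is compared with the majority label of its $k$ nearest neighbors; the value of `k`, the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def edited_nn_filter(points, labels, candidates, k=3):
    """Return the indices to keep.

    A flagged candidate is dropped only when its label differs from the
    majority label of its k nearest neighbors; other points are always kept.
    """
    keep = []
    for i, (point, label) in enumerate(zip(points, labels)):
        if i in candidates:
            others = sorted((j for j in range(len(points)) if j != i),
                            key=lambda j: euclidean(points[j], point))[:k]
            majority = Counter(labels[j] for j in others).most_common(1)[0][0]
            if majority != label:
                continue  # label disagrees with the neighborhood: remove
        keep.append(i)
    return keep


points = [(0, 0), (0, 1), (1, 0), (0.5, 0.5), (10, 10)]
labels = [1, 1, 1, 2, 1]
kept = edited_nn_filter(points, labels, candidates={3, 4})
```

In this toy case the far point (index 4) is kept because its label agrees with its neighbors, while the point at index 3 is removed because it sits inside a region of another class.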
Another problem is class imbalance, which can bias the ML techniques towards the majority class to the detriment of classes with fewer examples. An evaluation based on histograms allowed identifying some issues, solved as listed next:
1) There were too few ISG class 0 examples, therefore they were completely removed from the datasets. FSB class 0 examples were maintained;
2) ISG classes were very imbalanced within the Geological dataset, therefore all analyses with this dataset were restricted to the FSB method;
3) Random sampling was applied to reduce majority classes, considering that CPT data contains several redundancies due to many measurements taken within each soil layer;
4) Minority classes were augmented by applying the SMOTE oversampling technique (Chawla et al., 2002).
After procedures 3 and 4, all classes have the same number of examples. A second data transformation is applied for the ANN, SVM and MMP analyses, imposing a logarithmic scale on each input feature. This procedure was adopted because the original charts from Robertson use a logarithmic scale and preliminary tests showed that better performance is achieved with this transformation. Figure 5 shows an example of the effect of the logarithmic scale.
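The three transformations described above (random undersampling, SMOTE-style oversampling and the logarithmic scale) can be sketched with the standard library only. In this simplified version the neighbor used for interpolation is supplied by the caller, whereas SMOTE picks it among the nearest minority neighbors; the base-10 logarithm and all function names are assumptions.

```python
import math
import random


def undersample(rows, target_size, rng):
    """Randomly keep `target_size` rows of a majority class."""
    return rng.sample(rows, target_size)


def smote_like_sample(x, neighbor, rng):
    """Synthetic point on the segment between x and a same-class neighbor."""
    gap = rng.random()
    return [a + gap * (b - a) for a, b in zip(x, neighbor)]


def log_scale(features):
    """Base-10 logarithm of each feature.

    Assumes strictly positive inputs; the text does not state how
    non-positive values are handled.
    """
    return [math.log10(v) for v in features]


rng = random.Random(0)
reduced = undersample(list(range(20)), 5, rng)
synthetic = smote_like_sample([1.0, 2.0], [3.0, 6.0], rng)
scaled = log_scale([10.0, 100.0])
```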

General methodology
The 10-fold cross-validation procedure is applied for each dataset and input combination. In this process, the original dataset is divided into 10 partitions called folds, in which the class proportion is kept the same as in the original dataset. Among these 10 folds, one is used for testing, one is used for validation and the remaining compose the training set. The training set is the only one subject to all preprocessing procedures and is used as a reference for all predictions. The validation fold is used to calibrate the parameters of each technique and the testing fold is used to measure predictive performance for new data points previously unseen by the ML techniques. At each step of the 10-step procedure a different testing fold is selected, and the final predictive performance is given by the average and standard deviation of the ten values obtained.
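The fold rotation can be sketched as below. The stratification deals the examples of each class to the folds in turn, and the validation fold is assumed to be the one following the testing fold, since the text does not say how it is chosen; the function names and the seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, seed=0):
    """Split example indices into folds that keep the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


def rotate_splits(folds):
    """Yield (train, validation, test) index lists, one per testing fold."""
    n = len(folds)
    for t in range(n):
        v = (t + 1) % n  # assumption: validation fold follows the testing fold
        train = [i for f, fold in enumerate(folds)
                 if f not in (t, v) for i in fold]
        yield train, folds[v], folds[t]


labels = [0] * 50 + [1] * 30
folds = stratified_folds(labels)
splits = list(rotate_splits(folds))
```

Only the training indices of each split would go through the preprocessing steps, as stated above.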
The most common performance metric adopted in multiclass problems is accuracy, which is given by the total number of correct predictions divided by the total number of objects. Nevertheless, majority classes can bias this measurement, since the testing and validation folds are not balanced. To solve this problem, the predictive performance measure used in this work is obtained by calculating accuracy for each class separately and then calculating their mean value. This value would be the accuracy if the classes were balanced and had the same number of objects. For simplicity, this performance measure is called accuracy here, although it is commonly referred to as balanced accuracy in the ML literature.
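The performance measure described above can be computed as in the following sketch, which averages the per-class accuracies over the classes present in the true labels. The function name and the toy labels are hypothetical.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class accuracies over the classes found in y_true."""
    correct, total = {}, {}
    for truth, pred in zip(y_true, y_pred):
        total[truth] = total.get(truth, 0) + 1
        correct[truth] = correct.get(truth, 0) + (truth == pred)
    return sum(correct[c] / total[c] for c in total) / len(total)


# A classifier that always answers the majority class
y_true = [1, 1, 1, 1, 2]
y_pred = [1, 1, 1, 1, 1]
score = balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Here the plain accuracy is 0.8, while the balanced value is 0.5, which exposes that the minority class is never predicted.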
The calibration process performed for each technique is described in Section 3.

Comments about the inputs
Many variables mentioned in previous sections can be used as inputs for the ML techniques. Specific combinations are selected here considering previous work from the authors and the objectives of the present study. These combinations are:
1) $z$, $q_t$, $f_s$ and $u_2$: Raw CPT measurements, except for the correction of the cone tip resistance from $q_c$ to $q_t$;
2) $z$, $Q_{t1}$, $F_r$ and $B_q$: Depth plus normalizations proposed by Robertson (1990);
3) $z$, $Q_{tn}$, $F_r$ and $U_2$: Depth plus normalizations proposed by Robertson (2016);
4) $Q_{tn}$, $F_r$: Inputs used by the ISG method;
5) $Q_{tn}$, $F_r$ and $U_2$: Inputs used by the FSB method;
6) $z$, $Q_{t1}$, $F_r$, $B_q$ and SA: Depth plus normalizations proposed by Robertson (1990) plus soil age;
7) $z$, $Q_{tn}$, $F_r$, $U_2$ and SA: Depth plus normalizations proposed by Robertson (2016) plus soil age;
8) $z$, $q_c$ and $f_s$: Raw CPT measurements, excluding $u_2$ and not correcting $q_c$ to $q_t$.
The use of combination 1 has the objective of evaluating how accurately ISG and FSB can be replicated without using the normalizations proposed by Robertson. Combinations 2 and 3 aim to test predictive performance when such normalizations are combined with depth. The original input combinations 4 and 5 are used as a reference, while combinations 6 and 7 aim to evaluate whether soil age improves predictive performance. The last combination, 8, refers to CPT equipment that cannot measure pore pressure, making it impossible to correct $q_c$ to $q_t$.

General performance for replicating ISG and FSB
Results in this section refer to the general performance of the ML techniques when applied to the Main and Geological datasets. These results are summarized in Table 5, where each line represents a 10-fold cross-validation test (see Section 4.3). The first column presents the inputs used, and the second column presents the replicated method, ISG (Section 2.1) or FSB (Section 2.2). Since 10 tests (one for each fold) are made for each line, resulting in 10 separate accuracy measurements, the other columns present their mean value (MA stands for mean accuracy) for each technique. One can calculate MA from the individual accuracies $Ac_i$ using the expression:

$$MA = \frac{1}{10}\sum_{i=1}^{10} Ac_i$$

One can observe that MA is above 91% in all lines for MMP, which can be considered a good predictive performance for soil profiling. In most cases MMP presents the best performance; in the others it presents a performance close to the best one. Results obtained with $z$, $q_t$, $f_s$ and $u_2$ show that accurate soil classification is possible without the data transformations proposed by Robertson. As expected, high accuracies are obtained when the original inputs are used for each method, $Q_{tn}$ and $F_r$ for ISG and $Q_{tn}$, $F_r$ and $U_2$ for FSB. Nonetheless, the highest accuracy for ISG was obtained when $z$, $Q_{tn}$, $F_r$ and $U_2$ were used as inputs for MMP, and the highest accuracy for FSB was achieved when $z$, $Q_{tn}$, $F_r$, $U_2$ and SA were used for MMP. This suggests that including depth as an input brings relevant information to soil classification.
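The summary of the ten fold accuracies can be computed as in the sketch below. Whether the standard deviation reported in the tables is the sample or the population version is not stated, so the sample version is assumed here; the accuracy values are made up for illustration and `summarize_folds` is a hypothetical name.

```python
import math
from statistics import mean, stdev


def summarize_folds(fold_accuracies):
    """Return mean accuracy (MA) and sample standard deviation (SD)."""
    return mean(fold_accuracies), stdev(fold_accuracies)


# Made-up accuracies for the ten testing folds of one table line
accuracies = [0.90, 0.92, 0.94, 0.96, 0.98, 0.90, 0.92, 0.94, 0.96, 0.98]
ma, sd = summarize_folds(accuracies)
```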
Reasonable accuracy was obtained for ANN and SVM only after applying logarithmic scale, as presented in Figure 5.
Preliminary tests have shown that objects assigned to class 0 in FSB impair the predictive performance of ANN and SVM. In order to quantify this influence, additional experiments were performed removing these objects from the training and test sets, resulting in the values presented in Table 6. SD stands for standard deviation and, for the sake of conciseness, only results for MMP are presented. As the proposal is to focus on the FSB method, results from the ISG method are omitted. One can observe that a higher MA is achieved in most cases, including values close to 100%. This suggests that objects assigned to class 0 of the FSB method do not form a homogeneous region within input space, making the classification problem harder.
In order to complement the application of ML techniques for soil profiling, the MMP was employed to determine the soil profile according to the ISG method for a sounding taken in Vancouver, Canada and provided by Professor Renato da Cunha (Cunha, 1994). The Main dataset was used for training. The profile obtained with the MMP is almost the same as the one obtained with CPeT-IT v2.0.2.5, with an accuracy of 95.4%.

Study with the Tropical dataset
Since the DWNN technique did not present good performance, its results are omitted, as well as some input combinations tested in Section 5.2, to avoid redundancy.
Results from the first part of the study are shown in Table 7. One can observe that, even though the multiple model is not the best performing technique for all testing combinations, its performance is in general close to the best one. This shows that MMP is stable, while larger variations can be observed for the other techniques. Comparing Table 7 to Table 5, one can observe that accuracy drops in all cases.
Results from the second part of the study are presented in Table 8. The general behavior of the MMP is maintained, presenting stability and good performance when compared to other techniques. In some cases, accuracies close to 100% were obtained, showing that the information of the Tropical dataset is substantially different from the information of the Main dataset. This suggests that it is justifiable to develop new soil classification methods specific for tropical soil.

Soil classification without measuring the pore pressure
Since not all CPT equipment available on the market measures the pore pressure $u_2$, one could question whether this variable is really needed for soil classification. Consulting Section 2, one quickly concludes that, without $u_2$, classifying soil within the original ISG and FSB methods is not possible. Pore pressure $u_2$ plays a fundamental role throughout the methodology proposed, not only for correcting cone resistance but also for calculating stresses and obtaining the final normalizations. Therefore, since the approach presented here simply replicates those charts, one should not conclude from this study that measuring $u_2$ could be neglected for soil classification in real engineering projects. Nonetheless, the aim here is to start a discussion in this direction, possibly leading to further studies with conclusions that are more consistent.
In this context, additional experiments were performed to verify whether the friction penetrometer without the pore pressure filter could provide enough information for obtaining a rough approximation of the soil classes. Therefore, all techniques plus the MMP were tested with the Main dataset using only $z$, $q_c$ and $f_s$ as inputs, resulting in the values presented in Table 9. This study was replicated for the ISG method, for the FSB method with class 0 objects and for the FSB method without class 0 objects.
One can notice that all techniques achieved accuracy higher than 90% for the ISG method, which can be considered reasonable for soil profiling. Although lower accuracies were obtained for the FSB method, the accuracy values can also be considered practicable, especially when objects assigned to the class 0 are removed. These results show that, for this specific dataset, soil can be classified within reasonable accuracy with CPT data that do not include pore pressure filter measurements.

Conclusions and recommendations
A general methodology for the application of ML techniques to soil classification from CPT data is presented in this paper, including six ML techniques of distinct biases: DWNN, DT, RF, ANN, SVM and MMP, which is a combination of the previous techniques. MMP joins the predictions of the multiple individual models by majority voting, producing a heterogeneous ensemble of classifiers. All techniques are applied initially to a dataset composed of 111 CPT soundings, testing different input combinations within a 10-fold cross-validation procedure. Training data is also subject to a preprocessing procedure within each 10-fold cross-validation step for improving data quality, including data transformation, cleaning and balancing. Tests are also performed with two other datasets, one containing soil age information and the other with tropical soil information. The original CPT measurements included within the analysis are depth $z$, cone resistance $q_c$ and corrected cone resistance $q_t$, lateral friction $f_s$ and pore pressure $u_2$. Included normalizations are the cone resistances $Q_{t1}$ and $Q_{tn}$, the lateral friction $F_r$ and the pore pressures $B_q$ and $U_2$. A soil age parameter SA was also included, representing the geological age when the soil was deposited.
The machine learning techniques were successfully compared and combined in an ensemble that produces more accurate results than any isolated technique. MMP can also be considered the most stable technique, with accuracies above 93% in most cases. The predictive results in the classification of soil samples from tropical areas are in general inferior to those recorded for soil from temperate areas, especially when the models built from temperate areas are employed in the classification of soil from tropical areas. This indicates the need to develop classification methods specific for tropical soil, which the authors suggest as future work. Another important observation is that accuracy remains reasonable for all techniques even if pore pressure information is omitted during training. These results can encourage future work pursuing soil classification methods that do not use pore pressure information, which can be costly to measure and requires specialized equipment. The results do not allow concluding that pore pressure measurements can be dismissed in real engineering projects, but that soil classes can be roughly approximated without this information. This can become an alternative for initial geotechnical studies in underdeveloped and developing countries, where budget constraints limit engineering practice. It is important to notice that none of these discussions would be possible by using the original Robertson charts alone, since these methods do not allow changing inputs or using incomplete data.