Machine Learning-Based Prediction of the Seismic Bearing Capacity of a Shallow Strip Footing over a Void in Heterogeneous Soils

: The seismic bearing capacity of a shallow strip footing above a void displays a complex dependence on several characteristics, linked to geometric problems and to the soil properties. Hence, setting analytical models to estimate such bearing capacity is extremely challenging. In this work, machine learning (ML) techniques have been employed to predict the seismic bearing capacity of a shallow strip footing located over a single unsupported rectangular void in heterogeneous soil. A dataset consisting of 38,000 ﬁnite element limit analysis simulations has been created, and the mean value between the upper and lower bounds of the bearing capacity has been computed at the varying undrained shear strength and internal friction angle of the soil, horizontal earthquake accelerations, and position, shape, and size of the void. Three machine learning techniques have been adopted to learn the link between the aforementioned parameters and the bearing capacity: multilayer perceptron neural networks; a group method of data handling; and a combined adaptive-network-based fuzzy inference system and particle swarm optimization. The performances of these ML techniques have been compared with each other, in terms of the following statistical performance indices: coefﬁcient of determination (R 2 ); root mean square error (RMSE); mean absolute percentage error; scatter index; and standard bias. Results have shown that all the ML techniques perform well, though the multilayer perceptron has a slightly superior accuracy featuring noteworthy results (R 2 = 0.9955 and RMSE = 0.0158).


Introduction
Underground voids may be caused by human actions or by the dissolution of soluble substances. Mining, digging aqueducts, the creation of sewerage networks and municipal facilities, and tunneling operations are examples of those human activities, whereas chemical interactions of calcareous and dolomite soils, and soluble rock dissolution, fall into the class of natural causes [1,2]. The voids, especially in urban areas, may be located adjacent to, or below shallow footings. In such cases, the performance of strip footings can be significantly affected by the presence of the underground voids, which therefore require special attention in the design process. A number of factors are linked to the impact of voids on the bearing capacity of strip footings, and they all have to be accounted for concurrently to achieve an optimal design, even in such adverse situations. In the engineering practice, void existence may adversely affect the stability of constructions, leading to a differential settlement of superstructures or the collapse of foundations.
To date, various studies have been conducted to investigate the bearing capacity of strip footings above voids, by using either theoretical or experimental methods. Preliminary studies examined the fundamental issues of the problem, see examples [3][4][5][6][7][8]. In those works, the focus was on the bearing capacity of strip footings over a single void. Later, Kiyosumi et al. [9] carried out a set of centrifuge tests over multiple voids, and the results were assessed through the PLAXIS finite element code. Kiyosumi, Kusakabe, and Ohuchi [10] then analyzed the bearing capacity of a strip footing on stiff grounds with voids, and they classified the failure patterns for a single hole into three categories: roof failure; sidewall failure; and combined failure. Yamamoto et al. [11], in a case characterized by multiple voids, demonstrated that closely spaced tunnels showed a good correlation between the results of finite element simulations and upper-bound limit analysis.
This problem was investigated using small strain kinematics. Lee, Jeong, and Ko [12] quantified the influence of single and dual continuous voids, allowing for a wide variety of geometries and material combinations on the undrained vertical load bearing capacity of strip footings. Lee, Jeong, and Ko [13] also studied the effect of load inclination. Lee, Jeong, and Lee [14] examined the undrained bearing capacity factors of ring footings, to assess the impact of the friction angle in heterogeneous soils. The results showed that, by increasing the inside-to-outside-radius ratio, the value of the soil bearing capacity factor is reduced, whereas when the ratio is reduced, the size of the plastic zones gets enhanced-the bearing capacity also rises with the soil heterogeneity. Lavasan et al. [15] offered a new framework for estimating the bearing capacity of a strip footing over voids, and showed that the bearing capacity is significantly affected by the void's geometry and position. Chakraborty and Sawant [16] focused on the impact of pseudo-static horizontal earthquake forces on the bearing capacity of strip footings placed over an unsupported circular tunnel in undrained clay.
More recently, the finite element limit analysis (FELA) method was used by Xiao et al. [17]. Design charts were given for a reduction factor to estimate the ultimate bearing capacity of a strip footing over a rock mass with voids. Xiao, Zhao, and Zhao [18] applied the same method to assess the undrained bearing capacity above voids in two-layered clays. Zhou et al. [19] instead used the discontinuity layout optimization approach to determine the bearing capacity above square voids in cohesive-frictional soils, reporting typical failure patterns. Lee and Kim [20] investigated the bearing capacity above multiple square voids in sandy soils and purely cohesive clays, and they also explored the effect of load inclination. Wu et al. [21] studied the effects of the depth and geometry of the voids, and of the eccentricity of loading on the bearing capacity. Recently, Zhang et al. [22] investigated the impact of seismic loading on the bearing capacity of undrained clay with voids.
To the best of the authors' knowledge, no research has been carried out so far to predict the bearing capacity of a shallow strip footing above a void by means of approaches based on ML tools. In the present study, a first attempt is made to set an extensive database dealing with the various factors affecting the said bearing capacity for strip footings above a single unsupported void in either heterogeneous or homogenous soils, under seismic and quasi-static excitations. The considered factors are: Results have been obtained as lower and upper bounds on the load bearing capacity of the strip footing via the FELA method. First, a sensitivity analysis has been conducted to quantify the effects of these parameters on the ultimate footing bearing capacity. Next, to predict the seismic bearing capacity, the multiplayer perceptron (MLP), group method of data handling (GMDH), and adaptive-network-based fuzzy inference system and particle swarm optimization (ANFIS-PSO) techniques have been adopted.
The proposed procedure, based on the ML tools, can therefore be framed in the following way: for all the problems characterized by the geometry of strip footing and a void similar to the one here considered, and by heterogeneous solids, of which the main properties affecting their resistance are cohesion and friction angle within the investigated ranges, the said ML tools can be assumed already tuned and trained. Practitioners can then use the ML-based models as a kind of black-box, to automatically and quickly obtain an estimation of the seismic bearing capacity of any shallow strip footing, so that the detrimental effects induced by the interaction with the buried, unsupported void can be easily quantified.
The remainder of this paper is organized as follows: in Section 2, the attacked problem is defined in detail, and the modeling in Optum G2 is validated with literature results to ensure the accuracy of the solutions in the database. The results of a parametric study, performed to assess how the aforementioned factors affect the solution, are reported in Section 3. The ML tools adopted to learn the link between the input factors and the footing bearing capacity, namely the output, are described in Section 4. The outcomes generated by the ML algorithms are given in Section 5 in statistical terms to assess the accuracy in reproducing all the ground-truth results in the dataset, and in Section 6 for a case study featuring a specific geometry, in order to provide a concrete idea concerning the accuracy of the attained results. The conclusions of the present work and foreseen future developments are then expressed in Section 7. Finally, Appendix A gathers the equations relevant to the MLP and GMDH models, and the MATLAB code for the adopted ANFIS-PSO model.

Problem Definition
A literature review, regarding the topic of concern here, has pointed out that the interaction between a strip footing and voids displays interesting complexities. Several parameters affect the interaction of the foundation, the soil, and the voids/tunnels. These parameters are listed in Table 1, along with the papers where they were already accounted for. Internal friction angle In Figure 1, the two-dimensional geometry of the specific problem considered in this work is depicted. Assuming the strip footing and the tunnel to extend infinitely in the out-of-plane direction, plane strain conditions are adopted in the analysis. The strip footing is assumed rigid, featuring a rough surface in contact with the heterogeneous soil. According to the proposed setting, the following dimensionless variables are adopted to parametrize the problem: where c 0 is, as stated above, the value of the cohesion at the free surface z = 0. For a homogeneous soil, k is null, whereas for heterogeneous soils, k is positive-additional details can be found in [41,42].
ing to the proposed setting, the following dimensionless variables are adopted to parametrize the problem: -Horizontal seismic acceleration coefficient , which is equal to the ratio between the horizontal earthquake-induced ground acceleration and the gravity acceleration. -Ratio = / between the void width and the foundation width . -Ratio = / between the void height and the foundation width . -Undrained shear strength / of the soil at the ground level, being the cohesion of the soil, and the soil specific gravity.
-Rate of change / of the cohesion with the depth, according to the law: where is, as stated above, the value of the cohesion at the free surface = 0. For a homogeneous soil, is null, whereas for heterogeneous soils, is positive-additional details can be found in [41,42].  Accounting for all the parameters listed above, the (dimensionless) undrained seismic ultimate bearing capacity Q of a strip footing placed above the void is assumed to be: To set the database for digging with the ML algorithms, FELA analyses have been carried out to determine using the Optum G2 software [43]. By exploiting mesh adaptivity, the code allows for the mesh to be progressively refined, namely reducing the characteristic size of the elements where plastic dissipation takes place. On its own, this mesh adaption allows for the bounds to be tightened on the ultimate bearing capacity Q Accounting for all the parameters listed above, the (dimensionless) undrained seismic ultimate bearing capacity Q of a strip footing placed above the void is assumed to be: To set the database for digging with the ML algorithms, FELA analyses have been carried out to determine Q using the Optum G2 software [43]. By exploiting mesh adaptivity, the code allows for the mesh to be progressively refined, namely reducing the characteristic size of the elements where plastic dissipation takes place. On its own, this mesh adaption allows for the bounds to be tightened on the ultimate bearing capacity Q generated by each analysis, as described below, without any (potentially not objective) user intervention.
This approach is broadly used in geotechnical engineering to solve stability problems [11,44] and ascertain the footing bearing capacity [18,[45][46][47][48]. Through the FELA method, which is a mixture of the finite element and the classical limit analysis of plasticity, lower and upper bounds on the ultimate bearing capacity of the foundation can be obtained. For further details regarding the theoretical background and the numerical implementation Algorithms 2021, 14, 288 5 of 27 of this method, readers are referred to [49][50][51] for the lower bound finite element limit theory, and to [49,[51][52][53] for the upper bound one. Notably, a striking advantage of the FELA approach is that preliminary assumptions related to the failure surface are not needed [54].
Meshes consisting of triangular elements have been adopted, independently of the values of the geometrical parameters introduced above. To start the iterative mesh adaption procedure, each model has been assumed to be made of 1000 elements. Through h-adaptivity, elements have been automatically generated or reduced in size where shear stress concentration and dissipation is going to affect the solution, until a limit of 10,000 elements has been attained. Such initial and final numbers of elements have been chosen on the basis of the results presented in [55,56], the latter to specifically keep the computational burden of each analysis below a critical threshold.
The Mohr-Coulomb failure criterion has been used to model the soil behavior, while the strip footing has been modeled as a weightless rigid plate. The ultimate load bearing capacity has then been defined through a multiplier for a uniform load applied to the footing, and automatically increased until the formation of collapse mechanisms.
In the analyses, a strip footing with a width B = 1 m has been placed on soil with a specific weight of 20 kN/m 3 . To avoid any perturbation to the solution induced by the domain boundaries, a model with a width of 40B and a depth of 20B has been considered. The contact of the soil with the bedrock has been modelled by constraining both the horizontal and the vertical displacements along the bottom surface. Along the side boundaries, only the horizontal components of the displacements have instead been restrained.
The pseudo-static method has been adopted to assess the seismic performance of the foundation over the cavity. For this purpose, the value of the horizontal acceleration coefficient k h has been exploited, and once the upper and lower bounds on q u or Q at collapse have been computed, the relevant mean value has been assumed as the representative result for each model. An example of the adopted models is depicted in Figure 2, where the typical FELA adaptive mesh of the Optum G2 software, along with the adopted boundary conditions, are sketched.
This approach is broadly used in geotechnical engineering to solve stability problems [11,44] and ascertain the footing bearing capacity [18,[45][46][47][48]. Through the FELA method, which is a mixture of the finite element and the classical limit analysis of plasticity, lower and upper bounds on the ultimate bearing capacity of the foundation can be obtained. For further details regarding the theoretical background and the numerical implementation of this method, readers are referred to [49][50][51] for the lower bound finite element limit theory, and to [49,[51][52][53] for the upper bound one. Notably, a striking advantage of the FELA approach is that preliminary assumptions related to the failure surface are not needed [54].
Meshes consisting of triangular elements have been adopted, independently of the values of the geometrical parameters introduced above. To start the iterative mesh adaption procedure, each model has been assumed to be made of 1000 elements. Through hadaptivity, elements have been automatically generated or reduced in size where shear stress concentration and dissipation is going to affect the solution, until a limit of 10,000 elements has been attained. Such initial and final numbers of elements have been chosen on the basis of the results presented in [55,56], the latter to specifically keep the computational burden of each analysis below a critical threshold.
The Mohr-Coulomb failure criterion has been used to model the soil behavior, while the strip footing has been modeled as a weightless rigid plate. The ultimate load bearing capacity has then been defined through a multiplier for a uniform load applied to the footing, and automatically increased until the formation of collapse mechanisms.
In the analyses, a strip footing with a width = 1 m has been placed on soil with a specific weight of 20 kN/m . To avoid any perturbation to the solution induced by the domain boundaries, a model with a width of 40 and a depth of 20 has been considered. The contact of the soil with the bedrock has been modelled by constraining both the horizontal and the vertical displacements along the bottom surface. Along the side boundaries, only the horizontal components of the displacements have instead been restrained.
The pseudo-static method has been adopted to assess the seismic performance of the foundation over the cavity. For this purpose, the value of the horizontal acceleration coefficient has been exploited, and once the upper and lower bounds on or at collapse have been computed, the relevant mean value has been assumed as the representative result for each model. An example of the adopted models is depicted in Figure 2, where the typical FELA adaptive mesh of the Optum G2 software, along with the adopted boundary conditions, are sketched.  To validate the modeling approach, the results of some studies in the literature have been considered for comparison purposes. First, as seen in [57], the bearing capacity of the foundation has been computed via the lower bound FELA method, in case of horizontal seismic loadings. The relevant results are compared in Figure 3, in terms of the bearing capacity factor N γ , to show how they perfectly match the available ones. Second, a comparison with the studies of [57][58][59][60][61][62][63][64] has been made through the data gathered in Table 2. Such a comparison is provided in terms of the bearing capacity factor, at a varying value of the internal friction angle ϕ. The results again show a good agreement with those published, and correctly bound almost all of them. Third, in Table 3, a further comparison is provided for the bearing capacity factor N c , allowing for the presence of a void at different depths D beneath the strip footing central line, see [10,12,19]. It has been assumed that the opening width is equal to the surface footing width, and is located within an utterly cohesive soil at an undrained state with cohesion varying in the range 60-300 kPa. Results are shown again to agree well with the reference ones.
been considered for comparison purposes. First, as seen in [57], the bearing capacity of the foundation has been computed via the lower bound FELA method, in case of horizontal seismic loadings. The relevant results are compared in Figure 3, in terms of the bearing capacity factor , to show how they perfectly match the available ones. Second, a comparison with the studies of [57][58][59][60][61][62][63][64] has been made through the data gathered in Table 2. Such a comparison is provided in terms of the bearing capacity factor, at a varying value of the internal friction angle . The results again show a good agreement with those published, and correctly bound almost all of them. Third, in Table 3, a further comparison is provided for the bearing capacity factor , allowing for the presence of a void at different depths beneath the strip footing central line, see [10,12,19]. It has been assumed that the opening width is equal to the surface footing width, and is located within an utterly cohesive soil at an undrained state with cohesion varying in the range 60 − 300 kPa. Results are shown again to agree well with the reference ones.     Table 3. Bearing capacity of the foundation: comparison with literature data, in terms of the capacity factor N c for a strip footing above a single void.

Parametric Study
Before moving to the setting of the database for the application of the ML algorithms, a parametric analysis is here presented. To evaluate the effect of the parameters listed in Section 2 on the seismic behavior of the system, a set of analyses has been designed. In an undrained situation, for a strip footing with a rough surface on the ground level, the pseudo-static acceleration coefficient at the threshold of strip footing sliding is 0.38. To study the effect of earthquakes, values of k h = 0, 0.15, 0.25, and 0.35 have been accounted for.
In the following, the outcomes of the parametric analysis are subdivided, in order to focus first on the effects of parameters related to the soil behavior, and next on the effects of parameters related to the geometry and position of the void.

Soil Parameters
The results in Figure 4 provide insights into the variation of the dimensionless bearing capacity Q = q u γB , see Equation (2), induced by the soil features, under different values of the horizontal seismic acceleration coefficient k h . Figure 4a shows the outcome of the analysis for a broad range of values of the undrained shear strength. The graph confirms that for a low shear strength, that is for c 0 /γB < 1, the model can be unstable (especially regarding the lower bound state): void collapse occurs without any loading and the value of Q gets close to zero, though it cannot be computed due to the instability of the model. As far as the effect of kB/c 0 on the seismic and static bearing capacity of the shallow footing is concerned, Figure 4a also shows that the bearing capacity raises slightly under seismic conditions, however, it then becomes independent of kB/c 0 .

Void Parameters
Results are depicted in Figure 5 showing the variation of the bearing capacity induced by varying the void parameters related to its shape and location. At constant seismic action measured through , the capacity = ⁄ is shown to increase with the void depth until a saturation value that depends on its own . Similar conclusions can be drawn considering the effect of the horizontal distance S from the footing central line. It has also been reported that the bearing capacity decreases as the ratio related to the void width increases, with instability triggered for a value approaching = 3. For a void located below the centerline of the footing (namely = 0), the effect of the ratio related to the void height is to first marginally reduce the bearing capacity, and then attain a constant value.  Figure 4b instead show that by increasing the angle of internal friction of the soil, the bearing capacity of the strip footing increases as well: for ϕ ≤ 30 • , the bearing capacity increases at a low rate, but for larger values, it keeps increasing sharply.

Void Parameters
Results are depicted in Figure 5 showing the variation of the bearing capacity induced by varying the void parameters related to its shape and location. At constant seismic action measured through k h , the capacity Q = q u /γB is shown to increase with the void depth D until a saturation value that depends on its own k h . Similar conclusions can be drawn considering the effect of the horizontal distance S from the footing central line. It has also been reported that the bearing capacity decreases as the ratio α related to the void width increases, with instability triggered for a value approaching α = 3. For a void located below the centerline of the footing (namely S = 0), the effect of the ratio β related to the void height is to first marginally reduce the bearing capacity, and then attain a constant value.
can be drawn considering the effect of the horizontal distance S from the footing central line. It has also been reported that the bearing capacity decreases as the ratio related to the void width increases, with instability triggered for a value approaching = 3. For a void located below the centerline of the footing (namely = 0), the effect of the ratio related to the void height is to first marginally reduce the bearing capacity, and then attain a constant value. As the data have to be handled through ML approaches, the quality of the dataset is one of the dominant aspects to assure the accuracy of the predictions. To assess the bearing capacity of the strip footing above a void, the adopted dataset has been made of 38,000 models, each one featuring its own values of the parameters describing the soil properties and the void geometry. According to the parametric study presented above, the values of the parameters have been selected within their domains to cover all the effective situations related to the interaction between the strip and the void. The aforementioned effective values of the parameters are collected in Table 4.  As the data have to be handled through ML approaches, the quality of the dataset is one of the dominant aspects to assure the accuracy of the predictions. To assess the bearing capacity of the strip footing above a void, the adopted dataset has been made of 38,000 models, each one featuring its own values of the parameters describing the soil properties and the void geometry. According to the parametric study presented above, the values of the parameters have been selected within their domains to cover all the effective situations related to the interaction between the strip and the void. The aforementioned effective values of the parameters are collected in Table 4. Total number of models considered 38,000 As already remarked, for each analysis, only the average value between the upper and lower bounds on the strip footing bearing capacity has been considered as an entry of the dataset.

Machine Learning Techniques
To learn and then estimate the seismic bearing capacity of the shallow strip footing above the void, three nonlinear machine learning algorithms have been adopted: an MLP artificial neural network (ANN); the GMDH; and a combined ANFIS-PSO.
According to Table 4, the database for digging is characterized by eight input parameters (c 0 /γB, kB/c 0 , D, S, α, β, ϕ, and k h ) and only one output (Q = q u /γB). In the following, a detailed description of these methods is provided, along with the impact of the relevant hyperparameters of their performance. Hyperparameters are defined here as the algorithm-dependent parameters that control the learning process.

Multilayer Perceptron (MLP)
McCulloch and Pitts first introduced ANNs as a powerful predictive tool in [66]. ANNs are a programming paradigm which try to mimic the brain's structure. They are broadly employed in artificial intelligence problems, from simple pattern-recognition tasks to advanced symbolic manipulation. ANN-based models process the available data during a training stage, assessing the network output to efficiently and accurately match some available targets in case of a supervised approach, like the one here adopted based on the FELA results included in the dataset. ANNs can be used for modeling complex systems in virtually any science [67,68].
MLP is a class of ANNs, and stands as an evolution of the perceptron neural network originally developed in the 1960s, see [69]. For a general problem, MLP is able to provide a nonlinear map f : R k → R h , in case of an input layer made of k neurons (input values) and an output layer made of h neurons (output values). It also consists of an arbitrary number of hidden layers, with a variable number of neurons. The neurons are storage cells for scalar values, obtained by an activation function applied to the neuron values in the previous layer.
For our problem, a preliminary analysis has been performed to identify the optimal architecture of the MLP, in terms of the number of neurons in the hidden layers, to accurately describe the correlation between input and output of the analyzed dataset. In this regard, one-hidden-layer and two-hidden-layer ANNs have been investigated-to speed-up the optimization procedure, only 5% of the data has been adopted at this stage. To this aim, in [70], it was suggested that the ratio between the number of data and the number of input and output variables should be greater than five, to attain an appropriate performance of MLP-in the present case, the mentioned ratio results are 1900/9 = 211, which are much larger than the minimum value suggested. Figure 6 provides a comparison of the behaviors of all the considered architectures at the end of training. Results are reported in terms of the root mean square error (RMSE), given by: which has been assumed as the loss function to be minimized during the training. In Equation (3): n is the total number of data; and for the i-th model, Q OG2 i is the load bearing capacity obtained with the Optum G2 software, whereas Q Model i is the value estimated through the MLP algorithm.
To assess the effects of the hyperparameters on the ANN accuracy, the plot in Figure 6 shows the final value of RMSE as a function of the total number of neurons in the MLP. The continuous line represents the solution obtained with the one-hidden-layer ANN, which is obviously uniquely defined by the number of neurons in the layer itself. The dashed lines represent the solutions for the two-hidden-layer ANNs, and for each of them, the label in the chart represents the number of neurons in its first hidden layer. Each point in this graph has been computed as the average value out of ten repetitions of the training process, to also assure robustness against stochastic effects. What can be seen from this analysis is that the ANN featuring one hidden layer only, almost surely provides the best performance. Furthermore, for a number of neurons larger than 20, there is no noticeable improvement in the final value of the RMSE. Accordingly, and in order to minimize the computational costs, the 8-20-1 ANN architecture has been adopted henceforth.
The continuous line represents the solution obtained with the one-hidden-layer ANN, which is obviously uniquely defined by the number of neurons in the layer itself. The dashed lines represent the solutions for the two-hidden-layer ANNs, and for each of them, the label in the chart represents the number of neurons in its first hidden layer. Each point in this graph has been computed as the average value out of ten repetitions of the training process, to also assure robustness against stochastic effects. What can be seen from this analysis is that the ANN featuring one hidden layer only, almost surely provides the best performance. Furthermore, for a number of neurons larger than 20, there is no noticeable improvement in the final value of the RMSE. Accordingly, and in order to minimize the computational costs, the 8-20-1 ANN architecture has been adopted henceforth. The effect of the proportioning of the data handled for training, validation, and testing of the MLP has been investigated next. As before, 5% of the data have been handled to evaluate the performance of MLP for different values of proportioning, and the results are shown in Figure 7 at varying values of the percentage of training data. The best performance has been obtained for 65% of the dataset employed for training, 30% for validation, and the remaining 5% for testing. Therefore, the numbers of samples used for the various stages of the MLP-based analysis are as follows: out of the total 38,000 data, 24,700 were used for training, 11,400 for testing, and 1900 for validation. The effect of the proportioning of the data handled for training, validation, and testing of the MLP has been investigated next. As before, 5% of the data have been handled to evaluate the performance of MLP for different values of proportioning, and the results are shown in Figure 7 at varying values of the percentage of training data. The best performance has been obtained for 65% of the dataset employed for training, 30% for validation, and the remaining 5% for testing. Therefore, the numbers of samples used for the various stages of the MLP-based analysis are as follows: out of the total 38,000 data, 24,700 were used for training, 11,400 for testing, and 1900 for validation.

Group Method of Data Handling (GMDH)
The GMDH algorithm, which was first proposed in [71], combines mathematical modeling approaches and black-box nonlinear system identification, aiming to learn the complex relationship between input and output in the provided dataset. By means of a polynomial transfer function (Volterra functional series), GMDH allows the neurons to

Group Method of Data Handling (GMDH)
The GMDH algorithm, which was first proposed in [71], combines mathematical modeling approaches and black-box nonlinear system identification, aiming to learn the complex relationship between input and output in the provided dataset. By means of a polynomial transfer function (Volterra functional series), GMDH allows the neurons to become more complex units.
Unlike MLP, this method can be considered a particular type of self-organized ANN, which employs natural selection to control the size, complexity, and accuracy of the network. The main fields of application of GMDH are the modeling of complex systems, function approximation, nonlinear regression, and pattern recognition (more information can be found in example [72]).
Even for this architecture, a preliminary analysis has been performed to identify the optimal values of the number of layers and of the maximum number of neurons in a layer, to enhance its effectiveness. Like for the MLP, for this stage of the analysis, 5% of the available data have been used. The effects of the aforementioned hyperparameters on the RMSE at the end of training are depicted in Figure 8. The graph shows that, when the number of layers is larger than six, the RMSE is only marginally affected. Moreover, it can be seen that an optimal value for the maximum number of neurons in a layer is 12. For values larger than this threshold, only a small variation of the RMSE is reported, which is arrived at by largely increasing the cost of training. To run the GMDH algorithm, a parameter called "selection pressure", which can be thought of as a threshold to determine the number of neurons in each layer, must be set. Its value ranges from zero to define the no-pressure mode (to select all the generated neurons), to one to define the maximum pressure mode (to select no neurons from all the generated ones). Hence, the lower the value of the selection pressure, the more complex the structure of the GMDH and the higher the related computational effort. A trade-off has thus been arranged, in such a way that its value does not negatively affect the performance of the algorithm while being maximized. According to the results collected in Figure 9, an optimal value of this hyperparameter has been set to 0.2. To run the GMDH algorithm, a parameter called "selection pressure", which can be thought of as a threshold to determine the number of neurons in each layer, must be set. Its value ranges from zero to define the no-pressure mode (to select all the generated neurons), to one to define the maximum pressure mode (to select no neurons from all the generated ones). Hence, the lower the value of the selection pressure, the more complex the structure of the GMDH and the higher the related computational effort. A trade-off has thus been arranged, in such a way that its value does not negatively affect the performance of the algorithm while being maximized. According to the results collected in Figure 9, an optimal value of this hyperparameter has been set to 0.2. rons), to one to define the maximum pressure mode (to select no neurons from all the generated ones). Hence, the lower the value of the selection pressure, the more complex the structure of the GMDH and the higher the related computational effort. A trade-off has thus been arranged, in such a way that its value does not negatively affect the performance of the algorithm while being maximized. According to the results collected in Figure 9, an optimal value of this hyperparameter has been set to 0.2.  As for the MLP, the effect of data proportioning for training and testing has been investigated. Figure 10 shows that, independently of the number of layers, the percentage of training data does not significantly affect the GMDH performance. Therefore, 50% of the entire dataset has thus been selected for it. As for the MLP, the effect of data proportioning for training and testing has been investigated. Figure 10 shows that, independently of the number of layers, the percentage of training data does not significantly affect the GMDH performance. Therefore, 50% of the entire dataset has thus been selected for it.

Combined Adaptive-Network-Based Fuzzy Inference System and Particle Swarm Optimization (ANFIS-PSO)
ANFIS is known to be a useful ML approach to solve function approximation problems [73]. ANFIS, which uses ANN standard training algorithms, is structured with ifelse rules and input-output fuzzy joined data [74]. To simulate complex nonlinear mapping, a trained ANN and fuzzy logic work together in different imprecise and unpredictable noisy spaces [75]. Further aspects about ANFIS are reported in [27].
The PSO is based on competition and collaboration among particles [76,77]. It performs a local search and leads to fast convergence in minimizing the objective function, see [78,79]. The most striking advantage of this optimization algorithm over others is that the use of swarming particles makes the process robust against the solution getting entrapped into local minima of the handled objective function, since each particle renews its position and pathway based on its and others' information.

Combined Adaptive-Network-Based Fuzzy Inference System and Particle Swarm Optimization (ANFIS-PSO)
ANFIS is known to be a useful ML approach to solve function approximation problems [73]. ANFIS, which uses ANN standard training algorithms, is structured with if-else rules and input-output fuzzy joined data [74]. To simulate complex nonlinear mapping, a trained ANN and fuzzy logic work together in different imprecise and unpredictable noisy spaces [75]. Further aspects about ANFIS are reported in [27].
The PSO is based on competition and collaboration among particles [76,77]. It performs a local search and leads to fast convergence in minimizing the objective function, see [78,79]. The most striking advantage of this optimization algorithm over others is that the use of swarming particles makes the process robust against the solution getting entrapped into local minima of the handled objective function, since each particle renews its position and pathway based on its and others' information.
In this work, PSO has been used to provide the membership functions of ANFIS, assumed to be Gaussian ones, thereby decreasing the computation costs for a given topology of ANFIS itself. A thorough explanation on the combined ANFIS-PSO algorithm can be found in [80].
In order to enhance the performance of this algorithm, the critical hyperparameters that need to be set are the number of iterations, or in other words, the stopping criterion, and the number of particles in the population handled at each iteration. Although exact rules for the swarm size selection are not reported in the literature, the main issue linked to it is that, when the problem dimension increases, the swarm size does increase as well, and may quickly become unmanageable [81].
By setting the maximum number of iterations to 5000 and the number of particles varying in a range between five and 95, Figure 11 shows the evolution of RMSE. It can be seen that a number of iterations equal to 5000 looks appropriate to converge to the final solution, independently of the number of particles. As depicted in the inset of Figure 11, after 5000 iterations, the solution linked to 65 particles has shown the best performance. The initial and final values and of the inertia weight also need to be set. This weight provides a trade-off between the concentration of particles in their neighborhoods, and the exploration of new points in the parameter space. Higher values of the weight allow more new points to be explored, while lower values allow the concentration capability of PSO to be boosted. The value of this hyperparameter has been linearly reduced during the run, so that the search effort is essentially focused on the exploration at the initial steps (governed by ), while it is focused more on exploitation at the latter phases (governed by ) of the run.  Figure 12 shows the sensitivity of the PSO performance to and . In the graph, different colors are related to different values of , whereas different line formats are associated to different values of . It clearly emerges that the red lines associated to = 0.8 provide the best performance. Furthermore, the continuous lines linked to = 0.2 give rise to the minimum final value of RMSE and, in all the cases, to the steepest descents in its evolution. The initial and final values W 1 and W 2 of the inertia weight also need to be set. This weight provides a trade-off between the concentration of particles in their neighborhoods, and the exploration of new points in the parameter space. Higher values of the weight allow more new points to be explored, while lower values allow the concentration capability of PSO to be boosted. The value of this hyperparameter has been linearly reduced during the run, so that the search effort is essentially focused on the exploration at the initial steps (governed by W 1 ), while it is focused more on exploitation at the latter phases (governed by W 2 ) of the run. Figure 12 shows the sensitivity of the PSO performance to W 1 and W 2 . In the graph, different colors are related to different values of W 1 , whereas different line formats are associated to different values of W 2 . It clearly emerges that the red lines associated to W 1 = 0.8 provide the best performance. Furthermore, the continuous lines linked to W 2 = 0.2 give rise to the minimum final value of RMSE and, in all the cases, to the steepest descents in its evolution. Other hyperparameters to tune in the analysis are the cognitive and social acceleration coefficients and , which describe the weighting of the stochastic acceleration terms that move particles towards the best objective so far ( ), and to the best objective of swarm particles ( ). By increasing the cognitive acceleration coefficient value, the attraction of particles towards the best objective is enhanced, while their appeal towards the best goal of swarm particles is reduced. Since the social acceleration coefficient does the exact opposite, the relation between the values of the two coefficients is critical and may affect the behavior of the algorithm. Concerning the ratios / and / , we have considered eight different values in the investigation, plus the threshold characterized by = . According to the results shown in Figure 13, which provides the evolution of RMSE during the optimization by PSO as a function of the cognitive and social acceleration coefficients and , the best performance has been obtained for = 1 and = 1.5.  Other hyperparameters to tune in the analysis are the cognitive and social acceleration coefficients C 1 and C 2 , which describe the weighting of the stochastic acceleration terms that move particles towards the best objective so far (C 1 ), and to the best objective of swarm particles (C 2 ). By increasing the cognitive acceleration coefficient value, the attraction of particles towards the best objective is enhanced, while their appeal towards the best goal of swarm particles is reduced. Since the social acceleration coefficient does the exact opposite, the relation between the values of the two coefficients is critical and may affect the behavior of the algorithm. Concerning the ratios C 1 /C 2 and C 2 /C 1 , we have considered eight different values in the investigation, plus the threshold characterized by C 1 = C 2 . According to the results shown in Figure 13, which provides the evolution of RMSE during the optimization by PSO as a function of the cognitive and social acceleration coefficients C 1 and C 2 , the best performance has been obtained for C 1 = 1 and C 2 = 1.5. Other hyperparameters to tune in the analysis are the cognitive and social acceleration coefficients and , which describe the weighting of the stochastic acceleration terms that move particles towards the best objective so far ( ), and to the best objective of swarm particles ( ). By increasing the cognitive acceleration coefficient value, the attraction of particles towards the best objective is enhanced, while their appeal towards the best goal of swarm particles is reduced. Since the social acceleration coefficient does the exact opposite, the relation between the values of the two coefficients is critical and may affect the behavior of the algorithm. Concerning the ratios / and / , we have considered eight different values in the investigation, plus the threshold characterized by = . According to the results shown in Figure 13, which provides the evolution of RMSE during the optimization by PSO as a function of the cognitive and social acceleration coefficients and , the best performance has been obtained for = 1 and = 1.5.  As was done for the other algorithms, the effect of the proportioning of data has been investigated. Figure 14, which shows the evolution of RMSE during the optimization by PSO, testifies that the optimal percentage of training data is 70%. Accordingly, out of the 38,000 data, 26,600 have been used for training and 11,400 for testing. As was done for the other algorithms, the effect of the proportioning of data has been investigated. Figure 14, which shows the evolution of RMSE during the optimization by PSO, testifies that the optimal percentage of training data is 70%. Accordingly, out of the 38,000 data, 26,600 have been used for training and 11,400 for testing.

Results and Discussion
In this section, the performances of the three selected ML tools are compared. As metrics for the said performances, the following statistical indices have been adopted to measure the discrepancy between ground-truth and the predicted values of the seismic load bearing capacity: R , MAPE, SI, and BIAS being, respectively, the coefficient of determination, the mean absolute percentage error, the scatter index, and the standard bias. Besides these indices, the RMSE introduced in Equation (3) has been adopted too. In these equations = 1, … , is an index running over the instances in the dataset. It is also worth recalling that Q is the numerical value of the dimensionless load bearing capacity = ⁄ generated by the Optum G2 software, while is the corresponding value provided by the trained ML tool (the overbar stands for the average value of the corresponding variable).

Results and Discussion
In this section, the performances of the three selected ML tools are compared. As metrics for the said performances, the following statistical indices have been adopted to measure the discrepancy between ground-truth and the predicted values of the seismic load bearing capacity: R 2 , MAPE, SI, and BIAS being, respectively, the coefficient of determination, the mean absolute percentage error, the scatter index, and the standard bias. Besides these indices, the RMSE introduced in Equation (3) has been adopted too. In these equations i = 1, . . . , n is an index running over the instances in the dataset. It is also worth recalling that Q OG2 i is the numerical value of the dimensionless load bearing capacity Q = q u /γB generated by the Optum G2 software, while Q Model i is the corresponding value provided by the trained ML tool (the overbar stands for the average value of the corresponding variable). Table 5 gathers the values of all the statistical indices introduced above to assess the performance of each ML approach. It can be seen that MLP is more accurate than GMDH and ANFIS-PSO in catching the structural response, as shown by the R 2 , RMSE, SI, and BIAS values. The same trend is also shown in the parity plots given in Figure 15, where the estimations of the seismic bearing capacity Q = q u /γB of the shallow strip footing are compared with the ground-truth values. Output provided by MLP is shown to be much less scattered than the others, and the linear interpolation of all the pairs of results is well aligned with the (perfect fit) bisector of the quadrant. In spite of the relatively small value of the BIAS, due to the fact that each single term in the sum is purposely taken with its own sign, the GMDH model performs worse, as reported by the large scatter and also by the deviation from the perfect fit line of the linear interpolant. aligned with the (perfect fit) bisector of the quadrant. In spite of the relatively small value of the BIAS, due to the fact that each single term in the sum is purposely taken with its own sign, the GMDH model performs worse, as reported by the large scatter and also by the deviation from the perfect fit line of the linear interpolant. The goodness of the fit of the results has been also studied: Figure 16 provides a comparison between the FELA and the foreseen seismic bearing capacity of the shallow strip footing. For this comparison, only 1% of the instances in the dataset have been randomly selected, so as to provide a clear view of the quality of data fitting. It results that the seismic bearing capacity computed by the ML techniques matches relatively well with their FELA counterparts, and the minor scattering around them is obviously in accordance with the results reported in Figure 15. Table 5. Performance of the ML algorithms, in term of the adopted statistical indices.  The goodness of the fit of the results has been also studied: Figure 16 provides a comparison between the FELA and the foreseen seismic bearing capacity of the shallow strip footing. For this comparison, only 1% of the instances in the dataset have been randomly selected, so as to provide a clear view of the quality of data fitting. It results that the seismic bearing capacity computed by the ML techniques matches relatively well with their FELA counterparts, and the minor scattering around them is obviously in accordance with the results reported in Figure 15. Figure 16. Performance of the ML algorithms: comparison between the FELA results and the forecasts provided by the three algorithms for 1% randomly selected data, in terms of the seismic bearing capacity = ⁄ of the shallow strip footing.

A Case Study
In this section, the results obtained in this study in terms of trained ANN-based models, are adopted for a practical engineering problem to assess the seismic bearing capacity of a strip footing placed over an unsupported void. It has been assumed that the strip footing, featuring =8 m, is located over homogeneous soil characterized by =300 kPa, =18 kN/m 3 , and =20°. As shown in Figure 17, the footing is placed near a square void, which is 4 m wide and 4 m high, and the center of which is placed at a depth of 12 m and at a distance of 16 m from the footing centerline. To predict the undrained seismic bearing capacity in the specific case =0.2, the ML method introduced in this research can be adopted as follows: First, the nondimensional input parameters of the problem need to be computed to obtain:  Figure 16. Performance of the ML algorithms: comparison between the FELA results and the forecasts provided by the three algorithms for 1% randomly selected data, in terms of the seismic bearing capacity Q = q u /γB of the shallow strip footing.

A Case Study
In this section, the results obtained in this study in terms of trained ANN-based models, are adopted for a practical engineering problem to assess the seismic bearing capacity of a strip footing placed over an unsupported void. It has been assumed that the strip footing, featuring B = 8 m, is located over homogeneous soil characterized by c 0 = 300 kPa, γ = 18 kN/m 3 , and ϕ = 20 • . As shown in Figure 17, the footing is placed near a square void, which is 4 m wide and 4 m high, and the center of which is placed at a depth of 12 m and at a distance of 16 m from the footing centerline. To predict the undrained seismic bearing capacity in the specific case k h = 0.2, the ML method introduced in this research can be adopted as follows: Figure 16. Performance of the ML algorithms: comparison between the FE casts provided by the three algorithms for 1% randomly selected data, in t ing capacity = ⁄ of the shallow strip footing.

A Case Study
In this section, the results obtained in this study in terms of trai els, are adopted for a practical engineering problem to assess the se of a strip footing placed over an unsupported void. It has been a footing, featuring =8 m, is located over homogeneous soil cha kPa, =18 kN/m 3 , and =20°. As shown in Figure 17, the footing void, which is 4 m wide and 4 m high, and the center of which is p m and at a distance of 16 m from the footing centerline. To predict bearing capacity in the specific case =0.2, the ML method intro can be adopted as follows: Figure 17. Geometry of the case study (measures in m).
First, the nondimensional input parameters of the problem ne obtain:  First, the nondimensional input parameters of the problem need to be computed to obtain: According to Figure 17, where the center of the void is shown to be placed at a depth of 12 m and at a distance of 16 m from the footing centerline, the additional nondimensional depth and eccentricity parameters amount to: For this case study, the Optum G2 software has provided the lower and upper bounds on the ultimate bearing capacity as 3193.65 kPa and 3301.85 kPa, respectively, so the mean value has been given as 3247.75 kPa. The ML methods here investigated have instead provided the estimations reported in Table 6, along with the relevant errors with respect to the aforementioned ground-truth value. As already pointed out in Section 5, MLP is shown to perform better than the two alternative tools, giving rise to a representative error smaller than 2%. Very similar results can be obtained for any other geometry falling within the interval of characteristic parameters allowed for while assembling the dataset of Table 4.

Conclusions
In this study, the seismic bearing capacity of a strip footing placed over an unsupported void has been assessed. Dimensionless factors describing the horizontal seismic acceleration, the soil strength and heterogeneity, the internal friction angle of the soil, the shape, size, depth, and eccentricity of the void have been accounted for.
Three ML techniques have been exploited to estimate the aforementioned bearing capacity: MLP; GMDH; and ANFIS-PSO. All the hyperparameters affecting the performance of the ML models have been investigated, to ensure the attainment of optimal solutions in all the cases. The results obtained by training the ML models have shown a good fitting of the seismic bearing capacity provided by numerical FELA simulations, here handled as the ground-truth data. More specifically, MLP, and ANFIS-PSO have performed better than GMDH. Overall, almost independently of the adopted statistical performance index used to assess the goodness of the fit of the models, MLP has been shown to be superior to the other models. This outcome has also been testified by means of concrete results regarding a case study.
A major concern regarding the use of a data-driven ML approach to foresee the information related to limit states of a structural system, typically highly sensitive to some parameters unknown or difficult to assess, is related to the quality of the dataset. In future activities, the proposed approach will be fed also with experimental data, to assess the credibility of the ML models. To this purpose, collecting empirical data of the seismic bearing capacity of the strip footing above voids would be worthwhile, to finally establish an effective tool to design strip footing, especially in urban areas.
In this study, only rectangular voids have been considered. Having established the reliability of the offered methodology, additional datasets will be considered in the future to allow for voids with a different geometry, aiming to increase the awareness of strip footing behavior over voids. Data Availability Statement: Data, models, and codes that support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A.
In this appendix, we collect some features of the optimal configurations of the selected neural network-based ML algorithms, so that they can be adopted in the future by other researchers for similar problems.

Appendix A.1. MLP
According to the results of the optimization procedure discussed in Section 4.1, the 8-20-1 MLP architecture has shown the best performance in predicting the seismic bearing capacity of the shallow strip footing above a void. The weights and biases relevant to the input-hidden and to the hidden-output connections are respectively listed in Tables A1 and A2. The prediction equation, practically used to link the input vector (consisting of c 0 /γB, kB/c 0 , D, S, α, β, ϕ, and k h ) to the output (Q = q u /γB), has resulted into: where: I i represents the inputs vector; m = 8 is the dimension of the input vector; h is hidden layer size; w iN is the connection weight between the ith input component and the Nth neuron of the hidden layer; w N is the connection weight between the Nth neuron of hidden layer and the (single) output; b hN is the bias at the Nth neuron of the hidden layer; b o is the bias at the output layer; and f sig is the tan-sigmoid transfer function. According to Tables A1 and A2, Equation (A1) can be re-written as: where: and weights N ij are provided in Table A1. According to Section 4.2, the optimal GMDH model has been shown to feature a maximum number of neurons in each layer equal to 12, and four layers only. Furthermore, the selection pressure has been set to 0.5.
The main advantage of GMDH is the provision of a polynomial relationship between the input and the output variables, which stands as a relatively practical and straightforward solution for the problem at hand. This relationship reads: where: X and Y are the inputs to a neuron, and weights a 1 , . . . , a 6 are all gathered in vector A. For each pair of variables X and Y, the entries of A are provided in Table A3, see also Figure A1.  Figure A1. Architecture of the adopted GMDH model.   As the equations linking the input and output variables for the ANFIS-PSO model are not as straightforward as with the other two cases, the corresponding MATLAB code for the calculation of the seismic bearing capacity of the shallow strip footing above a void is provided in Table A4. As the equations linking the input and output variables for the ANFIS-PSO model are not as straightforward as with the other two cases, the corresponding MATLAB code for the calculation of the seismic bearing capacity of the shallow strip footing above a void is provided in Table A4.