Probabilistic Evaluation of CPT-based Seismic Soil Liquefaction Potential: Towards the Integration of Interpretive Structural Modeling and Bayesian Belief Network

This paper proposes a probabilistic graphical model that integrates interpretive structural modeling (ISM) and Bayesian belief network (BBN) approaches to predict CPT-based soil liquefaction potential. The ISM approach was employed to identify the relationships between influence factors, whereas the BBN approach was used to describe the quantitative strength of those relationships using conditional and marginal probabilities. The proposed model considers the major causes of seismic soil liquefaction, namely soil, seismic and site conditions, simultaneously. To demonstrate the application of the proposed framework, the paper elaborates on each phase of the BBN framework, which is then validated with historical empirical data. In terms of the rate of successful prediction of liquefaction and non-liquefaction events, the proposed probabilistic graphical model is shown to be more effective than the logistic regression, support vector machine, random forest and naïve Bayes methods. The paper also presents a sensitivity analysis and the most probable explanation of seismic soil liquefaction from an engineering perspective.


Introduction
Determination of soil liquefaction potential is a fundamental step in the mitigation of seismically induced hazards. In the last few decades, numerous researchers have proposed methods for predicting soil liquefaction potential that are based on in situ tests, such as the standard penetration test (SPT), the cone penetration test (CPT), and shear wave velocity (Vs) techniques (e.g. Seed and Idriss 1981; Seed and Idriss 1971; Robertson and Wride 1998; Youd and Idriss 2001; Juang et al. 2003; Moss et al. 2006; Idriss and Boulanger 2006). Among these in situ tests, CPT results have been adopted by many researchers as the basis for evaluating liquefaction potential (e.g. Juang et al. 2003; Youd and Idriss 2001). A main advantage of the CPT is the almost continuous information it provides over the depth of the explored soil strata. The CPT is also known to be more accurate and repeatable than other types of in situ tests. Artificial intelligence (AI) techniques, for example random forest (Kohestani et al. 2015), adaptive neuro-fuzzy inference system (ANFIS) (Xue and Yang 2013), relevance vector machine (RVM) (Samui 2007), artificial neural network (ANN) (Goh 1996; Juang et al. 2003; Samui et al. 2011) and support vector machine (SVM) (Goh and Goh 2007; Oommen et al. 2010; Pal 2006; Samui et al. 2011) models, have been developed to predict liquefaction potential from in situ test databases. Compared with conventional modeling techniques, the primary strength of AI techniques is that they capture nonlinear and complex correlations between system variables without having to presume the form of the correlations between input and output variables. In the context of assessing the occurrence of liquefaction, these techniques can be trained to learn the relationship between soil, site, and earthquake characteristics and the potential for liquefaction without prior knowledge of the form of that relation.
Most of these models are black boxes, i.e., they do not expose an explicit relationship between the inputs and the output. The Bayesian belief network (BBN) is a graphical model that enables a set of variables to be probabilistically connected (Pearl 1988). A BBN can provide an effective structure for addressing cause-effect relationships and complexities. It supports not only forward inference (from causes to results) but also reverse inference (from results to causes). Compared to other methods, the benefits of BBNs include the following: (1) a BBN combines qualitative and quantitative analysis; (2) a BBN allows reverse inference (from results to causes), which makes it simple to rank the factors that affect the outcome; (3) a BBN has good learning ability; (4) a BBN allows data to be combined with domain knowledge; and (5) a BBN can demonstrate good prediction accuracy even with very limited sample sizes. However, its application to seismic liquefaction potential based on CPT in situ test data remains comparatively rare (e.g. Ahmad et al. 2019a; Ahmad et al. 2020a; Ahmad et al. 2020b). The contributions of this paper are fourfold: (1) the interdependence of the different CPT-based seismic soil liquefaction variables is examined, and the BBN approach is used to describe the quantitative strength of their relationships through conditional and marginal probabilities; (2) the performance of the proposed model is compared with that of four traditional seismic soil liquefaction modeling algorithms (logistic regression, SVM, RF, and naïve Bayes); (3) a sensitivity analysis of the predictor variables is presented to show the effect of the input factors on the liquefaction potential; and (4) the most probable explanation (MPE) of seismic soil liquefaction is presented from an engineering perspective.
This article consists of six major sections. The next section presents the research methodology, and Section 3 is devoted to the probabilistic graphical model development and evaluation measures. Results and discussion are presented in Section 4. Finally, conclusions and future work are set out in the last section.

Interpretive Structural Modeling
Interpretive structural modeling (ISM) is a well-established technique for identifying the relationships between the particular elements that define a situation or a problem. In this approach, a collection of different elements that are directly and indirectly connected is organized into a structured, comprehensive model (Sage 1977; Warfield 1974). The resulting model depicts the structure of a complex problem or issue in a carefully constructed pattern expressed in graphics and words (Raj and Attri 2011; Ravi and Shankar 2005; Shankar et al. 2003). Researchers have increasingly used this technique to depict the interrelationships between the various elements relevant to an issue. The ISM approach begins with the identification of the variables that are important to the issue or problem. A contextually relevant subordinate relationship is then identified. Once the contextual relationship has been determined, a structural self-interaction matrix (SSIM) is defined on the basis of a pairwise comparison of the variables. The SSIM is then converted into a reachability matrix (RM), and its transitivity is checked. Once transitivity has been embedded, a matrix model is obtained. Finally, the elements are partitioned into levels and the structural model, called the ISM, is extracted.

Bayesian Belief Network
A Bayesian belief network (BBN) is a graphical network of causal connections between different nodes. In BBN models, the network structure is a directed acyclic graph (DAG) that graphically represents the logical relationships between nodes, and the network parameters are the conditional probabilities that quantify the strength of these relationships (Castelletti and Soncini-Sessa 2007; Ghribi and Masmoudi 2013; Masmoudi et al. 2019). The network structure and network parameters can be obtained from expert knowledge (Joseph et al. 2010; Nadkarni and Shenoy 2001) or by training from data (Kabir et al. 2015). A sample BBN model is shown in Figure 1, where node X is a parent node of the child nodes Y and Z, and node Y is a parent node of the child node Z. Edges are represented by the arrows between two nodes. The joint probability of a Bayesian belief network can be defined as the product of the conditional probability of each node given its parents:

$$P(X, Y, Z) = P(X)\,P(Y \mid X)\,P(Z \mid X, Y) \quad (1)$$

where P(X) is the prior probability, i.e. the probability of a node without parent nodes, P(Y|X) is the conditional probability of Y given X, and P(Z|X,Y) is the conditional probability of Z given X and Y. Figure 2 presents the graphical flow of the approach.

Figure 2. The process outline of the methodology of research used in the present study.
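As a minimal sketch of the factorization for the three-node example (X is a parent of Y and Z, and Y is a parent of Z), the following Python snippet multiplies the node probabilities to obtain the joint probability and then performs a reverse inference, P(X | Z), by enumeration. All probability values are hypothetical and serve only to illustrate the arithmetic; they are not taken from the model in this paper.

```python
from itertools import product

# HYPOTHETICAL probabilities for binary nodes X, Y, Z (illustration only).
p_x = {True: 0.3, False: 0.7}                      # prior P(X)
p_y_given_x = {True: {True: 0.8, False: 0.2},      # P(Y | X=True)
               False: {True: 0.1, False: 0.9}}     # P(Y | X=False)
p_z_true_given_xy = {(True, True): 0.9, (True, False): 0.6,
                     (False, True): 0.5, (False, False): 0.05}  # P(Z=True | X, Y)


def joint(x, y, z):
    """P(X=x, Y=y, Z=z) = P(x) * P(y | x) * P(z | x, y)."""
    pz = p_z_true_given_xy[(x, y)]
    return p_x[x] * p_y_given_x[x][y] * (pz if z else 1.0 - pz)


def posterior_x_given_z(z=True):
    """Reverse inference: P(X=True | Z=z), summing the joint over Y."""
    num = sum(joint(True, y, z) for y in (True, False))
    den = sum(joint(x, y, z) for x in (True, False) for y in (True, False))
    return num / den


# The joint distribution sums to one over all eight state combinations.
total = sum(joint(x, y, z) for x, y, z in product((True, False), repeat=3))
print(round(total, 10), round(posterior_x_given_z(True), 4))
```

Observing Z raises the belief in X above its prior in this example, which is the "results to causes" direction of inference described in the Introduction.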

Data set and Predictor Variables
The data set used in this analysis is based on the revised version of the CPT case history records compiled by Boulanger and Idriss (Boulanger and Idriss 2014). The entire data set consists of 253 cases with a soil behaviour type index Ic < 2.6, of which 180 are liquefied cases, 71 are non-liquefied cases and the remaining 2 are doubtful cases (on the margin between liquefied and non-liquefied). These case histories are derived from CPT measurements at 17 sites and from field performance reports of major earthquakes (see Table A in Appendix A). Liquefaction is governed by seismic parameters, site conditions, and soil properties, which together cover a wide range of factors. Nine critical factors were therefore chosen for the evaluation of liquefaction potential, namely earthquake magnitude (M) F1, peak ground acceleration (amax, g) F2, fines content (FC, %) F3, equivalent clean sand penetration resistance (qc1Ncs) F4, soil behaviour type index (Ic) F5, vertical effective stress (σ'v, kPa) F6, groundwater table depth (Dw, m) F7, depth of soil deposit (Ds, m) F8, and thickness of soil layer (Ts, m) F9; the output is liquefaction potential, F10. The factors were selected following Okoli and Schabram (Okoli and Schabram 2010) and Tranfield et al. (Tranfield et al. 2003). For more details of the CPT case histories, readers may refer to Boulanger and Idriss (Boulanger and Idriss 2014). The statistical characteristics of the data set used in this study, such as minimum (Min.), maximum (Max.), mean, standard deviation (SD) and coefficient of variation (COV), are shown in Table 1. Figure 3 shows the frequency histograms of the measured parameters. For instance, 105 data samples fall in the range of 2-4 m for the Ds parameter.

Figure 3. Frequency histograms of measured parameters.

Previous studies (Ahmad et al. 2019a; Ahmad et al. 2019b; Hu et al. 2015; Zhang 1998) give a detailed account of variable selection and discretization. A BBN handles discrete variables well but is weak at processing continuous variables, so the nine significant factors and the output (liquefaction potential, F10) need to be transformed into discrete values before the proposed model is constructed. The discretization follows the possible range of each factor and expert knowledge, as shown in Table 2. The output "0" represents a non-liquefied case, while "1" represents a liquefied case, as shown in Figure 3.
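The discretization step can be sketched as a lookup of each continuous measurement against a list of bin edges. The bin edges and state labels below are hypothetical placeholders (the actual ranges are those given in Table 2), and the names `BINS` and `discretize` are introduced here only for illustration.

```python
from bisect import bisect_right

# HYPOTHETICAL bin edges and state labels; the real ones are in Table 2.
BINS = {
    "amax_g": ([0.15, 0.30, 0.45], ["low", "medium", "high", "very high"]),
    "Ds_m": ([2.0, 4.0, 6.0], ["0-2", "2-4", "4-6", ">6"]),
}


def discretize(factor, value):
    """Map a continuous value to a discrete state (intervals closed on the left)."""
    edges, labels = BINS[factor]
    return labels[bisect_right(edges, value)]


# One hypothetical case history with two of the nine factors.
case = {"amax_g": 0.22, "Ds_m": 3.1}
states = {name: discretize(name, value) for name, value in case.items()}
print(states)
```

Each case history is converted in this way before it is passed to the network, so that every node only takes one of a small number of states.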

Probabilistic Graphical Model for CPT-based Seismic Soil Liquefaction Potential
The data set was divided into training and testing data sets with regard to statistical aspects such as the mean, maximum and minimum:
• To build the models, a training data set is required. In this study, 201 (80%) of the CPT case histories were used as the training set.
• To assess the predictive performance of the established models, a testing data set is required. In this study, the remaining 50 (20%) CPT case histories were used as the testing set.
The ISM technique suggests the use of domain or expert knowledge in establishing the contextual relationships between the nine significant variables. These contextual relationships were ultimately reviewed and approved by field experts and are represented by the SSIM (see Table 3).
Table 3. SSIM for seismic soil liquefaction variables.
A: the column variable influences the row variable. O: no relationship between the row and column variables.
In the next phase, the SSIM is converted into a binary matrix for the seismic soil liquefaction factors, called the initial reachability matrix (IRM), by replacing the original symbols with 1 or 0, as shown in Table B1 in Appendix B. Once the IRM is obtained, the transitivity property is checked to obtain the final reachability matrix (FRM). Transitivity is a basic principle of the ISM methodology: if variable 'a' is related to variable 'b' and variable 'b' is related to variable 'c', then variable 'a' is also related to variable 'c'. The new entries implied by the transitivity check are labeled '1*'. The FRM, with rank, driving power and dependence power, is shown in Table B2 in Appendix B. The variables at each level of the multilevel hierarchy structure, along with their reachability set (Sr), antecedent set (Sa), and intersection set (Si), are shown in Tables B3-B7 in Appendix B. The results show that there are five partition levels. The multilevel hierarchy structure of liquefaction potential is formed from the FRM. Transitive links between two variables, such as the direct links from Ds and Dw to liquefaction potential, are eliminated because Ds and Dw influence the liquefaction potential through the vertical effective stress. Since the structural model contains no conceptual inconsistency, the ISM is developed for the soil liquefaction potential (see Figure 4). The ISM model does not permit links between nodes that skip a level (for example, between FC and liquefaction potential).
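The transitivity check and the level partitioning can be sketched in a few lines of Python. The four-variable matrix below is a hypothetical, heavily simplified example (it is not the matrix in Table B1), and the function names are introduced only for illustration. The closure uses Warshall's algorithm; a variable is assigned to the current level when its reachability set equals the intersection of its reachability and antecedent sets.

```python
def transitive_closure(m):
    """Final reachability matrix: add self-links and all implied (1*) entries."""
    n = len(m)
    r = [row[:] for row in m]
    for i in range(n):
        r[i][i] = 1
    for k in range(n):
        for i in range(n):
            if r[i][k]:
                for j in range(n):
                    if r[k][j]:
                        r[i][j] = 1
    return r


def partition_levels(r):
    """Iteratively extract levels; level 1 is the top of the hierarchy."""
    remaining = set(range(len(r)))
    levels = []
    while remaining:
        level = []
        for i in remaining:
            reach = {j for j in remaining if r[i][j]}   # Sr
            ante = {j for j in remaining if r[j][i]}    # Sa
            if reach == reach & ante:                   # Sr equals Si
                level.append(i)
        if not level:
            raise ValueError("matrix is not transitively closed")
        levels.append(sorted(level))
        remaining -= set(level)
    return levels


# HYPOTHETICAL initial reachability matrix (row influences column).
names = ["Ds", "sigma_v", "amax", "liquefaction"]
irm = [
    [0, 1, 0, 0],  # Ds -> sigma_v
    [0, 0, 0, 1],  # sigma_v -> liquefaction
    [0, 0, 0, 1],  # amax -> liquefaction
    [0, 0, 0, 0],
]
frm = transitive_closure(irm)
levels = partition_levels(frm)
print([[names[i] for i in level] for level in levels])
```

In this toy case the closure adds the implied link from Ds to liquefaction, and the partition places liquefaction at the top level, sigma_v and amax at the second level, and Ds at the third.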

[Figure 4 node labels: fines content, vertical effective stress, earthquake magnitude, soil behaviour type index, peak ground acceleration, qc1Ncs, groundwater]

A network model with an unknown structure or insufficient knowledge can be hard to create directly. To address this issue, Liao et al. [37] used ISM to develop a network diagram, which they then used as a BBN for evaluating outsourcing risk. This approach effectively processes the relationships between variables by splitting the problem into different levels, which makes the overall structure clear and easy to understand and supports a deeper understanding of the problem. To facilitate the construction of a BBN diagram, the final network diagram obtained from ISM defines the interdependent relationships between factors at the same level or between two levels. The model is built directly in the Netica software distributed by Norsys Software Corp. to define the quantitative strength of these relationships. The graphical presentation is shown in Figure 5.

Evaluation and Prediction
To evaluate the proposed model, it was compared with other techniques using scalar performance measures.

Compared Methods
The proposed model was compared with four other widely used prediction methods (Table 4).
Random forest (RF): Random forest (Breiman 2001) is a meta-learning scheme that combines many independently developed base classifiers, which take part in a voting process to obtain the final class prediction.
Naive Bayes (NB): Naive Bayes (John and Langley 1995) assumes that the predictor variables are conditionally independent given the target (dependent) variable.

Evaluation Measures
Four scalar measures are used, i.e., accuracy, precision, recall and F-measure. In the binary class scenario, i.e., liquefaction and non-liquefaction, there are four possible outcomes for a single prediction. The correct classifications are true negative (TN) and true positive (TP). If the output is incorrectly predicted as positive, a false positive (FP) occurs; if the output is incorrectly predicted as negative, a false negative (FN) occurs. Accuracy (acc) is the proportion of all predictions that are correct, and is calculated as follows:

$$\mathrm{acc} = \frac{TP + TN}{TP + FN + FP + TN} \quad (2)$$

Precision (pr) is the proportion of predicted positive cases that are correctly classified, and recall (re) is the proportion of actual positive cases that are correctly classified. Precision and recall are a pair of conflicting measures: when pr is large, re tends to be small, and vice versa. The F-score combines the two as their harmonic mean. Using the confusion matrix (see Table 5), these measures can be evaluated as:

$$\mathrm{pr} = \frac{TP}{TP + FP} \quad (3)$$

$$\mathrm{re} = \frac{TP}{TP + FN} \quad (4)$$

$$F\text{-score} = \frac{2 \cdot \mathrm{pr} \cdot \mathrm{re}}{\mathrm{pr} + \mathrm{re}} \quad (5)$$

The best value for the F-score is 1 and the worst value is 0.
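The four measures can be computed directly from the confusion matrix counts, as in the short sketch below. The counts are hypothetical (they are not the entries of Table 6); they were chosen only so that the 50 test cases give an accuracy of 78%, which makes the arithmetic easy to compare with the figures quoted in the results. The function name `scores` is an illustrative choice.

```python
def scores(tp, fn, fp, tn):
    """Return (acc, pr, re, F-score) for the class treated as positive."""
    acc = (tp + tn) / (tp + fn + fp + tn)
    pr = tp / (tp + fp)
    re = tp / (tp + fn)
    f_score = 2 * pr * re / (pr + re)
    return acc, pr, re, f_score


# HYPOTHETICAL counts for a 50-case test set (liquefaction = positive class).
tp, fn, fp, tn = 30, 5, 6, 9

acc, pr, re, f_liq = scores(tp, fn, fp, tn)
# Treating non-liquefaction as the positive class swaps the roles of the counts.
_, _, _, f_non = scores(tn, fp, fn, tp)
print(round(acc, 2), round(pr, 3), round(re, 3), round(f_liq, 3), round(f_non, 3))
```

Because the two classes are imbalanced, the F-score is reported per class: the same confusion matrix yields a noticeably lower value for the minority (non-liquefaction) class than for the majority class.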

Comparative Performance of Multiple Learners Based on Test Dataset
The prediction results of the proposed BBN-ISM model and of the LR, SVM, RF and NB models were obtained on the test set. The confusion matrix of each model was then calculated, as shown in Table 6. The values on the main diagonal indicate the number of correctly predicted samples. The acc, pr, re, and F-score were determined from the values in Table 6 using Equations (2)-(5) and are listed in Table 7. The results in Table 7 show that the developed model gave the best predictive performance, with a higher acc than the other models (an improvement of 4% to 16%). The LR model ranks second after the proposed model. The accuracy of BBN-ISM was the highest, at 78%, followed by the 72% accuracy of the LR model. In terms of pr, re, and F-score, the BBN-ISM model also performed better than the LR, SVM, RF and NB models. Therefore, the rank based on overall prediction results was BBN-ISM > LR > SVM > NB.

Most Probable Explanation
The most probable explanation (MPE) function in Netica identifies the combination of states that is most probable given the evidence, so the established model can be used to find the situation most likely to lead to soil liquefaction. For instance, if soil liquefaction is set to "yes" as shown in Figure 6, the "most probable explanation" function identifies the set of states most likely to cause "soil liquefaction", which is [peak ground acceleration = medium, equivalent clean sand penetration resistance = medium, thickness of soil layer = thin, earthquake magnitude = strong, soil behaviour type index = silty sand, fines content = less, vertical effective stress = small, groundwater table = shallow, and depth of soil deposit = shallow]. This set agrees well with engineering judgment.
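The idea behind the MPE can be illustrated on a toy network by brute-force enumeration: with the evidence fixed, every combination of the remaining states is scored by its joint probability and the highest-scoring combination is returned. The two parent nodes, their states and all probabilities below are hypothetical, and the enumeration is only a sketch of the concept, not a description of how Netica computes the result internally.

```python
from itertools import product

# HYPOTHETICAL two-parent network: amax and qc both point to liquefaction.
p_amax = {"low": 0.6, "high": 0.4}
p_qc = {"loose": 0.5, "dense": 0.5}
p_liq_yes = {  # P(liquefaction = yes | amax, qc)
    ("low", "loose"): 0.40, ("low", "dense"): 0.05,
    ("high", "loose"): 0.90, ("high", "dense"): 0.30,
}


def most_probable_explanation():
    """Return the (amax, qc) states maximizing P(amax, qc, liquefaction = yes)."""
    def joint(states):
        a, q = states
        return p_amax[a] * p_qc[q] * p_liq_yes[(a, q)]
    return max(product(p_amax, p_qc), key=joint)


mpe = most_probable_explanation()
print(mpe)
```

Enumeration grows exponentially with the number of nodes, so it is only practical for very small networks such as this one.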

Sensitivity Analysis
Netica can determine the extent to which findings at any node can affect the beliefs of another node, given the findings that are currently entered. This sensitivity can be measured as mutual information (entropy reduction) or as the expected reduction in real variance. In this study, to determine the impact of each factor on the liquefaction potential, a sensitivity analysis was performed on the nine input factors with variance of beliefs. Sensitivity analysis identifies the basic events that contribute relatively strongly to the probability of a resulting event; taking effective measures to reduce the probability of these basic events in turn reduces the probability of the resulting event. The target node "soil liquefaction" is selected in Netica for sensitivity analysis, and the results are shown in Table 8. Table 8 shows that the mutual information of the "equivalent clean sand penetration resistance" node is the greatest, i.e., 0.13920, which indicates that it has the strongest influence on "soil liquefaction" potential, followed by "peak ground acceleration" and "soil behaviour type index", which have mutual information equal to 0.04439 and 0.03655 respectively, and so on, whereas "depth of soil deposit" is the least sensitive factor, with a mutual information equal to 0.00004. These findings are consistent with the literature.
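Mutual information between the target node and a single input node can be computed from their joint distribution, as sketched below. The joint probabilities are hypothetical, and the use of base-2 logarithms (bits) is an assumption about the units rather than something stated in the paper.

```python
from math import log2


def mutual_information(joint):
    """I(T; F) in bits, where joint[(t, f)] = P(T = t, F = f)."""
    p_t, p_f = {}, {}
    for (t, f), p in joint.items():
        p_t[t] = p_t.get(t, 0.0) + p   # marginal of the target
        p_f[f] = p_f.get(f, 0.0) + p   # marginal of the input factor
    return sum(p * log2(p / (p_t[t] * p_f[f]))
               for (t, f), p in joint.items() if p > 0)


# HYPOTHETICAL joint distribution of liquefaction (yes/no) and one factor.
joint_dependent = {("yes", "loose"): 0.45, ("yes", "dense"): 0.25,
                   ("no", "loose"): 0.05, ("no", "dense"): 0.25}
# An independent factor carries no information about the target.
joint_independent = {("yes", "a"): 0.35, ("yes", "b"): 0.35,
                     ("no", "a"): 0.15, ("no", "b"): 0.15}

mi_dep = mutual_information(joint_dependent)
mi_ind = mutual_information(joint_independent)
print(round(mi_dep, 5), round(mi_ind, 5))
```

A value near zero, like the one reported for the depth of soil deposit, means that knowing the state of that node barely changes the belief in the target node.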

Conclusions and Future Prospect
In this paper, a probabilistic evaluation of CPT-based seismic soil liquefaction was carried out by systematically integrating ISM and BBN. The models were trained and tested on the Boulanger and Idriss database, which compiles soil liquefaction case histories from different countries. The proposed model predicts seismic soil liquefaction using the major contributing factors. The most important conclusions of the present research work are as follows:
1. The accuracy of the proposed model is 78% and the F-score is 0.845 for liquefaction data and 0.621 for non-liquefaction data. In comparison with the LR, SVM, RF, and NB models, the proposed model has better prediction ability and, because of its simple graphical result, is simpler to implement.
2. The MPE of seismic soil liquefaction is peak ground acceleration = medium, equivalent clean sand penetration resistance = medium, thickness of soil layer = thin, earthquake magnitude = strong, soil behaviour type index = silty sand, fines content = less, vertical effective stress = small, groundwater table = shallow, and depth of soil deposit = shallow, which agrees well with engineering practice.
3. Sensitivity analysis shows the relative importance of the parameters for soil liquefaction: qc1Ncs and PGA have the strongest influence, followed by Ic, Ts, σ′v, FC, M, Dw, and Ds. This ranking is probably due to the limited variation of these parameters.
Because the CPT case history database is class-imbalanced and the sampling of the training and testing data sets is biased, the results may be anecdotal to some degree. Nevertheless, these findings regarding seismic soil liquefaction potential evaluation are insightful from a preliminary viewpoint. In addition, owing to the shortcomings of ISM, such as ignoring relationships between skipping-level nodes and the absence of feedback loops between any two levels, some significant node relationships will be ignored.
In the future, the causal mapping approach should be employed to modify the structure and to refine the prediction performance, taking the ISM shortcomings into account.

Level partition table (fragment; only the rows for V7 to V10 are recoverable, column order inferred from the ISM definitions):
Variable | Sr | Sa | Si | Li
V7 | V4, V5, V6, V7, V10 | V7 | V7 |
V8 | V4, V5, V6, V8, V10 | V8 | V8 |
V9 | V9, V10 | V9 | V9 |
V10 | V10 | V1, V2, V3, V4, V5, V6, V7, V8, V9, V10 | V10 | L1

Figure 2
The process outline of the methodology of research used in the present study.

Figure 4
Interpretive structural modeling of liquefaction potential.

Figure 5
Liquefaction potential graphical model.

Figure 6
MPE of seismic soil liquefaction.