Stability Analysis of Rock Slope Based on Improved Principal Component Analysis Model: Taking Fuwushan Slope as an Example

Aiming at the problems of low accuracy, low efficiency, and many parameters required in the current calculation of rock slope stability, a prediction model of rock slope stability is proposed, which combines principal component analysis (PCA) and relevance vector machine (RVM). In this model, PCA is used to reduce the dimension of several influencing factors, and four independent principal component variables are selected. With the help of RVM mapping the nonlinear relationship between the safety factor of slope stability and the principal component variables, the prediction model of rock slope stability based on PCA-RVM is established. The results show that under the same sample, the maximum relative error of the PCA-RVM model is only 1.26%, the average relative error is 0.95%, and the mean square error is 0.011, which is far lower than that of the RVM model and the GEP model. By comparing the results of traditional calculation method and PCA-RVM model, it can be concluded that the PCA-RVM model has the characteristics of high prediction accuracy, small discreteness, and high reliability, which provides reference value for accurately predicting the stability of rock slope.


Introduction
Slope sliding is a common and highly destructive geological disaster. Once it occurs, it seriously threatens lives, property, and engineering safety, causing great losses [1,2]. Some slope instability disasters in China are shown in Figure 1. To control slope instability effectively, researchers have carried out extensive slope stability evaluation work in order to reduce the losses caused by slope sliding and to save the cost of disaster prevention and mitigation.
Slope stability is affected by many uncertain factors, both natural and human, and there is a complex nonlinear relationship between them. How to establish an accurate rock slope stability evaluation model that considers multiple factors has always been a focus of engineering [3-8]. At present, research on slope stability mainly relies on numerical simulation, theoretical analysis, and experimental research.
Numerical research: Zhang et al. [9] established a three-dimensional geological model of a mining slope with the help of the DIMINE simulation platform and a DTM model, from which the geological conditions of any section of the slope can be obtained; Guo et al. [10] used UDEC to study the influence of dry-wet cycles on slope stability, and the results show that cohesion is more affected by dry-wet cycles than the internal friction angle; Wang et al. [11] combined the Swedish slice method with DEM data and GIS components to search for the slope sliding surface; Yang and Zhao [12] took a landslide in Sichuan Province as the research object and, based on the simplified Bishop method and FLAC 3D, simulated the deformation and stress of the slope during sliding; Xiao et al. [13] used the simplified Bishop method and the Fellenius method to analyze slope stability under earthquake action, and the results show that the safety factor obtained by the Bishop method is 6% higher than that obtained by the Fellenius method. Based on the FLAC 3D platform, Xue et al. [14] developed local and overall strength reduction programs to evaluate the stability of heterogeneous slopes. Bo et al. [15] studied the stability and deformation of an artificial mountain slope on a soft soil foundation by numerical analysis, taking the Danshan mountain piling project in Zhenjiang City as an example. However, numerical analysis cannot handle the random, variable, and fuzzy dynamic changes of a rock slope system, and it has shortcomings such as a complex calculation process, a large amount of calculation, and calculation accuracy that is difficult to guarantee.
Theoretical research: Deng et al. [16] introduced the Hoek-Brown criterion into the stability analysis of jointed slopes and combined it with interval theory to obtain the threshold value of the safety factor; Lei and Zheng [17] analyzed in depth the concepts of seepage force and effective stress as applied in the Swedish slice method; Fang [18] discussed the law of the minimum solution of the slice method by comparing the calculation results of various common slope safety factors; Wang [19] introduced the tangential and normal forces between strips into the Janbu method, thereby modifying it and improving the accuracy of its results; Deng et al. [20] proposed a new slope sliding surface search method based on the Janbu method and random angles, which has the advantages of easy programming and a wide simulation range. Yang et al. [21], based on the stress field obtained by numerical calculation, carried out a limit equilibrium finite element stability analysis and determined the most dangerous sliding surface position and the safety factor of the slope; however, the safety factor calculated by the limit equilibrium method is lower than the actual value because of the limitations of its factors and assumptions. Traditional numerical calculation and limit equilibrium analysis methods have difficulty including the influence of many factors, and their errors are large and their calculation efficiency is low.
Experimental research: Liu et al. [22] showed, through field monitoring of a slope in an earthquake area, that the slope deformation tended to stabilize after 2 months and that the earthquake had a great impact on the horizontal deformation of the shallow rock mass on the slope; Huang et al. [23] modified the shear strength formula of expansive soil based on in situ shear tests and evaluated slope stability with the modified formula; Wu et al. [24] designed an indoor model test based on the slope of a mining area and studied the dynamic change law of the slope during excavation; Zhou et al. [25] explored the stability of underwater slopes with the aid of centrifugal tests, and the results showed that the limit slope angle of an underwater fine sand slope is smaller than that of a silty slope.
With the rapid development of computer technology, many scholars have begun to apply machine learning algorithms to rock slope stability prediction. Liu et al. [26] analyzed and predicted the slope of the Chongqing Wanliang Expressway using a mathematical model based on the grey correlation degree method, and the results have good applicability and credibility. Although the predicted values of this method are accurate, the grey model has the disadvantages of a complex calculation process and a long calculation time. Jiang et al. [27] used a BP neural network to learn from and predict a large number of rock slope samples in the Chongqing area. The research shows that the prediction results of this method have high accuracy and good adaptability. However, the neural network method has problems such as slow learning speed and excessive dependence on learning samples. Li et al. [28] established a support vector machine (SVM) model for predicting the surface deformation of rock slopes and used the model to predict the surface deformation of a rock slope in Fushun. However, the model has defects: its kernel function generalizes poorly and is difficult to determine. Therefore, it is urgent to establish a more efficient and reasonable machine learning model. The relevance vector machine (RVM) is a machine learning method that has become popular in recent years. It has the advantages of high precision, high efficiency, and a small required sample size. However, a large input sample dimension reduces the learning efficiency of RVM and increases the calculation cost [29,30]. Therefore, this paper uses the feature extraction ability of principal component analysis (PCA) to reduce the dimension of the data and selects fewer, linearly independent influencing factors as new input variables for prediction [31]. The RVM model is used to learn the new input variables, and the rock slope stability prediction model based on PCA-RVM is established.
The example of Fuwushan slope is used to verify the analysis, which provides a new way for rock slope stability prediction.

Method Principle
Principal Component Analysis. Under the premise of little information loss, PCA recombines multiple indicators with certain correlation into a smaller group of comprehensive indicators so that a small number of simplified variables can reflect most of the information in the original variables [32]. The PCA calculation steps are as follows:
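(1) Construct the $m \times n$ matrix $X$, where $m$ is the number of samples and $n$ is the number of influencing factors of each sample:
$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} \tag{1}$$
The original data are standardized column by column, and the standardized matrix $Z = (z_{ki})_{m \times n}$ is generated.
(2) The covariance matrix of the standardized data, that is, the correlation matrix $R$, is established according to the standardized matrix. The calculation formula is
$$R = (r_{ij})_{n \times n}, \qquad r_{ij} = \frac{1}{m-1} \sum_{k=1}^{m} z_{ki} z_{kj} \tag{2}$$
where $r_{ij}$ $(i, j = 1, 2, \cdots, n)$ is the correlation coefficient of $X_i$ and $X_j$.
(3) Since $R$ is a positive semidefinite matrix, the characteristic equation $|\lambda E - R| = 0$ has $n$ nonnegative eigenvalues, that is, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$. Under the premise of constant total variance, the contribution rate of the $i$-th principal component is $\lambda_i / \sum_{k=1}^{n} \lambda_k$ [33]. When the cumulative contribution rate of the first $q$ principal components, $\sum_{i=1}^{q} \lambda_i / \sum_{k=1}^{n} \lambda_k$, exceeds 85%, these principal components can be considered to contain most of the total information.
(4) After the principal component analysis, the relationship between the initial variables $x_1, x_2, \cdots, x_n$ and the $n$ comprehensive index factors $y_1, y_2, \cdots, y_n$ is as follows:
$$y_i = c_{i1} x_1 + c_{i2} x_2 + \cdots + c_{in} x_n, \qquad i = 1, 2, \cdots, n \tag{3}$$
In the formula, $y_i$ and $y_j$ $(i \ne j)$ are not correlated with each other, and the coefficients satisfy $c_{i1}^2 + c_{i2}^2 + \cdots + c_{in}^2 = 1$. Retaining only the first $q$ components reduces the number of variables and achieves the purpose of dimension reduction.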
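The four steps above can be sketched in Python with NumPy. This is a minimal illustration rather than the authors' implementation: the function name `pca_reduce`, the use of the sample standard deviation (`ddof=1`) for standardization, and the random demonstration data are assumptions, while the 85% threshold follows the text.

```python
import numpy as np


def pca_reduce(X, threshold=0.85):
    """Return (scores, contribution, q) for an m x n data matrix X.

    Hypothetical helper: keeps the first q principal components whose
    cumulative contribution rate reaches `threshold`.
    """
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    # Step 1: standardize each influencing factor (column).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: correlation matrix of the factors (n x n).
    R = Z.T @ Z / (m - 1)
    # Step 3: eigenvalues in descending order and their contribution rates.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = eigvals / eigvals.sum()
    cumulative = np.cumsum(contribution)
    q = min(int(np.searchsorted(cumulative, threshold)) + 1, n)
    # Step 4: principal component scores; each loading vector has unit norm.
    scores = Z @ eigvecs[:, :q]
    return scores, contribution, q


# Demonstration on random placeholder data (30 samples, 6 factors).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 6))
X_demo[:, 1] += 0.8 * X_demo[:, 0]  # introduce some correlation
scores, contribution, q = pca_reduce(X_demo)
print(q, np.round(contribution, 3))
```

The returned `scores` would play the role of the reduced input variables; with real data, `q` is whatever number of components first reaches the chosen threshold.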

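Relevance Vector Machine.
Suppose that the training set is $\{x_n, t_n\}_{n=1}^{N}$, where $x_n \in R^d$ and $t_n \in R$ are the input vectors and output scalars, respectively, and the $t_n$ are independently distributed. The relationship between the input value $x$ and the target value $t$ can be expressed as follows:
$$t_n = y(x_n; \omega) + \zeta_n \tag{4}$$
where $\omega = [\omega_0, \omega_1, \cdots, \omega_N]^T$ is the weight vector and $\zeta_n$ is additive zero-mean Gaussian noise, independent across samples, that is, it satisfies the following Gaussian distribution:
$$\zeta_n \sim \mathcal{N}(0, \sigma^2) \tag{5}$$
where the variance $\sigma^2$ is unknown and needs to be obtained by iterative updating. Hence $P(t_n \mid x_n) = \mathcal{N}(t_n \mid y(x_n), \sigma^2)$ is Gaussian. From equations (4) and (5), it can be concluded that
$$P(t \mid \omega, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left\{ -\frac{\| t - \Phi\omega \|^2}{2\sigma^2} \right\} \tag{6}$$
where $t = (t_1, \cdots, t_N)^T$, $\Phi$ is the design matrix of order $N \times (N + 1)$ set in advance, $\Phi = [\varphi(x_1), \varphi(x_2), \cdots, \varphi(x_N)]^T$, and $\varphi(x_n) = [1, K(x_n, x_1), \cdots, K(x_n, x_N)]^T$ with kernel function $K(\cdot, \cdot)$, so that $y(x_n; \omega) = \varphi(x_n)^T \omega$.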

Geofluids
When a large number of parameters is used, overfitting may occur in the maximum likelihood estimation (MLE) of $\omega$ and $\sigma^2$. To avoid this, constraints are imposed on the parameters. Suppose that each weight $\omega_i$ obeys a Gaussian conditional probability distribution with mean 0 and variance $a_i^{-1}$:
$$P(\omega \mid a) = \prod_{i=0}^{N} \mathcal{N}(\omega_i \mid 0, a_i^{-1}) \tag{7}$$
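where $a = (a_0, a_1, \cdots, a_N)$ is the $(N + 1)$-dimensional hyperparameter vector. Suppose that the hyperparameters $a$ and the noise parameter $\sigma^2$ obey gamma prior probability distributions:
$$P(a) = \prod_{i=0}^{N} \mathrm{Gamma}(a_i \mid c, d), \qquad P(\sigma^{-2}) = \mathrm{Gamma}(\sigma^{-2} \mid e, f) \tag{8}$$
where
$$\mathrm{Gamma}(x \mid c, d) = \Gamma(c)^{-1} d^{c} x^{c-1} \exp(-d x) \tag{9}$$
According to Bayesian theory, the posterior probability distribution given the training sample set is as follows:
$$P(\omega, a, \sigma^2 \mid t) = \frac{P(t \mid \omega, a, \sigma^2)\, P(\omega, a, \sigma^2)}{P(t)} \tag{10}$$
where $P(\omega, a, \sigma^2 \mid t)$ cannot be calculated directly by integration, so it is decomposed into two parts:
$$P(\omega, a, \sigma^2 \mid t) = P(\omega \mid t, a, \sigma^2)\, P(a, \sigma^2 \mid t) \tag{11}$$
The posterior distribution of the weight vector $\omega$ can be obtained from the above formula:
$$P(\omega \mid t, a, \sigma^2) = \mathcal{N}(\omega \mid \mu, \Sigma) \tag{12}$$
where the posterior mean is $\mu = \sigma^{-2} \Sigma \Phi^T t$, the posterior covariance is $\Sigma = (\sigma^{-2} \Phi^T \Phi + A)^{-1}$, and $A = \mathrm{diag}(a_0, a_1, \cdots, a_N)$. Because $P(a, \sigma^2 \mid t)$ cannot be calculated directly, the Dirac delta function is introduced as an approximation, which is expressed as
$$P(a, \sigma^2 \mid t) \approx \delta(a_{MP}, \sigma^2_{MP}) \tag{13}$$
The optimal solutions of $P(a, \sigma^2 \mid t)$ are $a_{MP}$ and $\sigma^2_{MP}$:
$$(a_{MP}, \sigma^2_{MP}) = \arg\max_{a, \sigma^2} P(a, \sigma^2 \mid t) \tag{14}$$
By solving equations (13) and (14), the following formula is obtained:
$$P(a, \sigma^2 \mid t) \propto P(t \mid a, \sigma^2)\, P(a)\, P(\sigma^2) \tag{15}$$
With flat hyperpriors $P(a)$ and $P(\sigma^2)$, maximizing equation (15) reduces to maximizing the marginal likelihood:
$$P(t \mid a, \sigma^2) = \int P(t \mid \omega, \sigma^2)\, P(\omega \mid a)\, d\omega = (2\pi)^{-N/2} |C|^{-1/2} \exp\left\{ -\tfrac{1}{2} t^T C^{-1} t \right\}, \qquad C = \sigma^2 I + \Phi A^{-1} \Phi^T \tag{16}$$
The partial derivatives of the logarithm of formula (16) are
$$\frac{\partial \ln P(t \mid a, \sigma^2)}{\partial a_n} = \frac{1}{2}\left[ \frac{1}{a_n} - \Sigma_{nn} - \mu_n^2 \right], \qquad \frac{\partial \ln P(t \mid a, \sigma^2)}{\partial \sigma^{-2}} = \frac{1}{2}\left[ N\sigma^2 - \| t - \Phi\mu \|^2 - \mathrm{tr}(\Sigma \Phi^T \Phi) \right] \tag{17}$$
where $\Sigma_{nn}$ is the $n$-th diagonal element of $\Sigma$. Let (17) be equal to 0 and $r_n = 1 - a_n \Sigma_{nn}$. The results are as follows:
$$a_n^{new} = \frac{r_n}{\mu_n^2} \tag{18}$$
$$(\sigma^2)^{new} = \frac{\| t - \Phi\mu \|^2}{N - \sum_{n=0}^{N} r_n} \tag{19}$$
In the actual process, the hyperparameters $a_n$ and $\sigma^2$ are updated through equations (18) and (19) to complete the RVM learning. In the iterative process, most of the $a_n$ tend to infinity, and by the formula $\mu = \sigma^{-2} \Sigma \Phi^T t$ the corresponding weights $\omega_n$ tend to zero. Assuming that the sample to be predicted is $x_*$, the predicted value $t_*$ can be obtained from
$$P(t_* \mid t, a_{MP}, \sigma^2_{MP}) = \int P(t_* \mid \omega, \sigma^2_{MP})\, P(\omega \mid t, a_{MP}, \sigma^2_{MP})\, d\omega \tag{20}$$
By simplifying equation (20), it can be concluded that
$$P(t_* \mid t, a_{MP}, \sigma^2_{MP}) = \mathcal{N}(t_* \mid y_*, \sigma_*^2) \tag{21}$$
where the expected value is $y_* = \mu^T \varphi(x_*)$ and the variance is $\sigma_*^2 = \sigma^2_{MP} + \varphi(x_*)^T \Sigma \varphi(x_*)$; the predicted value of $t_*$ is given by the mean $y_*$ in equation (21).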
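The iterative learning described above can be illustrated with a small NumPy sketch. It is not the authors' code: it assumes the standard re-estimation formulas $a_n = r_n/\mu_n^2$ and $\sigma^2 = \|t - \Phi\mu\|^2/(N - \sum r_n)$ for equations (18) and (19), a Gaussian kernel of the form $\exp(-\|x - x'\|^2/\text{width}^2)$, and a pruning cap `alpha_cap` standing in for "$a_n$ tends to infinity". The names `design_matrix`, `rvm_fit`, and `rvm_predict` and the sine-curve data are hypothetical.

```python
import numpy as np


def design_matrix(X, centers, width):
    """Bias column plus Gaussian kernels exp(-||x - c||^2 / width^2)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.ones((len(X), 1)), np.exp(-d2 / width ** 2)])


def posterior(P, t, alpha, sigma2):
    """Posterior covariance Sigma and mean mu = Sigma P^T t / sigma2."""
    Sigma = np.linalg.inv(P.T @ P / sigma2 + np.diag(alpha))
    return Sigma, Sigma @ P.T @ t / sigma2


def rvm_fit(X, t, width, n_iter=1000, alpha_cap=1e6, tol=1e-6):
    X, t = np.asarray(X, float), np.asarray(t, float)
    N = len(t)
    Phi = design_matrix(X, X, width)
    keep = np.arange(N + 1)            # basis functions that are still active
    alpha = np.ones(N + 1)
    sigma2 = max(0.1 * t.var(), 1e-6)  # assumed starting value
    for _ in range(n_iter):
        P = Phi[:, keep]
        Sigma, mu = posterior(P, t, alpha[keep], sigma2)
        r = 1.0 - alpha[keep] * np.diag(Sigma)
        # Update a_n = r_n / mu_n^2; a vanishing r_n is treated as pruned.
        new_alpha = np.where(r > 1e-12, r / np.maximum(mu ** 2, 1e-20), alpha_cap)
        new_alpha = np.minimum(new_alpha, alpha_cap)
        # Update sigma^2 = ||t - Phi mu||^2 / (N - sum r_n).
        resid = t - P @ mu
        sigma2 = max(float(resid @ resid) / max(N - r.sum(), 1e-9), 1e-12)
        change = np.max(np.abs(np.log(new_alpha) - np.log(alpha[keep])))
        alpha[keep] = new_alpha
        active = new_alpha < alpha_cap
        if not active.any():
            break
        keep = keep[active]            # drop weights whose a_n hit the cap
        if change < tol:
            break
    Sigma, mu = posterior(Phi[:, keep], t, alpha[keep], sigma2)
    return {"X": X, "width": width, "keep": keep, "mu": mu,
            "Sigma": Sigma, "sigma2": sigma2}


def rvm_predict(model, X_new):
    """Predictive mean y* and variance sigma*^2 for new inputs."""
    Phi_new = design_matrix(np.asarray(X_new, float), model["X"], model["width"])
    Phi_new = Phi_new[:, model["keep"]]
    mean = Phi_new @ model["mu"]
    var = model["sigma2"] + np.einsum("ij,jk,ik->i", Phi_new, model["Sigma"], Phi_new)
    return mean, var


# Toy data (not from the paper): a noisy sine curve.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 6.0, 30).reshape(-1, 1)
t = np.sin(x[:, 0]) + 0.05 * rng.normal(size=30)
model = rvm_fit(x, t, width=1.0)
mean, var = rvm_predict(model, x)
print(len(model["keep"]), float(np.abs(mean - t).max()))
```

The surviving indices in `model["keep"]` correspond to the relevance vectors (plus possibly the bias term); all other weights are treated as exactly zero.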

PCA-RVM Model of Rock Slope Stability
3.1. Sample Data. Scholars at home and abroad divide slopes into two categories: failed slopes and stable slopes [4,34,35]. The factors that affect the stability of rock slopes are complex and varied. This paper introduces principal component analysis to explore the relationship between the influencing factors and slope stability: it analyzes the factors and reduces their dimension, retains the main influencing factors, and substitutes them into the RVM model for prediction. The rock unit weight (γ), cohesion (C), internal friction angle (φ), slope height (H), slope angle (α), and pore water pressure ($\gamma_w$) are selected. These six factors are the input factors of rock slope stability, and the safety factor ($F_s$) is the output factor. In this paper, 30 groups of slope data from the literature [29] are sorted out; groups 1~23 are used as learning samples and groups 24~30 as prediction samples (see Table 1) to build the prediction model. Finally, the third section of Fuwu mountain slope is taken as an example to verify the analysis. The 30 groups of data in Table 1 were standardized, and the significance value of the Bartlett sphericity test was 0.000, which is less than the significance level of 0.05, showing that the sample data can be used for factor analysis. Principal component analysis is performed on the input variables in Table 1 to obtain the correlation coefficient matrix among the variables (see Table 2).
According to Table 2, γ, C, φ, and H have a strong linear correlation. For example, the correlation coefficients of γ with C, φ, and H are 0.469, 0.429, and 0.659, respectively, whereas those of γ with α and $\gamma_w$ are 0.382 and -0.299. Thus γ correlates well with the first three factors, but its correlation with the last two factors gradually decreases and even becomes negative. To ensure that the selected variables contain most of the information of the original data, it is necessary to obtain the actual contribution rate and cumulative contribution rate of each component, as shown in Table 3.
It can be seen from Table 3 that the cumulative contribution rate of the first four principal components has reached 88.47%, more than 85%, indicating that the first four principal components can effectively replace the information contained in the original data. In order to intuitively compare the contribution rate of each factor, the actual contribution rate and cumulative contribution rate of each component are shown in Figure 2.
The score coefficient matrix obtained by the maximum difference method is shown in Table 4, from which the comprehensive score of each principal component can be calculated and the variable expressions of the four principal components can be written. The four principal components are therefore used as input variables to establish the RVM prediction model, which not only reduces the dimension of the variables but also improves the operation speed while keeping the loss of information carried by the initial variables to a minimum.

Establish Prediction Model.
The PCA-RVM prediction model is established by using the data corresponding to the four principal components after dimension reduction as the input and the safety factor ($F_s$) as the output. To obtain a more accurate model, the Gaussian kernel width needs to be optimized. After tuning, the error between the predicted results and the actual values is relatively small when the Gaussian kernel width is between 1.76 and 1.82. To further improve the model accuracy, this interval is subdivided, and kernel widths of 1.76, 1.77, 1.78, 1.79, 1.80, 1.81, and 1.82 are evaluated. The average relative error of the prediction samples for each kernel width is shown in Figure 3. Figure 3 shows that the average relative error of the prediction results is smallest when the Gaussian kernel width σ is 1.78. Therefore, σ = 1.78 is adopted, and the number of iterations is 1000.
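The subdivision search over the kernel width amounts to a one-dimensional grid search. The sketch below is only an illustration: `select_kernel_width` is a hypothetical helper, and `placeholder_error` is a stand-in for training the model with a given width and scoring the prediction samples; the real error curve is the one reported in Figure 3.

```python
def select_kernel_width(candidates, average_relative_error):
    """Evaluate every candidate width and return the one with the smallest error."""
    errors = {w: average_relative_error(w) for w in candidates}
    best = min(errors, key=errors.get)
    return best, errors


# Candidate widths from the subdivided interval in the text.
candidates = [round(1.76 + 0.01 * k, 2) for k in range(7)]


def placeholder_error(width):
    # Placeholder only: a real implementation would fit the RVM with this
    # kernel width and return the average relative error on the test samples.
    return abs(width - 1.78)


best_width, errors = select_kernel_width(candidates, placeholder_error)
print(best_width)
```

Passing the error evaluation in as a function keeps the search independent of how the model is trained, so the same loop could be reused for a finer or wider grid.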
Combined with the established model, 25~30 samples are calculated and analyzed. In order to verify the accuracy of the model and ensure the same sample conditions, the prediction results of the GEP prediction model and the RVM prediction model are analyzed and compared. It can be seen from Table 5 that the prediction results of the GEP model have the largest error, of which the maximum relative error is as high as 37.07%. The maximum relative error of RVM model is 8.14%. The maximum relative error of the PCA-RVM model is only 1.26%, and the error fluctuation range of each sample point is small. Therefore, the prediction accuracy of the PCA-RVM model is much higher than that of the other two models.
In order to compare the predicted results of the three models more intuitively, the predicted safety factors of each model are compared with the actual safety factors, as shown in Figure 4.
It can be seen from Figure 4 that the predicted values of the GEP model deviate greatly from the actual values in general: the deviation of samples 25 and 27 is obvious, and only a few sample points are close to the actual values. The predicted values of the RVM model are basically consistent with the actual values, although the errors of samples 25 and 26 are large. The PCA-RVM model has the highest prediction accuracy, and its predicted values almost coincide with the actual values at every sample point. To compare the overall prediction accuracy and dispersion of the three models, the mean square error (FMSE) and average relative error (ARE) of the prediction results of each model are compared in Table 6. It can be seen from Table 6 that the PCA-RVM model has a lower mean square error and a lower average relative error than the other two models. In conclusion, compared with the GEP model and the RVM model, the PCA-RVM model has lower discreteness and higher overall accuracy.
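For reference, the three error measures used in the comparison can be computed as in the following sketch. The exact definition of the mean square error used in Table 6 is not stated, so the mean of squared residuals is assumed here; the function name and the safety factor values are hypothetical and are not taken from Table 5.

```python
def error_summary(actual, predicted):
    """Maximum and average relative error (in %) and mean square error."""
    rel = [abs(p - a) / abs(a) * 100.0 for a, p in zip(actual, predicted)]
    mse = sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return {
        "max_relative_error": max(rel),
        "average_relative_error": sum(rel) / len(rel),
        "mse": mse,
    }


# Hypothetical safety factors for illustration only.
actual = [1.20, 1.50, 0.90]
predicted = [1.23, 1.47, 0.90]
summary = error_summary(actual, predicted)
print(summary)
```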

Case Calculation
Taking the Fuwu mountain slope of a project as an example, the prediction model is compared with the traditional calculation formula.

Physical and Mechanical Parameters of Rock and Soil Mass of Slope. Plastic red clay ($Q_4^{el}$): γ = 16.5 kN/m³, φ = 8.5°, and C = 30 kPa. According to the requirements of the code, the cohesion reduction factor is 0.5 and the internal friction angle reduction factor is 0.8, so the values used in the slope stability calculation are φ = 6.8° and C = 15 kPa.
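The reduced strength parameters follow from simple multiplication, as the short check below shows. The helper name `reduce_strength` is hypothetical; note that, as the numbers in the text imply, the factor 0.8 is applied to the friction angle itself rather than to its tangent.

```python
def reduce_strength(cohesion_kpa, friction_angle_deg, c_factor=0.5, phi_factor=0.8):
    """Apply the reduction factors to C (kPa) and phi (degrees)."""
    return cohesion_kpa * c_factor, friction_angle_deg * phi_factor


c_design, phi_design = reduce_strength(30.0, 8.5)
print(f"C = {c_design:.1f} kPa, phi = {phi_design:.1f} deg")
# C = 15.0 kPa, phi = 6.8 deg
```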

Failure Mode of Slope. The whole slope is divided into sections AB, BC, CD, DE, and EF.
After the later excavation, the slope of section AB is composed of red clay and strongly weathered sandstone. The strongly weathered sandstone is an extremely broken rock mass, and the moderately weathered rock mass is a relatively broken rock mass; circular sliding of the soil layer, sliding along the rock-soil boundary, and sliding inside the rock weathering line can occur. After the later excavation, the slope of section BC is composed of a small amount of red clay and strongly weathered sandstone. The strongly weathered sandstone is an extremely broken rock mass, and the moderately weathered rock mass is a relatively broken rock mass; the slope can slide along a circular arc, along the geotechnical boundary, and inside the weathering line.
After the later excavation, the slope of section CD is composed of a small amount of red clay and strongly weathered sandstone. The strongly weathered sandstone is an extremely broken rock mass, and the moderately weathered rock mass is a relatively broken rock mass; the slope can slide along a circular arc, along the geotechnical boundary, and inside the weathering line.
After the later excavation, the slope of section DE is composed of red clay and strongly weathered sandstone. The strongly weathered sandstone is an extremely broken rock mass, and the moderately weathered rock mass is a relatively broken rock mass; the slope can slide along a circular arc, along the geotechnical boundary, and inside the weathering line.
After the later excavation, the slope of section EF is composed of miscellaneous fill, red clay, and strongly weathered sandstone. The strongly weathered sandstone is an extremely broken rock mass, and the slope can slide along a circular arc.

Slope Stability Analysis. The slope length of section AB is about 20 m, the slope height is about 17.38-36.5 m, and the slope aspect is 286°. The overburden of this section is residual slope red clay and strongly weathered sandstone, and the underlying bedrock is moderately weathered sandstone. The rock mass is relatively broken, and the occurrence of the rock is 148°∠53°. It is the reverse slope of rock. The occurrence of rock joint 1 in the slope is 311°∠76°. The occurrence of joint 2 is 145°∠80°. There is no external structural plane in the slope. The angle between joint 1 and the slope is about 25°, and the rock mass may be cut out along joint 1.
The slope length of section BC is about 31 m, the slope height is about 36.5-40.7 m, and the slope aspect is 286°. The overburden of this section is residual slope red clay and strongly weathered sandstone, and the underlying bedrock is moderately weathered sandstone. The rock mass is relatively broken, and the occurrence of the rock is 148°∠53°. It is the reverse slope of rock. The occurrence of rock joint 1 in the slope is 311°∠76°. The occurrence of joint 2 is 145°∠80°. There is no external structural plane in the slope. Joint 1 intersects the slope at an angle of about 25°, and its dip of 76° is greater than the foundation pit slope angle of 63°. There is therefore no free cutting surface and no possibility of large-scale bedding cutting out of the rock mass; only block falling caused by joint fracture cutting can occur.
The slope length of section CD is about 62 m, the maximum height of vertical grading is about 33.8-40.6 m, and the slope aspect is 286°. The overburden of this section is residual slope red clay and strongly weathered sandstone, and the underlying bedrock is moderately weathered sandstone. The rock mass is relatively broken, and the occurrence of the rock is 148°∠53°. It is the reverse slope of rock. The occurrence of rock joint 1 in the slope is 311°∠76°. The occurrence of joint 2 is 145°∠80°. There is no external structural plane in the slope. Joint 1 intersects the slope at an angle of about 25°, and its dip of 76° is greater than the foundation pit slope angle of 63°. There is therefore no free cutting surface and no possibility of large-scale bedding cutting out of the rock mass; only block falling caused by joint fracture cutting can occur.
The slope length of section DE is about 43 m, the maximum height of vertical grading is about 14.5-33.8 m, and the slope aspect is 286°. The overburden of this section is residual slope red clay and strongly weathered sandstone, and the underlying bedrock is moderately weathered sandstone. The rock mass is relatively broken, and the occurrence of the rock is 148°∠53°. It is the reverse slope of rock. The occurrence of rock joint 1 in the slope is 311°∠76°. The occurrence of joint 2 is 145°∠80°. There is no external structural plane in the slope. Joint 1 intersects the slope at an angle of about 25°, and its dip of 76° is greater than the foundation pit slope angle of 73°. There is therefore no free cutting surface and no possibility of large-scale bedding cutting out of the rock mass; only block falling caused by cutting along joints and fissures can occur.
The slope length of section EF is about 15 m, the maximum height of the slope is 3.76-7.7 m, the slope is a mixed rock-soil slope, and the slope direction is 21°. The occurrence of the strata is 148°∠53°, which is tangential to the main direction of the slope. At present, the toe of the slope is supported by a concrete rubble retaining wall. The wall has been in place for more than 5 years, and its stability is good.

Slope Stability Calculation. The transfer coefficient method is used to calculate the landslide thrust and the residual sliding force of lateral geotechnical pressure:
$$P_n = 0$$
$$P_i = P_{i-1}\psi_{i-1} + T_i - \frac{R_i}{F_s}$$
$$\psi_{i-1} = \cos(\theta_{i-1} - \theta_i) - \frac{\sin(\theta_{i-1} - \theta_i)\tan\varphi_i}{F_s}$$
$$T_i = (G_i + G_{bi})\sin\theta_i + Q_i\cos\theta_i$$
$$R_i = c_i l_i + \left[(G_i + G_{bi})\cos\theta_i - Q_i\sin\theta_i\right]\tan\varphi_i$$
where $F_s$ is the stability coefficient; $P_n$ is the residual sliding force per unit width of the $n$-th block (kN/m); $P_i$ is the residual sliding force per unit width between calculation blocks $i$ and $i + 1$ (kN/m), with $P_i = 0$ taken when $P_i < 0$ $(i < n)$; $c_i$ is the standard value of the cohesion of the rock and soil mass on the sliding surface of calculation block $i$ (kPa); $\varphi_i$ is the standard value of the internal friction angle of the rock and soil mass on the sliding surface of calculation block $i$ (°); $l_i$ is the length of the sliding surface of calculation block $i$ (m); $\psi_{i-1}$ is the transfer coefficient from calculation block $i - 1$ to calculation block $i$; $T_i$ is the sliding force per unit width of calculation block $i$ caused by gravity and other external forces (kN/m); $R_i$ is the antisliding force per unit width of calculation block $i$ caused by gravity and other external forces (kN/m); $\theta_i$ and $\theta_{i-1}$ are the inclination angles of the sliding surfaces of calculation blocks $i$ and $i - 1$ (°); $G_{bi}$ is the vertical additional load per unit width of calculation block $i$ (kN/m), positive when directed downward and negative when directed upward; $G_i$ is the weight per unit width of calculation block $i$ (kN/m); and $Q_i$ is the horizontal load per unit width of calculation block $i$ (kN/m). The failure mode of "sliding along circular arc inside soil layer" is selected for calculation, and the stability coefficients of the slope sections are calculated. The errors between the model predictions and the traditional method are 5.2%, 3.7%, 0.7%, 1.5%, and 9.2%, respectively. The results show that the errors are conservative in design and calculation, which also verifies the reliability of the model and provides reference value for subsequent calculation and design.
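A block-by-block evaluation of the transfer coefficient method can be sketched as follows. The sketch assumes the implicit form $P_i = P_{i-1}\psi_{i-1} + T_i - R_i/F_s$ with $\psi_{i-1} = \cos(\theta_{i-1} - \theta_i) - \sin(\theta_{i-1} - \theta_i)\tan\varphi_i/F_s$ and finds the $F_s$ at which the residual force of the last block vanishes by bisection. The function names, the bisection bounds, and the two-block geometry are hypothetical; only the reduced red clay strength (C = 15 kPa, φ = 6.8°) comes from the text.

```python
import math


def residual_thrust(blocks, fs):
    """Residual sliding force P_n (kN/m) of the last block for a trial F_s."""
    p_prev, theta_prev = 0.0, None
    for i, b in enumerate(blocks):
        theta = math.radians(b["theta_deg"])
        tan_phi = math.tan(math.radians(b["phi_deg"]))
        weight = b["G"] + b.get("Gb", 0.0)   # G_i + G_bi
        q = b.get("Q", 0.0)                  # horizontal load Q_i
        t_i = weight * math.sin(theta) + q * math.cos(theta)
        r_i = b["c"] * b["l"] + (weight * math.cos(theta) - q * math.sin(theta)) * tan_phi
        if theta_prev is None:
            psi = 0.0  # no block above the first one
        else:
            d = theta_prev - theta
            psi = math.cos(d) - math.sin(d) * tan_phi / fs
        p = p_prev * psi + t_i - r_i / fs
        if i < len(blocks) - 1 and p < 0.0:
            p = 0.0  # a negative thrust is not passed to the next block
        p_prev, theta_prev = p, theta
    return p_prev


def stability_coefficient(blocks, lo=0.05, hi=20.0, tol=1e-6):
    """Bisection for the F_s at which the last block's residual thrust is zero."""
    if residual_thrust(blocks, lo) > 0.0 or residual_thrust(blocks, hi) < 0.0:
        raise ValueError("F_s is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual_thrust(blocks, mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Hypothetical two-block slip surface; weights, lengths, and angles are made up.
blocks = [
    {"G": 400.0, "theta_deg": 35.0, "l": 8.0, "c": 15.0, "phi_deg": 6.8},
    {"G": 600.0, "theta_deg": 20.0, "l": 10.0, "c": 15.0, "phi_deg": 6.8},
]
fs = stability_coefficient(blocks)
print(round(fs, 3))
```

For a single block the method reduces to $F_s = R_1/T_1$, which gives a quick sanity check of any implementation.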

Conclusion
(1) In this paper, the PCA and RVM models are used to predict the stability of rock slope. PCA is used to process the original data, and the six influence factors are reduced into four main influence factors, which reduces the complexity of the algorithm; RVM is applied to establish the mapping relationship between influence factors and slope stability after dimension reduction, so as to predict slope stability. The PCA-RVM model simplifies the complex problems and makes the prediction process more efficient and concise.
(2) The results show that the PCA-RVM model is superior to the GEP model and the RVM model in terms of mean square error and mean relative error in predicting rock slope stability, with higher accuracy and lower discreteness. In the aspect of slope stability, the PCA-RVM model has high credibility, and the predicted value is basically consistent with the actual value, which can provide reference for the prevention and control of slope disasters.
(3) Taking Fuwushan slope as an example, the results of the traditional slope stability calculation method are compared with those of the PCA-RVM model, and the difference between the prediction results and the calculation results is small. However, because the model is based on a small amount of data, there is still a certain deviation between the predicted results and the actual value of the slope safety factor. Therefore, it is of great significance to widely collect engineering case data to improve the accuracy and practicability of the model.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.