Predicting the Deformation of a Slope Using a Random Coefficient Panel Data Model

Engineering construction in coastal areas not only affects existing landslides, but also induces new landslides. Variation of the water level makes coastal areas prone to geological hazards. Prediction of slope displacement based on monitoring data plays an important role in the early warning of potential landslides and slope failures, and supports the risk management of hazards. Given the complex characteristics of slope deformation, we proposed a prediction model using a random coefficient model within the framework of panel data analysis, so as to take the correlation among monitoring points into consideration. In addition, we classified the monitoring data using a Gaussian mixture model, to take the temporal-spatial characteristics into consideration. Monitoring data of the Guobu slope were used to validate the model. Results indicated that the proposed model has a better performance in prediction accuracy. We also compared the proposed model with the BP neural network model and the temporal-temperature model, and found that the prediction accuracy of the proposed model is better than those of the two control models.


Introduction
Landslides and slope failures are worldwide natural disasters [1][2][3]. In coastal areas, the construction of hydraulic structures may change the geological environment of bank slopes, and thus influence the stability of slopes. Hazards such as landslides and slope failures are also affected by variations in water level. Water not only activates existing landslides on the slope, but may also increase the possibility of new landslides, making the water area a geological high-risk area [4]. For example, as a result of the construction of the Grand Coulee Dam on Roosevelt Lake, USA, more than 500 landslides occurred from 1941 to 1953 [5]. The same problems also occurred in China's Three Gorges Reservoir [6][7][8]. Since the construction of the Three Gorges Reservoir, due to the rise in water level in the reservoir, more than 4 thousand geological hazards have occurred from Yichang to Jiangjin along the Yangtze River, most of which were landslides, with a total volume of more than 4 billion m³, such as the famous Qianjiangping landslide, the Shuping landslide, and the Muyubao landslide [9][10][11].
Once a slope near water areas such as reservoirs and lakes fails, it may induce a series of secondary disasters such as disastrous waves, and destroy the nearby hydraulic structures [12][13][14]. For instance, a historical event occurred at the Vajont Reservoir in Italy in 1963 [15,16]. The landslide formed an impulse wave that over-topped the dam and damaged two villages downstream. A recent example occurred in Greenland in 2017, where a huge landslide impacted the Karrat Fjord and generated a wave that traveled 32 km to Nuugaatsiaq [17,18]. Monitoring systems and early warning systems have been widely used to control the risk of such natural hazards. By analyzing the data recorded by monitoring systems, previous studies have found that the possibility of failure of bank slopes in coastal areas is related not only to geological factors, but also to water level fluctuations. Therefore, in this paper, we focus specifically on the reservoir bank slope, with consideration of the fluctuation of the water level.
Slope deformation is the most intuitive indicator for characterizing the inherent evolution of landslides. Predicting the deformation of a slope is of great importance for detecting potential slope failures and landslides. Previous studies have attempted a number of methods to predict slope deformation, including numerical simulations and statistical analyses based on monitoring data. The current numerical methods for evaluating slope stability include the finite element method [19][20][21], the distinct element method [22,23], the limit equilibrium method [24], and so on. Researchers have also carried out large deformation analyses for slope instability. For example, Esposito et al. proposed a numerical method based on a practical equivalent continuum approach, so as to analyze the deformation of a gravitational slope in the Lepini Mountains, Italy [25]. Guallini et al. conducted a structural and kinematic analysis for the deformational system observed in the South Polar Layered Deposits [26]. However, due to the complexity of the structure of slopes, numerical simulations can hardly reflect the actual internal mechanisms of slope deformation.
Statistical analysis based on monitoring data quantifies the relation between the slope deformation and the relevant explanatory variables obtained from the monitoring system [27]. The advantage of statistical methods is that the prediction model uses real-time monitoring data, and can easily be updated with the most recent monitoring data [28][29][30]. Previous studies have attempted a number of monitoring methods. For instance, Liu et al. used InSAR technology to identify landslides in Badong, and found a correlation between landslide movement and water level variation [31].
With the increase in monitoring sensors installed in the slope, massive monitoring data are obtained from the monitoring system. With the development of computing science, intelligent computing methods, such as gray theory, artificial neural networks, and support vector machines, are applied to monitoring data modeling [32][33][34][35]. Also, prediction models have been improved from single monitoring point models to multiple monitoring point models. It should be noted that the displacement at each monitoring point relates to the displacement in the surrounding area. However, none of the existing prediction models of slope displacement has considered the correlations among different monitoring points.
In this paper, to consider the correlations among monitoring points, we proposed a prediction method within the framework of panel data analysis, combining a random coefficient model and a Gaussian mixture model. We first classified the monitoring points into four groups, based on their data characteristics using a Gaussian mixture model, so as to take the temporal-spatial characteristics of monitoring points into account. With the clustering results, we established the random coefficient model for each group. A case study of the Guobu slope [36], a bank slope at the Laxiwa reservoir, was conducted to validate the model. Specifically, as the stability of slopes near water areas is affected by the fluctuation of water level, the reservoir water level was selected as an explanatory variable in the proposed model. We also compared the prediction accuracy of the proposed model with the BP neural network model and the temporal-temperature model, which are among the most commonly used artificial intelligence models and statistical models, respectively.
The paper is organized as follows: Section 2 introduces the model development, including the random coefficient model and the clustering method. Section 3 presents the geological information of the Guobu slope study area and the monitoring dataset selected in this paper. Prediction results of the proposed model, as well as a comparison with other methods, are given in Section 4. Section 5 concludes the paper with highlighted remarks.

Model Development
In this section, to consider the correlations among monitoring points, we developed the prediction model based on the random coefficient model (RCM) and the Gaussian mixture model (GMM) within the framework of panel data analysis. As the first step, we classified the monitoring points into several groups according to the data characteristics using the Gaussian mixture model, in order to take the temporal-spatial characteristics of monitoring points into account. With the clustering results, we established the random coefficient model for each data group. Specifically, as the stability of slopes near water areas is affected by the fluctuation of water level, the reservoir water level was selected as an explanatory variable in the proposed model. The flow chart of the proposed model for predicting the displacement of slope is shown in Figure 1.

Data Clustering Based on Gaussian Mixture Model
To account for correlations among data at different monitoring points, we classified the monitoring points into several groups according to the data characteristics using a Gaussian mixture model (GMM).
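The GMM is a probabilistic model that assumes all the data points are generated from a mixture of several Gaussian distributions with unknown parameters, which means the data can be fitted by a mixture of several Gaussian distributions using the maximum likelihood estimation method. The model aims to identify the underlying subpopulations within the overall population. Using a one-dimensional dataset as an example, suppose the variable $x$ follows a mixture of two Gaussian distributions with probability density function:

$$P(x \mid \mu_1, \mu_2, \sigma) = \sum_{k=1}^{2} p_k \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu_k)^2}{2\sigma^2}\right) \tag{1}$$

where $k = 1, 2$ indexes the two Gaussian distributions, the prior probabilities are $\{p_1 = 1/2, p_2 = 1/2\}$, $\{\mu_k\}$ denotes the mean values, and $\sigma$ denotes the common standard deviation of the two Gaussian distributions. $\{x_n\}_{n=1}^{N}$ are independent samples in the dataset drawn from this mixture, and $k_n$ denotes the unknown cluster label of the $n$th sample.

In the case where $\sigma$ and $\{\mu_k\}$ are known, the posterior probability of the label $k_n$ of the $n$th sample can be written as

$$P(k_n = k \mid x_n, \theta) = \frac{\exp\left(-(x_n-\mu_k)^2 / 2\sigma^2\right)}{\sum_{k'=1}^{2} \exp\left(-(x_n-\mu_{k'})^2 / 2\sigma^2\right)} \tag{2}$$

with $\theta = \{\{\mu_k\}, \sigma\}$. In the case where $\{\mu_k\}$ is unknown and $\sigma$ is known, $\{\mu_k\}$ can be inferred from $\{x_n\}_{n=1}^{N}$. An iterative algorithm can then be derived for the $\{\mu_k\}$ that maximize the likelihood:

$$P(\{x_n\}_{n=1}^{N} \mid \{\mu_k\}, \sigma) = \prod_{n=1}^{N} P(x_n \mid \{\mu_k\}, \sigma) \tag{3}$$

Denoting by $L$ the natural logarithm of the likelihood, the derivative of the log-likelihood with respect to $\mu_k$ is:

$$\frac{\partial L}{\partial \mu_k} = \sum_{n} p_{k|n} \frac{x_n - \mu_k}{\sigma^2} \tag{4}$$

with $p_{k|n} \equiv P(k_n = k \mid x_n, \theta)$ being the posterior probability given in Equation (2). Ignoring the terms in $\frac{\partial}{\partial \mu_k} P(k_n = k \mid x_n, \theta)$, the second derivative can be approximated as:

$$\frac{\partial^2 L}{\partial \mu_k^2} = -\sum_{n} p_{k|n} \frac{1}{\sigma^2} \tag{5}$$

Then, the initial $\mu_1, \mu_2$ can be updated to $\mu'_1, \mu'_2$ using the approximate Newton-Raphson step:

$$\mu'_k = \frac{\sum_{n} p_{k|n}\, x_n}{\sum_{n} p_{k|n}} \tag{6}$$

For a multi-dimensional dataset, the mixture density of Gaussian distributions can be written as:

$$P(\mathbf{x} \mid \{\pi_k\}, \{\mu_i^{(k)}\}, \{\sigma_i^{(k)}\}) = \sum_{k} \pi_k \prod_{i=1}^{I} \frac{1}{\sqrt{2\pi}\,\sigma_i^{(k)}} \exp\left(-\frac{(x_i - \mu_i^{(k)})^2}{2(\sigma_i^{(k)})^2}\right) \tag{7}$$

where $k$ indexes the Gaussian distributions, $i$ indexes the dimensions of the data, $n$ indexes the data sequence, $I$ denotes the total number of dimensions, $\pi_k$ is the weighting of the $k$th distribution, $\mu_i^{(k)}$ and $\sigma_i^{(k)}$ denote the mean value and standard deviation of the $k$th Gaussian distribution in dimension $i$, and $x_i^{(n)}$ is the $i$th component of the $n$th data point. The iterative formulas of $\sigma_i^{(k)}$ and $\pi_k$ are:

$$(\sigma_i^{(k)})^2 = \frac{\sum_{n} p_{k|n}\,(x_i^{(n)} - \mu_i^{(k)})^2}{\sum_{n} p_{k|n}} \tag{8}$$

$$\pi_k = \frac{1}{N}\sum_{n} p_{k|n} \tag{9}$$

where $p_{k|n}$ is now the posterior probability of the $k$th distribution for the $n$th data point under the density in Equation (7).

These equations form the basis of the Expectation-Maximization algorithm used to estimate the parameters of the GMM iteratively. The GMM uses the principle of maximum likelihood estimation to iteratively update the parameters of multiple Gaussian distributions that best fit the observed data. By classifying data points into different Gaussian components, the GMM captures the underlying structure of the data, which can exhibit complex dependencies and multi-scale patterns often seen in fractional processes.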
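The one-dimensional update described above can be sketched in a few lines of Python. This is only an illustrative sketch under the stated simplifications (two components, equal priors, known common $\sigma$); the function name `fit_two_means`, the synthetic data, and the starting values are assumptions made for the example and are not taken from the monitoring dataset.

```python
import numpy as np

def fit_two_means(x, mu_init, sigma, n_iter=50):
    """Iterate the approximate Newton-Raphson update for the two means."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu_init, dtype=float).copy()
    for _ in range(n_iter):
        # Posterior p_{k|n}: with equal priors and a shared sigma, only the
        # exponent of each Gaussian matters.
        log_w = -(x[:, None] - mu[None, :]) ** 2 / (2.0 * sigma ** 2)
        log_w -= log_w.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(log_w)
        p /= p.sum(axis=1, keepdims=True)
        # Update: mu'_k = sum_n p_{k|n} x_n / sum_n p_{k|n}
        mu = (p * x[:, None]).sum(axis=0) / p.sum(axis=0)
    return mu, p  # p holds the posteriors from the last iteration

# Synthetic (hypothetical) one-dimensional data drawn from two Gaussians.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])
mu, p = fit_two_means(x, mu_init=[-1.0, 1.0], sigma=1.0)
labels = p.argmax(axis=1)  # hard cluster label for each sample
```

Assigning each sample to the component with the largest posterior probability gives the hard cluster labels; the multi-dimensional case additionally updates the standard deviations and weightings in the same loop.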

Random Coefficient Model
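For the panel data analysis, the dataset can be considered as a three-dimensional dataset, which contains a time series and a data panel. Figure 2 illustrates the structure of a panel data model. The random coefficient panel data model (RCM) allows regression coefficients to vary across different individuals or entities. This model is particularly useful for panel data, which contain multiple observations over time for the same entities, because it captures individual-specific effects. Equation (10) expresses a two-dimensional panel regression model:

$$y_{it} = \sum_{k=1}^{K} \beta_{ki}\, x_{kit} + u_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T \tag{10}$$

where $y_{it}$ is the displacement data panel, $x_{kit}$ denotes the explanatory variables, $t$ is the time index, $i$ is the cross-section index, $k$ is the explanatory variable index, and $u_{it}$ is a random disturbance term. The coefficient is decomposed as $\beta_{ki} = \bar{\beta}_k + \gamma_{ki}$, where $\bar{\beta}_k$ is the common mean value and $\gamma_{ki}$ is the deviation of individual $i$ from the common mean. Writing $\gamma_i = (\gamma_{1i}, \dots, \gamma_{Ki})'$ and $u_i = (u_{i1}, \dots, u_{iT})'$, and assuming

$$E(\gamma_i) = 0, \quad E(\gamma_i \gamma_j') = \begin{cases} \Delta, & i = j \\ 0, & i \neq j \end{cases}, \quad E(u_i u_j') = \begin{cases} \sigma_i^2 I_T, & i = j \\ 0, & i \neq j \end{cases}$$

with $\gamma_i$ uncorrelated with the explanatory variables and with $u_j$, stacking all $NT$ observations gives the following equation:

$$y = X\bar{\beta} + \tilde{X}\gamma + u$$

where $N$ is the number of panels (cross sections), $T$ is the length of the time series, $X = (X_1', \dots, X_N')'$ stacks the $T \times K$ matrices $X_i$ of explanatory variables, $\tilde{X} = \mathrm{diag}(X_1, \dots, X_N)$, and $\gamma = (\gamma_1', \dots, \gamma_N')'$. The covariance matrix of the compound error term $\tilde{X}\gamma + u$ is block diagonal, and its $i$-th diagonal block is $\psi_i = X_i \Delta X_i' + \sigma_i^2 I_T$.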
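Ordinary least squares applied to the stacked model ignores the block structure of the compound error. When $\frac{1}{NT} X'X$ converges to a non-zero constant matrix, it yields a consistent but inefficient estimate of the mean coefficient vector $\bar{\beta}$. The best linear unbiased estimator of $\bar{\beta}$ is obtained by generalized least squares (GLS):

$$\hat{\bar{\beta}}_{GLS} = \left(\sum_{i=1}^{N} X_i' \psi_i^{-1} X_i\right)^{-1} \left(\sum_{i=1}^{N} X_i' \psi_i^{-1} y_i\right)$$

$\hat{\bar{\beta}}_{GLS}$ is an efficient estimator of $\bar{\beta}$ and follows an asymptotic normal distribution.
The variance of the estimator is:

$$\mathrm{Var}(\hat{\bar{\beta}}_{GLS}) = \left(\sum_{i=1}^{N} X_i' \psi_i^{-1} X_i\right)^{-1}$$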
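As a rough illustration of the GLS step, the sketch below computes the estimator of the mean coefficients and its variance from the diagonal blocks $\psi_i = X_i \Delta X_i' + \sigma_i^2 I_T$. It assumes that $\Delta$ and $\sigma_i^2$ are already known, whereas in practice they have to be estimated from the data first. The function name `swamy_gls`, the synthetic panel, and all numerical values are hypothetical.

```python
import numpy as np

def swamy_gls(X_list, y_list, Delta, sigma2):
    """GLS estimate of the mean coefficient vector and its covariance."""
    K = X_list[0].shape[1]
    A = np.zeros((K, K))  # accumulates sum_i X_i' psi_i^{-1} X_i
    b = np.zeros(K)       # accumulates sum_i X_i' psi_i^{-1} y_i
    for X_i, y_i, s2 in zip(X_list, y_list, sigma2):
        T = X_i.shape[0]
        psi_i = X_i @ Delta @ X_i.T + s2 * np.eye(T)
        psi_inv_X = np.linalg.solve(psi_i, X_i)  # psi_i^{-1} X_i
        A += X_i.T @ psi_inv_X
        b += psi_inv_X.T @ y_i  # psi_i is symmetric
    cov = np.linalg.inv(A)
    return cov @ b, cov

# Synthetic (hypothetical) panel: 12 points, 100 epochs, intercept + one regressor.
rng = np.random.default_rng(1)
N, T = 12, 100
beta_bar = np.array([1.0, 0.5])
Delta = np.diag([0.04, 0.01])
X_list, y_list = [], []
for _ in range(N):
    X_i = np.column_stack([np.ones(T), rng.normal(size=T)])
    beta_i = beta_bar + rng.multivariate_normal(np.zeros(2), Delta)
    y_list.append(X_i @ beta_i + rng.normal(scale=0.5, size=T))
    X_list.append(X_i)

beta_hat, cov = swamy_gls(X_list, y_list, Delta, sigma2=[0.25] * N)
```

Solving the linear system with `np.linalg.solve` avoids forming $\psi_i^{-1}$ explicitly, which is usually more stable when the time series is long.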

Study Area
The Laxiwa hydropower station is located in Laxiwa valley, Qinghai Province, China. It is known as the largest hydropower station constructed on the Yellow River. Preparation for the construction of the station began in October 2001. The excavation of the dam's abutment began in September 2004, and the impoundment started in March 2009. After the impoundment, the water level rose by about 200 m from 2254 m. The first two electric generators were put into operation in May of the same year [37,38].
The Guobu slope is a northwest-oriented slope with an average angle of 43°, and is located on the right bank upstream of the dam at Laxiwa station. Figure 3 illustrates the geographical location of the Guobu slope, and the relative location between the Laxiwa hydropower station and the Guobu slope. Remarkable deformation, frequent rockfalls, and small-scale collapses were observed on the Guobu slope within 3 months after the impoundment of the dam at Laxiwa station [39].
As the deforming slope is very close to the dam site (about 700 m from the dam) and has a large volume (about 700 m high and 1000 m wide on average), a failure of the slope would destroy the dam and the hydropower station. Also, the associated waves created by the induced landslide would flood the downstream area, and threaten people and structures located downstream. Similar disasters have occurred in history; in 1963 in Vajont, Italy, the failure of the left bank slope of the Vajont reservoir generated a wave that over-topped the dam and caused more than 2000 casualties downstream [40]. According to the characteristics of the slope surface, previous research has estimated that the total volume of potentially unstable mass can be 120 to 150 Mm³; that is, about one half of that of the Vajont landslide [36].

Dataset
The monitoring system for the deformation of the slope was built quickly after the deformation had been observed, and the deformation of the slope has been monitored since then. Different techniques were used to monitor the condition of the slope: displacement monitoring sensors, boreholes, exploration tunnels, and so on. All monitoring results reveal that the slope is deforming continuously, and that the risk of slope failure exists. Due to space limitations, in this paper, we only selected the surface displacement monitoring dataset of 12 monitoring points to establish the prediction model, i.e., TP1-8, TP2-7, TP2-8, TP2-9, TP3-12, TP3-13, TP4-2, TP4-6, TP4-7, TP5-5, TP5-8, and TP5-9. The data were collected from 29 May 2011 to 13 November 2013. The sensor that recorded the displacement data was a prism. Figure 4 shows the positions of the selected monitoring points on the slope. The scale of the picture is marked by the yellow line at the bottom of the figure. The monitoring points are evenly distributed on the top of the slope. Table 1 exhibits the coordinates (x, y) and altitude z of the 12 monitoring points selected in this paper. The Cartesian coordinate system is used to describe the location of monitoring points. The origin of (x, y) is 36.006° N, 101.149° E.
The displacement data of the selected 12 monitoring points are used to develop the prediction model. Figure 5 shows the time variation of the monitored slope displacement at each monitoring point. In general, the displacement data at all monitoring points increase over time. The monitoring data at monitoring points TP1-8, TP2-7, TP2-8 and TP2-9 fluctuate significantly, whereas the monitoring data at the other monitoring points vary relatively smoothly. It can be seen from the figure that the displacements of TP1-8, TP2-7, TP2-8 and TP2-9 are much smaller than those of other monitoring points: the maximum displacements of these four monitoring points are lower than 250 mm, whereas the maximum displacements at other monitoring points are larger than 2500 mm. To ensure the visual presentation quality, we present the data of the 12 monitoring points using two subfigures. Because of the differences in the range of the data and in the sensors installed in the slope, the accuracy of the monitoring data differs among monitoring points. It should be noted that the characteristics of the monitoring data may affect the performance and accuracy of the prediction model. In addition, it can be seen from Figure 5 that the displacements at most monitoring points during the water filling period were more significant than those during the period with stable water level, which verified the necessity of considering the variation of water level in the model development.

Prediction Using RCM-GMM Model
We first classified the monitoring points into several groups according to the characteristics of the monitoring data using the Gaussian mixture model, and then modeled the displacement of the monitoring points in each cluster using the random coefficient model. The mathematical details of the random coefficient model were introduced in Section 2.2. In total, 90% of the dataset was selected as training data, and 10% was selected as testing data. We first developed the prediction model using the training dataset, and then validated the model using the testing dataset.
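The 90%/10% split can be illustrated with a short sketch. It assumes the split is chronological, with the earliest records used for training and the latest for testing, and that the records are stored as a (time, monitoring point) array. The function name `chronological_split` and the synthetic panel are hypothetical.

```python
import numpy as np

def chronological_split(panel, train_fraction=0.9):
    """Split a (time, points) array into leading training rows and trailing test rows."""
    panel = np.asarray(panel, dtype=float)
    n_train = int(round(train_fraction * panel.shape[0]))
    return panel[:n_train], panel[n_train:]

# Hypothetical panel: 200 epochs x 12 monitoring points of cumulative displacement (mm).
rng = np.random.default_rng(2)
panel = np.cumsum(rng.uniform(0.0, 5.0, size=(200, 12)), axis=0)
train, test = chronological_split(panel)
```

Keeping the time order intact means the model is always validated on records that are later than anything it was fitted to, which matches how a prediction would be used in early warning.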
Figure 8 displays the monitored and modeled displacement of the slope for each monitoring point. The black solid line represents the monitoring data, and the red dashed line denotes the modeled dataset. The test dataset is highlighted by the gray area in the figures. For monitoring points TP3-12, TP3-13, TP4-2, TP4-6, TP4-7, TP5-5, TP5-8, and TP5-9, the monitoring data vary smoothly over time, and the modeled data fit the monitoring data quite well. For monitoring points TP1-8, TP2-7, TP2-8 and TP2-9, the monitoring data fluctuate significantly over time, and the modeled data generally follow the midline of the monitoring data.
The coefficient of determination is computed on the training dataset, in its standard form, as:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (\delta_i - \hat{\delta}_i)^2}{\sum_{i=1}^{N} (\delta_i - \bar{\delta})^2}$$

where $\delta_i$ denotes the monitored data of the training dataset, $i$ is the index within the training dataset, $\bar{\delta}$ is the average of the monitored data of the training dataset, $\hat{\delta}_i$ denotes the modeled data of the training dataset, and $N$ is the total number of data in the training dataset. It can be seen from Figure 9 that the coefficients of determination of most monitoring points are higher than 0.9 and close to 1, which represents a fairly good prediction accuracy. However, the coefficients of determination of monitoring points TP2-8 and TP2-9 are significantly lower than those of other monitoring points. The root mean square error (RMSE) represents the deviation between the modeled and monitored values of the testing dataset, and can be written as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{M} \sum_{j=1}^{M} (\delta_j - \hat{\delta}_j)^2}$$

where $j$ is the index within the testing dataset, $M$ is the total number of data in the testing dataset, $\delta_j$ is the monitored data of the testing dataset, and $\hat{\delta}_j$ is the modeled data of the testing dataset. As shown in Figure 9, the RMSE values of the monitoring points TP1-8, TP2-7, TP2-8, TP4-7, and TP2-9 are lower than 10 mm. The RMSE values of TP3-12, TP5-5, and TP5-9 range between 50 mm and 100 mm, and are obviously larger than those of other monitoring points. However, the displacements at these three monitoring points are larger than 2000 mm, so the ratios of RMSE to the observed data are still very small. For all monitoring points, the ratio of RMSE to the observed data ranges between 0.086% and 11.866%, which demonstrates the high prediction accuracy of the proposed model. The lower errors at monitoring points TP2-8 and TP2-9 may be attributed to a combination of factors such as the measurement source, interference, the quality of the historical data, and specific geographic or structural conditions. The historical data available for these two points are more consistent than those of other monitoring points, which are influenced by structural vibrations or geographic variations, and thus provide a better basis for accurate predictions. Understanding the specific reasons would require a detailed analysis of the monitoring setup and the conditions at each point. As this paper focuses on the global performance of the proposed model rather than the specific performance at each monitoring point, we do not pursue this discussion here.
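The two indicators discussed above can be computed as in the following sketch, which uses the standard definitions of the coefficient of determination and the RMSE. The function names and the four displacement values are hypothetical and only serve to show the arithmetic.

```python
import numpy as np

def r_squared(observed, modeled):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    observed = np.asarray(observed, dtype=float)
    modeled = np.asarray(modeled, dtype=float)
    ss_res = np.sum((observed - modeled) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(observed, modeled):
    """Root mean square error between monitored and modeled values."""
    observed = np.asarray(observed, dtype=float)
    modeled = np.asarray(modeled, dtype=float)
    return float(np.sqrt(np.mean((observed - modeled) ** 2)))

# Hypothetical displacements in mm.
observed = [100.0, 200.0, 300.0, 400.0]
modeled = [110.0, 190.0, 310.0, 390.0]
r2 = r_squared(observed, modeled)   # 1 - 400 / 50000 = 0.992
err = rmse(observed, modeled)       # sqrt(400 / 4) = 10.0
```

Because the RMSE carries the unit of the displacement, points with very different displacement magnitudes are easier to compare through the ratio of RMSE to the observed values than through the RMSE alone.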

Comparison of the Prediction Performance of RCM with BP and TT
In addition to the random coefficient model proposed in this paper, we also predicted the displacement of slope for all 12 monitoring points using the temporal-temperature model (TT) and the BP neural network model (BP), which are among the most common statistical models and artificial intelligence methods, respectively. The choice of the random coefficient model, BP neural network model, and temporal-temperature model balances several aims: demonstrating the added value of the proposed model, covering a methodological spectrum, ensuring domain relevance, and maintaining practical feasibility. The proposed model is developed by combining the random coefficient model and the Gaussian mixture model. Comparing it with the traditional random coefficient model demonstrates the benefits of classifying the dataset using the Gaussian mixture model. The BP neural network model is a well-established method in machine learning for time series prediction, and it is often used as a benchmark due to its strong performance in capturing non-linear relationships. The temporal-temperature model is the most commonly used statistical model. Hence, these two models are selected as representatives of machine learning methods and statistical methods, respectively. These comparisons allow the effectiveness of the proposed model to be assessed against both families of methods.
Figure 10 shows the predicted displacement of slope determined by TT, BP and RCM for each monitoring point. The black solid line represents the monitored data. The red dashed line, green dashed line, and pink dashed line denote the predicted data estimated by TT, BP and RCM, respectively. The black dashed line separates the training dataset and testing dataset: data on the left side are the training dataset, and data on the right side are the testing dataset. In general, the predicted dataset of TT deviates significantly from those of BP and RCM, and the predicted datasets of BP and RCM are fairly close. To quantify the prediction precision of the temporal-temperature model (TT), BP neural network model (BP), and random coefficient model (RCM), we calculated the coefficients of determination of these three models for each monitoring point, which are exhibited in Table 2. For all three models, the coefficients of determination of most monitoring points are greater than 0.9. For monitoring points TP2-8 and TP2-9, the coefficients of determination of all three models are relatively low, which results from the high fluctuation of the monitoring data. For monitoring points TP2-7, TP2-8, TP2-9, TP3-12, TP5-5, TP5-8, and TP5-9, BP has the largest coefficient of determination; for monitoring points TP1-8, TP3-13, TP4-2, TP4-6, and TP4-7, RCM has the largest coefficient of determination. In general, BP and RCM have better performance in prediction precision than TT.
Figure 11 shows the root mean square error at each monitoring point for the temporal-temperature model, the BP neural network model, and the random coefficient model. The root mean square error of the temporal-temperature model is significantly higher than those of the BP neural network model and the random coefficient model; the latter two models have similar values of root mean square error. That is, the BP neural network model and the random coefficient model have similar prediction accuracies, and both perform better than the temporal-temperature model. In addition, the accuracies differ among monitoring points. The prediction performances for TP1-8, TP2-7, TP2-8, and TP2-9 are fairly good for all three prediction models. To further evaluate the prediction abilities of these three models, we selected 5%, 10%, and 15% of the dataset as test data, and calculated the error indicators of the prediction ability under different prediction lengths. The error indicators used here are the average mean square error (AMSE), the average symmetric mean absolute percentage error (ASMAPE), and the average mean absolute percentage error (AMAPE) of the prediction results of all monitoring points. In the expressions of AMSE, ASMAPE, and AMAPE, $\delta_{ij}$ is the predicted data; $\hat{\delta}_{ij}$ is the monitored data; $i$ and $j$ index the monitoring point and the data sequence of each monitoring point, respectively; $N$ and $M$ denote the total number of monitoring points and the total quantity of data of each monitoring point, respectively. Figure 12 illustrates the AMSE, ASMAPE, and AMAPE of the above three models under the prediction lengths of 5%, 10%, and 15%. The ranges of AMSE, ASMAPE, and AMAPE of these three models are 505-40995, 0.018-0.00422, and 0.0174-0.0418, respectively. It can be seen from the figure that the error indicators of all three models increase with the prediction length; that is, the prediction accuracies decrease as the prediction length increases. In addition, the prediction accuracy of the random coefficient model is higher than those of the BP neural network model and the temporal-temperature model.
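A possible way of computing the three error indicators is sketched below. The forms used here are assumptions: the mean square error, the mean absolute percentage error relative to the monitored value, and the symmetric variant whose denominator is the mean of the absolute monitored and predicted values, each averaged over all monitoring points and all data. The exact expressions used in this study may differ, in particular for the symmetric variant. Function names and numbers are hypothetical.

```python
import numpy as np

def amse(monitored, predicted):
    """Mean square error averaged over all monitoring points and data."""
    m, p = np.asarray(monitored, float), np.asarray(predicted, float)
    return float(np.mean((p - m) ** 2))

def amape(monitored, predicted):
    """Mean absolute percentage error, relative to the monitored value (assumed form)."""
    m, p = np.asarray(monitored, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(p - m) / np.abs(m)))

def asmape(monitored, predicted):
    """Symmetric variant: error relative to the mean of |monitored| and |predicted| (assumed form)."""
    m, p = np.asarray(monitored, float), np.asarray(predicted, float)
    return float(np.mean(2.0 * np.abs(p - m) / (np.abs(m) + np.abs(p))))

# Hypothetical data: rows are monitoring points (N = 2), columns are data (M = 2), in mm.
monitored = [[100.0, 200.0], [400.0, 500.0]]
predicted = [[110.0, 190.0], [380.0, 520.0]]
print(amse(monitored, predicted), asmape(monitored, predicted), amape(monitored, predicted))
```

With the same number of data at every monitoring point, averaging over the whole array is equivalent to first averaging within each monitoring point and then across points.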

Conclusions
Predicting the displacement of slope based on monitoring data is an effective method to evaluate the risk of slope failure and landslide. The displacement at each monitoring point relates to the displacement in the surrounding area. The objective of this study was to take the correlations between monitoring data at different monitoring points into account in the prediction model.
With this objective in mind, we proposed a prediction method within the framework of panel data modeling based on the Gaussian mixture model and the random coefficient model.
Many existing models used time series data from individual monitoring points without considering the cross-sectional dimension (i.e., different monitoring points) simultaneously. However, monitoring points within a slope may be influenced by common factors and hence are not entirely independent. Employing panel data analysis incorporates the correlation among monitoring points, which enhances the model's robustness by capturing both the temporal dynamics and spatial heterogeneity in slope deformation data. In addition, by classifying the monitoring data using a Gaussian mixture model, the proposed model effectively captures and distinguishes different temporal-spatial patterns in the deformation data. This classification enables a more nuanced understanding and modeling of the deformation process, allowing the model to account for complex and heterogeneous behaviors in the slope.
Using the Gaussian mixture model, we first classified the monitoring data of different monitoring points into 4 clusters considering the temporal-spatial characteristics of the monitoring points. With the clustering results, we established the prediction model for each cluster using the random coefficient model. We used monitoring data of the Guobu slope to validate the model; this well-known slope is located on the bank of the Yellow River near the Laxiwa hydropower station and has a high risk of slope failure. In total, 90% of the dataset was selected as training data, and 10% was used as testing data. Results show that the coefficients of determination of the proposed model are larger than 0.9 for most monitoring points and that the ratios of RMSE to recorded data range from 0.086% to 11.866%, indicating the high prediction performance of the proposed model. We also compared the prediction accuracy of the proposed model (RCM) with the BP neural network model (BP) and the temporal-temperature model (TT), which are among the most commonly used artificial intelligence models and statistical models, respectively. Results showed that the prediction accuracy of RCM is slightly higher than that of BP, and significantly better than that of TT.

Figure 1. Flow chart of the proposed model for predicting the displacement of slope.

Figure 2. Sketch of the random coefficient panel data model.

Figure 4. The positions of the selected monitoring points.

Figure 6 shows the time variation of the water level in the reservoir. The water level was relatively stable during two periods: it fluctuated around 2430 m from June 2011 to December 2011, and around 2445 m from August 2012 to November 2013. The water level increased gradually from 2430 m to 2445 m from December 2011 to August 2013.

Figure 6. Time variation of the water level in the reservoir.

Figure 7 shows the clustering results of the monitoring points using the Gaussian mixture model. The scale of the picture is marked by the yellow line at the bottom of the figure. Mathematical details of the clustering methods have been presented in Section 2.1. The 12 monitoring points are classified into four groups, based on similarities of the characteristics of the data sequences: cluster 1 contains TP1-8, TP2-7, and TP2-8; cluster 2 contains TP2-9; cluster 3 contains TP3-12, TP4-2, and TP5-5; cluster 4 contains TP3-13, TP4-6, TP4-7, TP5-8, and TP5-9.

Figure 7. Clustering results of the monitoring points.

Figure 9 shows the coefficient of determination and root mean square error (RMSE) of the results modeled by RCM for each monitoring point. The blue line denotes the coefficient of determination and the green bar denotes the RMSE. The coefficient of determination represents the correlation between the monitored data and predicted data of the training dataset, and reflects the prediction precision of the prediction model. The expressions of the coefficient of determination $R^2$ and the RMSE accompany the discussion of the prediction results.

Figure 9. The coefficient of determination and RMSE of results modeled by RCM for each monitoring point.

Figure 12. Comparison of the (a) AMSE, (b) ASMAPE, and (c) AMAPE of the three prediction models under different prediction lengths.

Table 1. The spatial coordinates (x, y, z) of the selected monitoring points.

Table 2. The coefficients of determination of TT, BP and RCM for each monitoring point.
Figure 11. Root mean square error of TT, BP and RCM for each monitoring point.