Landslide Susceptibility Assessment in Active Tectonic Areas Using Machine Learning Algorithms

Abstract: The eastern margin of the Tibetan Plateau is one of the regions most severely affected by landslide disasters worldwide. With the intensification of seismic activity around the Tibetan Plateau and the increase in extreme rainfall events, landslide prevention in the region faces serious challenges. This article selects the Bailong River Basin, located in this region, as the research area and uses historical landslide data obtained from high-precision remote sensing image interpretation combined with field validation as the sample library. Using machine learning algorithms and data-driven landslide susceptibility assessment, 17 commonly used models and 17 important factors affecting landslide development are selected to carry out the susceptibility assessment. The results show that the BaggingClassifier model is the most applicable in the region, and the landslide susceptibility distribution map of the Bailong River Basin was generated using this model. Road and population density are both high in the very high and high susceptibility areas, indicating that there is still a significant potential landslide risk in the basin. The quantitative evaluation of the main influencing factors indicates that distance to a road is the most important factor. However, because local residents have widely used ancient landslides for settlement and agricultural cultivation over hundreds of years, the vast majority of landslides are likely to have occurred prior to human settlement. Therefore, the importance of this factor may be overestimated, and the evaluation of the factors still needs to be examined dynamically in conjunction with the development history of the region. The five factors of NDVI, altitude, faults, average annual rainfall, and rivers have a secondary impact on landslide susceptibility. The research results are significant for the susceptibility assessment of landslides in the complex environment of human–land interaction and for the construction of landslide disaster monitoring and early warning systems in the Bailong River Basin.


Introduction
Globally, landslides remain the most dominant type of geological hazard in mountainous areas, causing significant loss of life and damage to infrastructure every year [1,2], especially as intense seismic activity, extreme rainfall, and snowmelt events lead to an increasing trend of group-occurring landslides [3,4,5]. How to prevent landslide disasters with high efficiency, high precision, and low cost across vast mountainous areas has become an important challenge for the construction of geological disaster prevention systems in many countries. In research on preventing regional landslide disasters, susceptibility assessment has become an essential element that cannot be ignored or replaced [6]. Landslide susceptibility assessment is an important method for evaluating the possibility of landslide occurrence. By analogy with the environment in which landslides have occurred in the past, the likelihood of future landslides in each surveying unit (grid or slope unit) is evaluated [7]. The results have important practical value for disaster prevention planning, decision-making, and the deployment of monitoring equipment within the region [8]. The traditional assessment of landslide susceptibility mainly relies on on-site investigation, recording by researchers, and feedback from ground contact monitoring equipment [9]. Although the evaluation results of these methods are accurate and reliable, they are inefficient and require a great deal of manpower and financial resources, especially in high-altitude mountainous areas with complex terrain conditions, where landslide inventories rely heavily on geological surveys. It is unrealistic to conduct a landslide susceptibility assessment based on investigation and monitoring in mountainous areas ranging from hundreds to tens of thousands of square kilometers. This situation has improved greatly with the popularization of high-precision remote sensing images, DEMs (Digital Elevation Models), geological maps, rainfall records, and other refined data, as well as significant improvements in software and technical performance, such as Geographical Information System platforms [10,11,12].
In recent years, landslide susceptibility assessment methods driven by big data have taken a dominant position and begun to play an increasingly important role. The development of air-space-ground remote sensing technology provides a large amount of high-quality geographic data for landslide disaster analysis, and end-to-end machine/deep learning technology can automatically fit labeled datasets to predict the potential locations of landslide occurrence without in-depth research on complex landslide mechanisms [13]. Models for evaluating landslide susceptibility have emerged in large numbers with the deepening of research [14,15]. However, it is widely believed that different models differ in their adaptability to the environment and scale [16,17]. Therefore, selecting the models best suited to regional features from a large number of candidates has become a key step in conducting susceptibility assessments in complex regions. The rapid development of artificial intelligence (AI) technology in recent years has provided broad prospects for solving this issue and has achieved success in various geological hazard assessments [14,18,19]. In addition, an advantage of using machine learning for landslide susceptibility assessment is that it can quantitatively reveal the relative contribution of different factors to landslide development among numerous influencing factors [12,13]. The significance for regional disaster prevention lies precisely in the targeted prevention and control of landslide development based on specific factors, which is also the most important value of landslide susceptibility assessment.
The eastern margin of the Tibetan Plateau is a globally renowned zone of rapid terrain change, with active tectonic movements, strong river incision and downcutting, and high population density [1,2], which makes it one of the most severe areas for landslide disasters in the world. The complex geological environment, extremely variable climate, and excessive human activities have caused landslides in the region to be characterized by suddenness, group occurrence, and multi-type concurrency, making prediction and management extremely difficult. Due to various factors such as terrain conditions, vegetation coverage, and disturbance by human activities, it is difficult to systematically catalog and evaluate the susceptibility of landslides in this area. In order to effectively mitigate the risk of landslide disasters and carry out systematic monitoring in the key areas, this article selects the Bailong River Basin on the eastern margin of the Tibetan Plateau as the research area. Using machine learning methods, the optimal model for evaluating susceptibility in alpine-gorge areas is selected from various algorithms, and a quantitative evaluation of the main factors that affect landslide development is attempted. The final results can provide a scientific reference for the monitoring and early warning of landslides in the complex human-land interaction environment on the eastern margin of the Tibetan Plateau.

Study Area
The Bailong River is located at the intersection of the eastern margin of the Tibetan Plateau and the West Qinling Mountains (Figure 1) and is a secondary tributary of the Yangtze River and a primary tributary of the Jialing River. With the development of large-scale fault zones and intense topographic relief, the Bailong River Basin has become one of the most serious regions for landslide disasters in China under the combined effects of earthquakes, heavy rainfall, and human engineering activities [20,21]. The total population within the watershed exceeds one million, mainly distributed in the valley and hillslope zones on both sides of the Bailong River. Long-term tracking investigations have found that, due to high population density and a scarcity of land resources, the widespread use of old landslides for buildings, roads, and agricultural land is one of the main reasons for serious landslide disasters. The risk of landslide reactivation poses a serious threat to the lives and properties of local residents. For example, on 12 July 2018, a large-scale landslide in Nanyu Township, Zhouqu County, was reactivated, and the landslide deposit blocked most of the Bailong River channel to form a dam and backwater, causing the water level of the Bailong River to rise by 8 m in a short period of time and flooding the bridges, roads, hydropower stations, and most residential buildings in Nanyu Township [22]; on 19 July 2019, the Yahuokou landslide, a long strip-shaped landslide formed along a groove of the fault zone, was reactivated with a length of up to 2.0 km and an average width of less than 100 m, destroying the village roads and the factory buildings at the lower edge of the slope that were perched on the old landslide body [23]; on 16 August 2020, a continuous heavy rainfall event in the Bailong River Basin triggered a large number of shallow landslides, causing serious economic losses and ecological damage; and on 18 January 2021, the Lijie landslide in Beishan Village, Zhouqu County, was reactivated, which led to the emergency evacuation of thousands of people in the township [24]. The above cases are only the tip of the iceberg of landslide disasters in the Bailong River Basin.
Remote Sens. 2024, 16, 2724
The serious geological disasters that occur in the Bailong River Basin arise first from the geological structural conditions. Driven by the far-field stress caused by the expansion of the Tibetan Plateau, the collision between the Bailongjiang Block and the Bikou Block formed the structural characteristics of the basin [25], with a stratigraphic structure mainly composed of Carboniferous to Triassic limestone nappes and lower Silurian weak phyllite. Since 1.7 Ma, the rapid downcutting of rivers has led to the abundant exposure of weak phyllite and carbonaceous slate. As a result, earthquakes and rainfall have become the main triggering factors for landslides in the limestone nappe and phyllite areas, respectively.
The local dry-hot valley climate is another important reason for the frequent occurrence of landslides. Since the 1950s, extensive deforestation has directly contributed to the exacerbation of the dry-hot valley climate in the basin, with frequent occurrences of extreme rainfall events. Therefore, under the coupling and synergistic effect of internal and external factors, this area has become an ideal site for landslide development.
Despite the extensive development of historical landslides in the Bailong River Basin, a systematic landslide inventory is still lacking. Previous records of landslides in this area mainly cover threatening landslide sites on both sides of highways and river channels; however, they lack specific parameters such as extent and area, making it difficult to use these records for susceptibility assessment. Historical landslide inventories are an important foundation for studying spatial distribution patterns, susceptibility assessment, risk evaluation, and formation mechanisms [26]. Taking advantage of the popularization of high-resolution remote sensing images, Qi compiled a historical landslide database of the entire watershed for the first time through remote sensing interpretation combined with on-site verification, as shown in Figure 1 [27]. This inventory covers 6609 landslides in the Bailong River Basin, including 140 giant landslides, 2619 large landslides, 3176 medium-sized landslides, and 674 small landslides. Visual inspection reveals that landslides in this area are mainly distributed along fault zones and on both sides of river valleys and are closely related to the scope of human distribution and activity. Therefore, it is urgent to conduct a landslide susceptibility assessment in this area to provide a foundation for systematic disaster prevention and reduction planning.

Selection of Evaluation Factors
Areas with a complex geological environment and intensive human activity readily become ideal sites for landslide development because they contain almost all the factors that induce and affect the occurrence of landslides. The Bailong River Basin is located at the junction of the eastern margin of the Tibetan Plateau and the West Qinling Mountains, in the transition zone between China's first- and second-order terrain steps. During the formation of the Qinling Mountains in the Indosinian and the uplift of the Tibetan Plateau in the Cenozoic, the area experienced strong crustal uplift and thrust compression, which produced widely distributed fault fracture zones and, together with river erosion, a landscape of ridges and valleys. The region has a typical mountain climate, and extreme rainfall events are the main external factors inducing landslides. In recent years, intensive human activities have had a profound impact on the formation and development of landslides in the study area. Based on this background, the landslide control and inducing factors considered in this paper cover a wide range of fields, including terrain (slope, aspect, altitude, local relief, plane curvature, profile curvature, topographic wetness index, surface roughness, water system), geological structure (lithology, faults, topographic/bedding-plane intersection angle (TOBIA) [28]), material conditions (land use, soil type, Normalized Difference Vegetation Index (NDVI)), and human activities and rainfall (road, population density, annual rainfall). Each factor may become a direct or indirect inducing condition of landslides. Table 1 briefly describes the types and sources of these factors. The geomorphic factors are extracted from DEM data with a resolution of 12.5 m (from the Japan Aerospace Exploration Agency's ALOS satellite); the geological factors are derived from the 1:50,000 and 1:100,000 geological maps of China; the soil data and land use data are from the global land-cover product with a fine classification system at 30 m using time-series Landsat imagery [29]; the NDVI data are derived from GF-1 satellite data with 8 m resolution; the road data are obtained by interpreting Google Earth imagery with 1 m resolution; and the rainfall data are from the rainfall monitoring stations in the Bailong River Basin.
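As an illustration of how a terrain factor such as slope can be derived from a gridded DEM, the following is a minimal sketch using finite differences in NumPy. It is not the authors' workflow (which the text does not specify); the function name `slope_degrees` and the synthetic elevation grid are assumptions for illustration, and only the 12.5 m cell size is taken from the text.

```python
import numpy as np

def slope_degrees(dem, cell_size=12.5):
    """Slope angle in degrees for each cell of a regular-grid DEM.

    dem: 2D array of elevations in metres (hypothetical input).
    cell_size: grid spacing in metres (12.5 m, as stated for the DEM in the text).
    """
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    dz_dy, dz_dx = np.gradient(dem.astype(float), cell_size)
    # Slope is the arctangent of the magnitude of the elevation gradient.
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

# Synthetic plane that rises 12.5 m per 12.5 m cell in the x direction.
dem = np.tile(np.arange(5) * 12.5, (4, 1))
slope = slope_degrees(dem)
```

A GIS package would normally be used for this step on real data; the sketch only shows the underlying calculation on a small array.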

Data Preprocessing
Among the selected factors, some may be strongly correlated with one another, showing an approximately linear relationship, which disrupts the regression and fitting process of the model and affects the stability of model operation [30]. Regression and fitting aim to find the relationship between each factor and the target variable. If two factors are too closely correlated, a change in one is accompanied by a corresponding change in the other, so it is impossible to hold other conditions fixed and analyze the effect of a single factor on the output, because the fitted effect of one factor is mixed with that of the other. Therefore, collinearity analysis is the first step in regression and fitting.

Selection of Machine Learning Algorithms
With the rapid development of artificial intelligence technologies, machine learning has flourished in various industries since the 1990s with significant achievements, including the implementation of neural networks, the development of boosting algorithms, and increased accessibility to internet-derived and digital data. Machine learning was first used in the field of landslides in the early to mid-2000s, where it demonstrated clear advantages in computational power and predictive accuracy over traditional physically based, heuristic, and statistical models, with many machine learning models reaching predictive accuracies of 90% or more. Many popular machine learning algorithms appeared in the following years, including Logistic Regression, Support Vector Machine, Decision Tree, Random Forest, Bernoulli-NB and Gaussian-NB, Bagging Classifier, K-Nearest Neighbors, and Gradient Boosting [19]. However, to date, there is no consensus on which machine learning algorithms are "best suited" to predicting landslide susceptibility. Many studies have shown that the prediction accuracy of data-driven landslide susceptibility modeling is affected not only by the quality of the landslide inventory and the landslide condition factors but also by the underlying quality of the machine learning algorithms used [31].
Therefore, in order to compare the adaptability of different models in complex geological environments, we selected 17 models that are most widely used in landslide susceptibility assessment to screen for the most suitable models for this study area. The characteristics of these algorithms are shown in Table 2.
Table 2. Selected algorithms and characteristics.

Random Forest Classifier
Uses many classification trees to stabilize model predictions. Each decision of a tree is further based on a randomly selected predictor, and the predictions of category assignments are determined by a majority vote of all trees. The proportion of trees predicting the existence of landslides in the set can be used as an indicator of landslide susceptibility.

Bagging Classifier
An ensemble algorithm that fits multiple instances of a black-box estimator on random subsets of the original training set and then aggregates their predictions to form the final prediction.

K-Neighbors Classifier
For the training set, the categories of the individual instances have been determined. During the classification process, for new instances, predictions are made through majority voting based on the categories of their K nearest neighbor training instances.

Decision Tree
An instance-based inductive learning method that can refine a tree-like classification model from a given unordered training sample.

Extra Tree
A variant of Random Forest.

Gradient Boosting
This model applies the gradient descent technique to the regression tree. The principle is to treat the value of the basic learner (regression tree) in each iteration on x as the negative gradient of the loss function space on x, and the coefficient before the basic learner is treated as the step size to approximate the minimum value of the error function space.

XGBoost
Uses the boosting technique to randomly divide the initial sample set into k parts, and then divides each subset into a training set and a validation set by a 2:1 ratio to generate a decision tree.

AdaBoost
An ensemble learning technique that can turn a weak learner into a strong learner with higher prediction accuracy.

Logistic Regression
A fitting method for classifying records based on the values of conditional variables to estimate the probability of an event occurring.

Linear Discriminant Analysis
Involves the projection of high-dimensional pattern samples into the space of the best discriminating vectors to extract categorical information and compress the dimensionality of the feature space.

SGDClassifier
Achieved in a "one-vs-all (OVA)" manner by combining multiple binary classifications.

Bernoulli-NB and Gaussian-NB
Based on the concept of Bayesian probability, assuming that each attribute is independent of all other attributes to obtain the probability of each feature, and using a higher probability as the prediction result.

Quadratic Discriminant Analysis
Here, the assumptions made are more stringent than those of Logistic Regression, but when these assumptions are met, discriminant analysis can be used as a useful alternative or supplement to Logistic Regression.

Passive Aggressive Classifier
An online learning algorithm used for regression and classification. Compared to Support Vector Machine, it is easier to use and runs faster, but it cannot match the accuracy of Support Vector Machine.

Perceptron
A supervised learning algorithm based on binary classification that can predict whether the input represented by a digital vector belongs to a specific class.

Parameter Preprocessing
The assessment of landslide susceptibility prediction results is mainly used to test the performance of landslide susceptibility prediction models, which are subject to various uncertainties, and is a very important step in susceptibility modeling [3]. The results are evaluated in order to demonstrate the validity of the study; therefore, almost every paper includes a performance evaluation of landslide susceptibility prediction. Currently, there are numerous evaluation parameters in susceptibility modeling, and the widely used ones include the following:
i. ACC, the proportion of all samples involved in the modeling that are correctly classified;
ii. PPV, the proportion of correctly predicted positive instances among all instances predicted as positive;
iii. TPR, the recall rate, i.e., the proportion of actual positive instances that are correctly predicted;
iv. TNR, which indicates the ability to predict negative instances;
v. AUC, the area under the receiver operating characteristic (ROC) curve, which indicates the degree of fit and reflects the balance between sensitivity and specificity; and
vi. TSA, the accuracy on samples that do not participate in model training, which reflects the generalization of the model.
Of the above, ACC is the most widely cited, and this paper also uses it to evaluate the models:
ACC = (TP + TN)/(TP + FN + FP + TN)
These terms are listed in Table 3 and are defined as follows:
True positive (TP): the predicted class is positive, and the prediction is consistent with the actual situation;
False positive (FP): the predicted class is positive, and the prediction is opposite to the actual situation;
True negative (TN): the predicted class is negative, and the prediction is consistent with the actual situation;
False negative (FN): the predicted class is negative, and the prediction is opposite to the actual situation.
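The confusion-matrix terms above translate directly into code. The following minimal sketch computes ACC, PPV, TPR, and TNR from binary labels using their standard definitions; the function names and the ten example labels are hypothetical and are not data from this study.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels where 1 = landslide, 0 = non-landslide."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))
    fp = int(np.sum(~y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    return tp, fp, tn, fn

def evaluation_metrics(y_true, y_pred):
    """ACC, PPV, TPR, and TNR; assumes every denominator is non-zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "ACC": (tp + tn) / (tp + fn + fp + tn),
        "PPV": tp / (tp + fp),
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
    }

# Hypothetical labels: 4 landslide samples and 6 non-landslide samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = evaluation_metrics(y_true, y_pred)
```

In this toy case TP = 3, FN = 1, FP = 2, and TN = 4, so ACC = 7/10 = 0.7.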

Collinearity Analysis of Factors
In this paper, a collinearity analysis is used to compute a heat map of the correlation matrix of the selected parameters (Figure 2), which is based on Seaborn (Python 2.7), https://seaborn.pydata.org/generated/seaborn.heatmap.html#seaborn.heatmap (accessed on 15 October 2021). The main purpose of this visualization is to identify and eliminate collinearity among the selected parameters. The heat map shows a high correlation between surface roughness (SR) and local relief (LR); following Dormann et al. [30], we removed parameters with a correlation coefficient > 0.7. Therefore, in this study, we exclude the surface roughness index (SR) and select the other 17 factors for model screening.
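The correlation screening described in this section can be sketched as follows. This is an illustrative example rather than the authors' script: the use of the Pearson coefficient and of absolute values is an assumption, the helper `highly_correlated_pairs` is hypothetical, and the factor values are synthetic (an "SR" column is constructed to be nearly collinear with "LR"). Only the 0.7 threshold comes from the text.

```python
import numpy as np
import pandas as pd

def highly_correlated_pairs(factors, threshold=0.7):
    """List factor pairs whose absolute Pearson correlation exceeds the threshold."""
    corr = factors.corr(method="pearson").abs()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] > threshold:
                pairs.append((cols[i], cols[j], float(corr.iloc[i, j])))
    return pairs

# Synthetic factor table; the values are illustrative only.
rng = np.random.default_rng(0)
n = 500
local_relief = rng.normal(size=n)
factors = pd.DataFrame({
    "LR": local_relief,
    "SR": local_relief + 0.1 * rng.normal(size=n),  # nearly collinear with LR
    "NDVI": rng.normal(size=n),
})

pairs = highly_correlated_pairs(factors)
# Drop one member of the flagged pair (SR, as in the text) before model screening.
reduced = factors.drop(columns=["SR"])
```

The same correlation matrix, `factors.corr()`, is what would be passed to `seaborn.heatmap` to draw a figure of the kind referenced above.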

Evaluation and Optimization of Models
The initial models are trained using the data in the cross-validation training dataset, and the models are then ranked according to their average accuracy score (ACC) on the test data in the cross-validation dataset. Finally, the model evaluation scores are generated, as shown in Figure 3. Among the evaluation results of all models, the results of the ensemble models are better than those of other models, and the scores of all models are greater than 0.6. Among these, RandomForestClassifier has the highest score of 0.9, followed by BaggingClassifier at 0.87.
In order to improve the accuracy of these models, this paper chooses to optimize the four models with the top scores in the initial prediction. The optimization uses the parameter grid method and grid search cross-validation to fit the models, finds the best hyperparameters through the ACC score, cross-validates the training set of each model 10 times with its optimal hyperparameters (Figure 4), and re-ranks the models according to the average accuracy score on the test data. After optimization, the performance of all four models improved (Figure 5). BaggingClassifier showed the largest improvement, surpassing RandomForestClassifier as the best model. For BaggingClassifier, the average ACC on the test data over the 10 cross-validation runs is 0.91, and the average AUC is 0.96. The performance of the other three models also improved after optimization, but to a lesser extent. After evaluation, the predicted probability of the BaggingClassifier model was finally chosen to map the landslide susceptibility of the Bailong River Basin.
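The screening and tuning procedure described in this section can be sketched with scikit-learn as below. This is a hedged illustration, not the configuration used in the study: the synthetic dataset from `make_classification`, the three candidate models, the parameter grid, and the fold counts are all assumptions chosen to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic stand-in for the sample library: 17 features, binary landslide label.
X, y = make_classification(n_samples=300, n_features=17, n_informative=6,
                           random_state=0)

# Step 1: rank candidate models by mean cross-validated accuracy (ACC).
candidates = {
    "RandomForestClassifier": RandomForestClassifier(n_estimators=50, random_state=0),
    "BaggingClassifier": BaggingClassifier(n_estimators=20, random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)

# Step 2: tune a top-ranked model with grid search and 10-fold cross-validation.
param_grid = {"n_estimators": [10, 30], "max_samples": [0.5, 1.0]}  # illustrative grid
search = GridSearchCV(BaggingClassifier(random_state=0), param_grid,
                      cv=10, scoring="accuracy")
search.fit(X, y)
best_params = search.best_params_
best_acc = search.best_score_  # mean ACC over the 10 folds for the best setting
```

After fitting, `search.best_estimator_.predict_proba` returns the class probabilities that a susceptibility map would be built from.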

Landslide Susceptibility Mapping
The results of the landslide susceptibility assessment in the Bailong River Basin predicted by the BaggingClassifier model are shown in Figure 6. The predicted susceptibility is a probability value between 0 and 1. According to the natural discontinuity (natural breaks) classification method, the susceptibility result is divided into five intervals: very low (0-0.08), low [0.08-0.19), moderate [0.19-0.39), high [0.39-0.88), and very high [0.88-1). The proportion of area occupied by each susceptibility zone within the watershed varies. According to the area statistics, the area proportion of the first four susceptibility classes decreases as the level increases. The very low susceptibility areas account for more than half of the total area of the region (52.86%), and the low susceptibility areas account for 21.94%. The moderate and high susceptibility areas account for the lowest proportions in the whole basin, 8.32% and 5.88%, respectively. The very high susceptibility areas cover 2009.6 km², accounting for 11% of the total basin area. The distribution of susceptibility in the basin shows the characteristics of a large proportion of low and moderate susceptibility areas and a concentrated distribution of high susceptibility areas.
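Given the class boundaries reported above, assigning each predicted probability to a susceptibility class and tallying class shares can be sketched as follows. The breakpoints are taken from the text; deriving them with a natural breaks algorithm is not shown. The function names, the treatment of the interval endpoints, and the eight probability values are assumptions for illustration.

```python
import numpy as np

BREAKS = [0.08, 0.19, 0.39, 0.88]  # class boundaries reported in the text
LABELS = ["very low", "low", "moderate", "high", "very high"]

def classify_susceptibility(prob):
    """Map probabilities to class indices 0..4 (lower bound inclusive)."""
    # np.digitize returns i such that BREAKS[i-1] <= p < BREAKS[i].
    return np.digitize(prob, BREAKS)

def class_area_share(class_idx):
    """Fraction of cells in each class, assuming equal-area grid cells."""
    counts = np.bincount(np.ravel(class_idx), minlength=len(LABELS))
    return dict(zip(LABELS, counts / counts.sum()))

# Hypothetical predicted probabilities for eight grid cells.
prob = np.array([0.02, 0.08, 0.10, 0.25, 0.39, 0.90, 0.05, 0.01])
classes = classify_susceptibility(prob)
shares = class_area_share(classes)
```

For a raster of predicted probabilities, the same two calls give the class map and the area proportions per class.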

Landslide Susceptibility Mapping
The landslide susceptibility of the Bailong River Basin predicted by the BaggingClassifier model is shown in Figure 6. The predicted susceptibility is a probability value between 0 and 1. Following the natural breaks classification method, the susceptibility result is divided into five intervals: very low [0, 0.08), low [0.08, 0.19), moderate [0.19, 0.39), high [0.39, 0.88), and very high [0.88, 1]. The proportion of the watershed occupied by each susceptibility zone varies. According to the area statistics, the area share of the first four classes decreases as the susceptibility level increases. The very low susceptibility areas account for more than half of the region (52.86%), and the low susceptibility areas account for 21.94%. The moderate and high susceptibility areas account for the smallest shares of the basin, 8.32% and 5.88%, respectively. The very high susceptibility areas cover 2009.6 km², or 11% of the total basin area. Overall, the basin is characterized by a large proportion of very low and low susceptibility areas and a concentrated distribution of high susceptibility areas.
The prediction results show that the low susceptibility areas in the basin are mainly located north and south of Tanchang, south of Diebu, and south of Wenxian, which are mainly areas with high vegetation coverage and a sparse distribution of faults. The vegetation has a certain protective effect on the surface. The distribution of high and very high landslide susceptibility areas is closely related to the distribution of river systems and faults in the basin. Upstream of Wudu, the very high susceptibility areas are mainly distributed along the banks of the main course and primary tributaries of the Bailong River. Downstream of Wudu, the high and very high susceptibility areas at the bend of the Bailong River are still distributed along both sides of the river. However, there are also large very high susceptibility areas north of Wenxian. Overall, the distribution of high and very high susceptibility areas in the Bailong River Basin is closely related to the fault zones and the river system. Spatially, the high and very high susceptibility areas also coincide with the areas of high population density and high road density in the basin (Figure 7), which poses a great challenge to disaster prevention and control. The susceptibility assessment results in this paper can be used as reference data for decision-makers and government managers in land management and allocation.
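The five susceptibility intervals quoted above can be applied to predicted probabilities as in the following minimal sketch. The breakpoints and reported percentages are taken from the text; the cell probabilities are hypothetical, and the sketch assumes equal-area raster cells so that cell counts stand in for area.

```python
from bisect import bisect_right
from collections import Counter

# Class boundaries quoted in the text (result of the natural breaks method).
BREAKS = [0.08, 0.19, 0.39, 0.88]
LABELS = ["very low", "low", "moderate", "high", "very high"]


def classify(probability):
    """Map a predicted probability in [0, 1] to a susceptibility class."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # bisect_right places a value equal to a break into the higher class,
    # which matches the half-open intervals [lower, upper).
    return LABELS[bisect_right(BREAKS, probability)]


def area_shares(probabilities):
    """Percentage of equal-area cells falling in each class."""
    counts = Counter(classify(p) for p in probabilities)
    total = len(probabilities)
    return {label: 100.0 * counts[label] / total for label in LABELS}


# Hypothetical cell probabilities, for illustration only.
cells = [0.01, 0.05, 0.07, 0.10, 0.15, 0.25, 0.40, 0.90, 0.95, 1.00]
shares = area_shares(cells)

# Consistency check on the percentages reported in the text.
reported = [52.86, 21.94, 8.32, 5.88, 11.0]
reported_total = round(sum(reported), 2)
```

The reported class shares sum to 100%, so the five zones together cover the whole basin.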

On-Site Verification of Susceptibility Assessment Results
Under existing conditions, field survey is one of the auxiliary means of verifying landslide susceptibility assessment results, and it allows the accuracy of the results to be judged directly. The high and very high susceptibility areas are the focus of our attention and are also the key areas for future disaster prevention, land planning, and management. We therefore chose to investigate and verify the high and very high susceptibility zones of this assessment.
Because the study area is mainly controlled by fault zones, whether the evaluation results perform well along the fault zones is an important test criterion. We therefore chose the Zhouqu area, located in the middle reaches of the Bailong River Basin, as the susceptibility validation area (Figure 8a). This area is controlled by fault zones, with dramatic topographic relief, broken rock units, and extremely developed landslides, and it has experienced serious landslide disasters in the historical period. The high and very high susceptibility areas in the assessment results include both landslides that have already occurred and slopes with destabilization potential. Validation of both types of slope shows that serious landslides have occurred in almost all of the high and very high susceptibility zones, and that well-known landslides in the study area, such as the Suoertou landslide (Figure 8c), the Xieliupo landslide (Figure 8d), the Yahuokou landslide (Figure 8e), and the Lijie landslide (Figure 8i), lie in the very high susceptibility zones and are accompanied by rock fragmentation and local destabilization. This indicates that the susceptibility evaluation results obtained in this paper using machine learning are consistent with field conditions, are highly credible, and can be used as reference data for future disaster prevention and mitigation planning and land management.

Discussion
One of the advantages of machine learning modeling is that the interpretability of the model can be used to quantitatively evaluate the importance of the selected factors [13,32]. This allows us to better understand the role of each factor in landslide development and thus to take targeted prevention and control measures. The calculated importance weights of all factors are shown in Figure 9. There is no doubt that each factor plays a role in landslide development. Among all the factors, distance to a road (DTR) has the greatest impact, while land use type has the least. The magnitude of the impact of engineering construction on landslide development is unexpected. The impact of human activities on geological hazards in the Bailong River Basin has long been a concern for researchers. Engineering excavation readjusts the stress structure of a slope, which can seriously affect its stability [7,33]. Many landslide susceptibility studies repeatedly emphasize the influence of human activities, yet they do not rank roads as the most critical factor relative to conditioning factors such as slope, elevation, rainfall, and lithology [3,34], which differs markedly from the results of this paper. However, once the background of the Bailong River Basin is understood, roads may be acceptable as the factor most closely associated with landslide development.
We consider that the importance of roads may be overestimated in the landslide susceptibility assessment of this area, and that the failure to account for the dynamic evolution of the factors may explain the overestimation or underestimation of some factors. Old and ancient landslides in the Bailong River Basin account for 73% of the total number of landslides, and these landslides may far predate the history of human activity in this region, which dates from the beginning of the industrial age. Therefore, the formation of these old and ancient landslides is not the result of human activities. As for the high importance of DTR, we believe the main reason is that the combination of complex topography, limited land resources, and rapid population expansion forced most residents to migrate to and settle on high hillslopes, and these sites of settlement and production are generally the deposits of large-scale ancient and old landslides. In recent years, almost all villages have built hardened roads. The importance of DTR in the factor evaluation is therefore actually a feedback effect of landslide-driven geomorphological processes on human activities: landslides bring risk and scarce land resources at the same time. How to account for the dynamic connection between the historical landslide formation process and the conditioning factors is thus key to further improving the reliability of assessment results. Nevertheless, the results of this paper remain highly informative, because large landslides in the Bailong River Basin are predominantly ancient and old landslides, and there is a potential risk of reactivation under the influence of human activities, rainfall, and other factors. In fact, landslide reactivation has already caused serious consequences, and the risk is even greater considering the close correlation between roads and landslides. A systematic assessment and management of the relationship between landslides and roads is therefore needed to avoid and mitigate the potential risks created by landslide utilization.
NDVI is the second most important factor affecting landslide susceptibility, with a 9% contribution. The statistical results show that in areas with vegetation coverage of less than 50% (Figure 10b), the proportion of landslide areas is higher than that of non-landslide areas, while in areas with vegetation coverage of more than 50%, the proportion of non-landslide areas is significantly higher than that of landslide areas. This result clearly demonstrates the constraining effect of vegetation on landslide development. The network formed by vegetation roots is particularly effective in controlling shallow landslides [35]. Considering the explosive growth of geological disasters caused by human damage to forest resources and the ecological environment in the region in recent decades, ecological restoration and returning farmland to forest can be considered the priority measures for constraining geological disasters in the Bailong River Basin.
The influence of faults on landslide development ranks fourth, accounting for 8% of the weight. It should be noted that the distance-to-fault factor cannot fully represent the influence of fault zones on landslide development, which also includes rock fracturing and crustal uplift. The results show that within 2000 m of a fault, the proportion of landslide areas is significantly higher than that of non-landslide areas (Figure 10d). The influence of annual rainfall on the distribution of landslides is unusual. Landslides are concentrated in areas with annual rainfall of 300 mm to 700 mm. Paradoxically, the proportion of landslides in low-rainfall areas is higher than that in high-rainfall areas (Figure 10e). There are three main reasons for this phenomenon: (i) rainfall in the Bailong River Basin is greatest in the Bikou area in the south and the Diebu area in the north; however, as a stable craton block, the Bikou block experiences no violent tectonic activity and very few landslides develop there, whereas the Diebu region has mostly tectonically formed ancient landslides at a lower density; (ii) ecological damage lowers the stability threshold of hillslopes, so that landslides can be induced by low rainfall, and because ecological damage in the basin is concentrated in the area from Zhouqu to Wudu, the critical threshold for landslide occurrence there is low; (iii) the section from Zhouqu to Wudu has a dry-hot valley climate with concentrated rainfall, conditions that are more likely to induce landslides. However, it must also be acknowledged that the low resolution of the rainfall data may contribute to this phenomenon. The influence of distance to a river is similar to that of distance to a fault: the zone within 2000 m of a river is a high-incidence area for landslides (Figure 10f). In addition to eroding the foot of the slope, rivers transport material away, which is necessary for the continuation of landslides.
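The distance-based comparisons above (landslide versus non-landslide proportions within 2000 m of a fault or river) can be illustrated with a small binning routine. This is a sketch under stated assumptions: the function name `class_proportions`, the bin edges, and the sample distances are hypothetical and are not the data behind Figure 10.

```python
def class_proportions(distances_m, is_landslide, bin_edges_m):
    """Share of landslide and non-landslide cells in each distance bin.

    Returns a list of (lower, upper, landslide_share, non_landslide_share),
    where each share is a fraction of that group's total cell count.
    """
    n_bins = len(bin_edges_m) - 1
    slide = [0] * n_bins
    stable = [0] * n_bins
    for distance, flag in zip(distances_m, is_landslide):
        for i in range(n_bins):
            # Half-open bins [lower, upper); values outside all bins are ignored.
            if bin_edges_m[i] <= distance < bin_edges_m[i + 1]:
                (slide if flag else stable)[i] += 1
                break
    n_slide, n_stable = sum(slide), sum(stable)
    if n_slide == 0 or n_stable == 0:
        raise ValueError("both landslide and non-landslide cells are required")
    return [
        (bin_edges_m[i], bin_edges_m[i + 1], slide[i] / n_slide, stable[i] / n_stable)
        for i in range(n_bins)
    ]


# Hypothetical distances (m) to the nearest fault, with landslide flags.
distances = [150, 600, 900, 1500, 1900, 2500, 3200, 4100, 5200, 800, 2600, 4800]
landslide = [True, True, True, True, False, False,
             False, True, False, False, False, False]
edges = [0, 2000, 4000, 6000]

rows = class_proportions(distances, landslide, edges)
near_fault = rows[0]  # the 0 to 2000 m bin
```

In this made-up sample, the landslide share in the nearest bin exceeds the non-landslide share, which is the pattern the text reports for faults and rivers.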

Conclusions
The serious geological disasters on the eastern margin of the Tibetan Plateau present government managers and decision-makers with great challenges in urban planning, land management, ecological governance, and disaster prevention. Landslide susceptibility assessment, as a quantitative and effective form of analysis, provides a useful reference. As representative achievements of the 21st-century technological revolution, machine learning and artificial intelligence are becoming dominant technologies, and artificial intelligence has shown great advantages across many professions. How to apply artificial intelligence in the field of geological disaster prevention has long been an open problem for practitioners. In this paper, the Bailong River Basin on the eastern margin of the Tibetan Plateau is taken as a typical study area, and 17 machine learning algorithms are selected for susceptibility assessment modeling, evaluation, and optimization. The results show that the BaggingClassifier and RandomForestClassifier models perform well, with validation accuracy above 90%. The BaggingClassifier predictions are used to divide the basin into very low, low, moderate, high, and very high susceptibility areas, accounting for 52.86%, 21.94%, 8.32%, 5.88%, and 11% of the basin, respectively. The high and very high landslide susceptibility areas are mainly distributed in the middle and lower reaches of the Bailong River from Zhouqu to Wenxian, close to the river banks, and coincide with the areas of high population density and road density in the basin. This indicates that the landslide risk in the Bailong River Basin is high and that landslides may block the river and directly cause casualties and property losses. The interpretability of machine learning permits us to quantitatively assess the importance of the influencing factors. The results show that six factors (distance to roads, NDVI, elevation, distance to faults, average annual rainfall, and distance to rivers) have the greatest influence on landslide susceptibility, while geomorphologically related factors have a lesser combined influence.

Figure 1. The distribution of historical landslides in the Bailong River Basin.

Figure 3. Ranking of model accuracy scores.

Figure 5. Ranking of ACC scores after model optimization.

Figure 6. Distribution map of landslide susceptibility assessment in the Bailong River Basin.

Figure 7. Statistical relationship between road density and population density in different susceptibility areas.

Figure 9. Quantitative evaluation results of the importance of influencing factors.

Table 1. Fields and characteristics of the spatial database.