Landslide Susceptibility Mapping Based on Deep Learning Algorithms Using Information Value Analysis Optimization
Land 2023, 12, 1125

Abstract: Selecting samples with non-landslide attributes significantly impacts the deep-learning modeling of landslide susceptibility mapping. This study presents a method of information value analysis to optimize the selection of negative samples used for machine learning. A recurrent neural network (RNN) has a memory function, so when an RNN is used for landslide susceptibility mapping, the input order of the landslide-influencing factors affects the quality of the resulting model. Information value analysis quantifies the importance of each landslide-influencing factor, determines the input order of the data accordingly, and thereby improves the prediction potential of recurrent neural networks. The simple recurrent unit (SRU), a newly proposed variant of the recurrent neural network, offers a faster processing speed and has so far seen little application in landslide susceptibility mapping. This study used recurrent neural networks optimized by information value analysis for landslide susceptibility mapping in Xinhui District, Jiangmen City, Guangdong Province, China. Four models were constructed: the RNN model with optimized negative sample selection, the SRU model with optimized negative sample selection, the RNN model, and the SRU model. The results show that the RNN model with optimized negative sample selection has the best performance in terms of AUC value (0.9280), followed by the SRU model with optimized negative sample selection (0.9057), the RNN model (0.7277), and the SRU model (0.6355). In addition, several objective measures, namely accuracy (0.8598), recall (0.8302), F1 score (0.8544), Matthews correlation coefficient (0.7206), and the receiver operating characteristic, also show that the RNN model with optimized negative sample selection performs the best. Therefore, information value analysis can be used to optimize negative sample selection in landslide susceptibility mapping in order to improve the model's performance. In addition, the SRU is a weaker method than the RNN in terms of model performance.


Introduction
Faced with current human societal challenges, it is more important than ever for geoscientists to use their understanding of the earth to benefit society [1]. The most notable development in the field of mathematical geoscience in the last decade has been the introduction of big data and artificial intelligence algorithms. The ability of machine learning (ML) algorithms to handle nonlinear problems has tremendous advantages in dealing with complex geoscience problems [2][3][4]. As a result, ML is now being fully utilized in geoscience fields. For example, Wang et al. used unsupervised ML algorithms to identify multielement geochemical anomalies [5], and Yu et al. used hierarchical clustering, singularity mapping, and the Kohonen neural network to identify Ag-Au-Pb-Zn polymetallic mineralization-associated geochemical anomalies [6]. In general, we are primarily focused on geological events that have a significant impact but occur infrequently, such as earthquakes, typhoons, vein formation, and landslides.
Landslides are natural disasters that pose a serious risk to human lives and property and represent one of the most destructive categories of natural disasters that occur globally [7]. Mountainous areas are especially affected by landslides, whose controlling mechanisms are the complex geological and geographical conditions present in that landscape. Seventy percent of China's area is mountainous, providing favorable conditions for landslide occurrences, resulting in casualties and considerable economic losses [8][9][10][11]. As a consequence, landslide susceptibility mapping (LSM), which can analyze possible spatial areas for landslide occurrence, is an effective technique for land managers to mitigate the effects of landslides [12,13].
Machine learning is a subfield of artificial intelligence (AI) that uses computer technologies to analyze and predict information by learning from a training dataset. A variety of ML methods have been used for LSM, including Bayesian networks, decision trees, support vector machines, random forests, and artificial neural networks [14][15][16][17][18]. In recent years, researchers implementing and developing natural hazard models have begun to consider hybrid models. Hybrid models combine individual models with metaheuristic algorithms, which allows the hybrid model to compensate for the weak points inherent to the individual models and to obtain more accurate results. For example, adaptive neuro-fuzzy inference system-gradient-based optimization (ANFIS-GBO) has been applied to the spatial modelling of flood hazards [19]; cuckoo optimization algorithm-multi-layer perceptron (COA-MLP) and SailFish optimizer-multi-layer perceptron (SFO-MLP) approaches have been applied to landslide susceptibility assessment [20]; and ANFIS integrated with three optimization algorithms (ant colony optimization (ACO), genetic algorithm (GA), and particle swarm optimization (PSO)) has been applied to flood susceptibility mapping [21]. A variety of machine learning and deep learning models have been used to improve the accuracy of LSM. To obtain better deep learning and machine learning models, researchers have also adopted a variety of improved methods, such as the deep-learning optimization algorithm [22], the hybrid ensemble-based deep-learning framework [23], and the class-weighted algorithm combined with ML models [24].
Deep learning models have been increasingly applied in the modeling of environmental variables, such as environmental remote sensing [25], PM2.5 prediction [26], and water temperature prediction [27]. Recurrent neural networks (RNNs) are a specific kind of neural network that not only considers the current input but also gives the network a "memory" of the previous content. Because of this memory function, the order of data input affects the model's effectiveness. A sequential data representation can therefore take advantage of the memory function of RNNs and allows their prediction potential to be explored thoroughly. RNNs have been applied to LSM. Thi Ngo et al. applied RNN and CNN techniques for an LSM of Iran at the national scale [28]. Liming Xiao et al. used long short-term memory (LSTM) to predict landslide susceptibility along the China-Nepal Highway [29]. The common variants of RNNs are LSTM [30] and gated recurrent units (GRUs) [31]. Recently, the simple recurrent unit (SRU) was proposed as a new RNN variant that has a faster processing speed than the LSTM and GRU approaches. The SRU has so far seen little application in LSM, and its specific performance in LSM should be further studied.
Traditional binary classifiers for machine learning usually require two sets of samples with corresponding labels, namely positive and negative samples [32]. Practical applications are often imperfect, however, most commonly because only positive and unlabeled samples are available for the training dataset. For non-landslide samples, a specific definition and a reasonable method of obtaining them are still needed. In general, the study area is divided into landslide and non-landslide areas, and non-landslide samples are drawn randomly from the non-landslide areas. These unlabeled samples cannot be directly considered negative samples, because they may simply come from areas where disasters have not yet occurred [33]. At present, the issue of non-landslide sample selection has received some attention. Yang et al. [34] used Bayesian optimization algorithms to optimize the proportion of landslide samples. Chang et al. [35] selected non-landslide samples multiple times and investigated the uncertainty of non-landslide sample selection. Huang et al. [36] selected the non-landslide samples from the non-landslide area with a low landslide susceptibility level based on a semi-supervised multiple-layer perceptron model. Overall, there is no universally accepted method for optimizing non-landslide sample selection, owing to the differences between study areas and between the logic and mechanisms of different algorithms, and the problem needs to be studied thoroughly.
Therefore, the main innovation of this study is to optimize the selection of negative samples using information value analysis. Information value analysis also determines the input order of the data by quantifying the influencing factors, which allows the prediction potential of RNNs, with their memory function, to be fully explored. In addition, because the SRU has been less studied in LSM, both RNN and SRU models are constructed to explore the prediction performance of the SRU through a comparative study.

Study Area
The study area is Xinhui District, Jiangmen City, Guangdong Province, China (Figure 1). The land area of the region is 1354.71 square kilometers. Mountainous areas are distributed in the northwest and southwest of the district, accounting for 35.84% of the total area of the region. Plains are distributed in the southeastern, south-central and west-central parts of the district, accounting for 43.53% of the total area of the district. The region's waters account for 20.63% of the total area of the region. Xinhui has a southern subtropical maritime monsoon climate, with abundant rainfall, sufficient sunshine, and mild and humid conditions year round. The average annual temperature is 22.4 °C, with the highest and lowest historical temperatures of 38.3 °C and 0.1 °C, respectively. The annual average precipitation is 1808.3 mm. The precipitation is concentrated from April through September. The average annual sunshine hours are 1734.1 h.
The list of landslides used in this paper, completed by the Guangdong Geological Survey Institute, consists of 178 landslides and locations of high-risk points (Figure 1), of which the landslide samples occurred from 2017 to 2020. Most of the landslides are classified as sliding landslides. All the landslides in this study can be classified as moderate (400-1000 m²) or small (<400 m²). In addition, there are rock landslides and earth landslides. According to the report, these landslides were triggered by rainfall events that occurred after anthropogenic activity.

Datasets
Heckmann et al. [37] stated that increasing the number of samples has a positive impact on LSM and improves the model's effectiveness. However, the training samples used for LSM are insufficient in many cases. To address this problem, we collaborated with geologists to collect historical landslide points and locations with significant potential for landslides throughout the whole region, totaling 178 points. We used these points as samples to improve the effectiveness of the model.
In this study, 15 landslide influencing factors were considered, including elevation, slope, aspect, plan curvature, profile curvature, degree of relief, land use, rock type, topographic wetness index (TWI), terrain ruggedness index (TRI), topographic position index (TPI), normalized difference vegetation index (NDVI) on 15 April 2014, distance to faults, distance to rivers, and distance to roads. Detailed information on the landslide influencing factors is shown in Table 1. The following describes the preparation for each influencing factor.
The elevation, slope, aspect, plan curvature, profile curvature and degree of relief were extracted from a digital elevation model (DEM) obtained from the Advanced Spaceborne Thermal Emission and Reflection Radiometer Global Digital Elevation Model (ASTER GDEM V2) (http://www.gscloud.cn, accessed on 11 March 2021). Slope, aspect, plan curvature, profile curvature, and degree of relief were calculated in the MapGIS 10.2 software. The TWI and TPI were generated by the SAGA 6.1 software. The distance to roads and the distance to rivers were produced by ArcGIS based on topographic maps at a scale of 1:50,000. The distance to faults was produced by ArcGIS based on engineering geological maps at a scale of 1:50,000. We obtained NDVI data for the study area from the USGS (https://earthexplorer.usgs.gov, accessed on 20 March 2021). Land use data and rock type data were provided by the collaboration with geologists. All factors were converted into a raster form with a spatial resolution of 20 m. The descriptions of these factors are shown in Table 1. Figure 2 shows the spatial distribution of these factors.
Figure 3 shows the process diagram used in this study. There are six steps in this process: (1) selecting the landslide influencing factors, (2) selecting typical negative samples and representing landslide data in series based on the information values (IVs), (3) preparing both the training and testing datasets by random partitioning, (4) constructing RNN and SRU models, (5) evaluating and comparing the landslide models, and (6) constructing a landslide susceptibility map.

Information Value Analysis
Information value analysis is a data exploration technique that helps determine which columns in a dataset have predictive power or influence on the value of a specified dependent variable. Information value is a very useful concept for variable selection during model building. The roots of the IVs are in the information theory that was proposed by Claude Shannon [38,39]. The IV analysis is a popular tool in the banking and bond ratings fields [40,41]. The effectiveness of landslide models can be enhanced by introducing IV into the processing of landslide factors for LSM. The weight of evidence (WOE) of each class and the IV of each variable can be calculated as follows:

$$\mathrm{WOE}_i = \ln\left(\frac{n_{i1}/n_1}{n_{i0}/n_0}\right) \tag{1}$$

$$\mathrm{IV} = \sum_i \left(\frac{n_{i1}}{n_1} - \frac{n_{i0}}{n_0}\right)\mathrm{WOE}_i \tag{2}$$

where $n_1$ is the total number of landslide rasters, $n_0$ is the total number of non-landslide rasters, $n_{i1}$ is the number of landslide rasters of class $x_i$ for variable $x$, and $n_{i0}$ is the number of non-landslide rasters of class $x_i$ for variable $x$. In practice, the standard rule for using the IVs is shown in Table 2.
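The calculation behind Equations (1) and (2) can be illustrated with a short sketch. It assumes the usual definitions, WOE_i = ln((n_i1/n_1)/(n_i0/n_0)) and IV = sum over classes of (n_i1/n_1 - n_i0/n_0) * WOE_i. The function name `woe_and_iv`, the smoothing constant `eps`, and the class counts are hypothetical and are not taken from the study.

```python
import math

def woe_and_iv(landslide_counts, non_landslide_counts, eps=0.5):
    """Return per-class WOE values and the total IV for one influencing factor.

    landslide_counts[i]     -> n_i1, landslide rasters in class i
    non_landslide_counts[i] -> n_i0, non-landslide rasters in class i
    eps is an assumed smoothing constant that avoids log(0) for empty classes.
    """
    n1 = sum(landslide_counts)
    n0 = sum(non_landslide_counts)
    woe, iv = [], 0.0
    for ni1, ni0 in zip(landslide_counts, non_landslide_counts):
        p1 = max(ni1, eps) / n1   # share of all landslide rasters in this class
        p0 = max(ni0, eps) / n0   # share of all non-landslide rasters in this class
        w = math.log(p1 / p0)
        woe.append(w)
        iv += (p1 - p0) * w
    return woe, iv

# Hypothetical raster counts for a factor with three classes
woe, iv = woe_and_iv([10, 30, 60], [500, 300, 200])
print([round(w, 4) for w in woe], round(iv, 4))
```

A class in which landslides are over-represented gets a positive WOE, and each class contributes a non-negative term to the IV.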

Recurrent Neural Network
In traditional neural network models, the layers are fully connected from the input layer to the hidden layer to the output layer, and the nodes within each layer are unconnected [42,43]. Recurrent neural networks (RNNs) are a class of Artificial Neural Networks (ANNs) intended for processing sequential data (Figure 4). Specifically, the network remembers the previous information input and then applies it to the calculation of the current output. The nodes within the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment.
Traditional recurrent neural networks are often implemented using Elman networks or Jordan networks, both of which are similar and are three-layer networks. The Elman network and the Jordan network are also known as "simple recurrent networks" (SRN) [44,45]. Let $x_t$, $y_t$, and $h_t$ be the input vector, the output vector, and the hidden layer vector; then we can obtain

$$h_t = \sigma_h\left(W_h x_t + U_h h_{t-1} + b_h\right) \tag{3}$$

$$y_t = \sigma_y\left(W_y h_t + b_y\right) \tag{4}$$

where $U$ and $W$ are parameter matrices, $b$ is the bias vector, and $\sigma_h$ and $\sigma_y$ are activation functions.
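The recurrence described above can be sketched in NumPy. This is a minimal illustration and not the implementation used in the study: it assumes an Elman-style update with a tanh hidden activation and a sigmoid output, and the function name `elman_forward`, the dimensions, and the random weights are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elman_forward(xs, W_h, U_h, b_h, W_y, b_y):
    """Run an Elman-style RNN over a sequence xs of shape (T, input_dim).

    Returns the outputs for every step, shape (T, out_dim), and the last hidden state.
    """
    h = np.zeros(U_h.shape[0])                   # initial hidden state h_0 = 0
    outputs = []
    for x_t in xs:
        # The hidden state mixes the current input with the previous hidden state.
        h = np.tanh(W_h @ x_t + U_h @ h + b_h)
        outputs.append(sigmoid(W_y @ h + b_y))
    return np.array(outputs), h

# Hypothetical setup: one pixel described by 15 factor values fed in one at a time
rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 15, 1, 8
xs = rng.normal(size=(T, input_dim))
W_h = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
U_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)
W_y = rng.normal(scale=0.1, size=(1, hidden_dim))
b_y = np.zeros(1)

ys, h_last = elman_forward(xs, W_h, U_h, b_h, W_y, b_y)
susceptibility = float(ys[-1, 0])                # output after the last factor is read
```

Because each hidden state depends on the previous one, the steps must be evaluated in order, which is why the input order of the factors matters.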

Simple Recurrent Unit
The SRU is a recently proposed RNN variant, and the SRU and the related work aim to propose and explore simple, fast, and more explanatory RNNs (Figure 4) [46]. Compared to other RNN variants, such as LSTM and GRU, the SRU can achieve faster training speeds due to its designed structure. Figure 5 shows the basic structure of the SRU. The SRU is built on the same "gate" structure as the LSTM and GRU, but the difference is that the SRU removes the constraint that limits parallelization in the LSTM and GRU, resulting in a much faster processing speed. The SRU has two components: "light recurrence" and "highway network". Let $x_t$, $f_t$, $C_t$, $r_t$, and $h_t$ be the input vector, the forget gate vector, the current state from light recurrence, the reset gate vector, and the hidden layer vector. The light recurrence can be summarized as Equations (5)-(7), and the highway network can be summarized as Equations (8) and (9).

$$\tilde{x}_t = W x_t \tag{5}$$

$$f_t = \sigma\left(W_f x_t + b_f\right) \tag{6}$$

$$C_t = f_t \odot C_{t-1} + \left(1 - f_t\right) \odot \tilde{x}_t \tag{7}$$

$$r_t = \sigma\left(W_r x_t + b_r\right) \tag{8}$$

$$h_t = r_t \odot g\left(C_t\right) + \left(1 - r_t\right) \odot x_t \tag{9}$$

where $W$, $W_f$, and $W_r$ are parameter matrices, $b_f$ and $b_r$ are bias vectors, $\sigma$ is the sigmoid function, and $g$ is an activation function such as tanh. The symbol $\odot$ is the pointwise multiplication operation [47].
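The two SRU components can be sketched as follows. This is an illustration under assumptions, not the study's implementation: it assumes the formulation in which the gates depend only on the current input (forget gate f_t, reset gate r_t, state C_t = f_t * C_{t-1} + (1 - f_t) * W x_t, output h_t = r_t * tanh(C_t) + (1 - r_t) * x_t). The function name `sru_forward`, the dimensions, and the random weights are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_forward(xs, W, W_f, b_f, W_r, b_r):
    """Run one SRU layer over xs of shape (T, d); all weight matrices are d x d."""
    # The matrix products do not depend on the previous state, so they can be
    # computed for every time step at once (the source of the SRU's speed).
    x_tilde = xs @ W.T
    f = sigmoid(xs @ W_f.T + b_f)                # forget gates, all steps
    r = sigmoid(xs @ W_r.T + b_r)                # reset gates, all steps

    c = np.zeros(xs.shape[1])
    hs = []
    for t in range(len(xs)):
        # Light recurrence: only cheap element-wise operations remain sequential.
        c = f[t] * c + (1.0 - f[t]) * x_tilde[t]
        # Highway network: blend the transformed state with the raw input.
        hs.append(r[t] * np.tanh(c) + (1.0 - r[t]) * xs[t])
    return np.array(hs), c

# Hypothetical sequence of 15 steps with 4 features each
rng = np.random.default_rng(1)
T, d = 15, 4
xs = rng.normal(size=(T, d))
W, W_f, W_r = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))
hs, c_last = sru_forward(xs, W, W_f, np.zeros(d), W_r, np.zeros(d))
```

The highway term requires the input and hidden dimensions to match, which is why all matrices are square in this sketch.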

Selection of Landslide Influencing Factor
For LSM models, inputting more data does not necessarily result in a better model, as too much redundancy in the influencing factors considered will reduce the model's predictive capability [48]. Therefore, it is crucial to correctly select the landslide influencing factors [49]. The IV analysis method has been described above, and Table 3 shows the analysis of these influencing factors using Equations (1) and (2). Table 2 shows the standard rule for using the IV analysis. All IVs are higher than 0.02, indicating that all influencing factors have some predictive power for the occurrence of landslides. Based on the above results, the TRI has the highest IV of 0.8827, indicating that it may be the dominant factor, and most of the other factors are between 0.1 and 0.4, indicating that they are also associated with landslide occurrence.

Factor Importance Ranking
From the above introduction of the architecture of RNNs and SRUs, it is clear that RNNs are effective in processing data that have sequential properties due to their special recurrent hidden states. Therefore, constructing models using RNNs should consider the problem of data redundancy and the input sequence of data. In this study, we propose a landslide data representation for RNNs, as shown in Figure 5. According to the results in Section 3.3, the IVs of all the influencing factors are first arranged in descending order, which ranks the influencing factors by their level of importance. Then, each pixel in the study area is converted into a sequential sample. Thus, the data are input into the model in the previously ranked order of importance. Due to the special architecture inherent to RNNs, the previous input data are related to the later input, and the key information of each influencing factor that induces landslides is passed along to the next hidden state.
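The ranking and sequence conversion described above can be sketched in a few lines. Only the TRI value (0.8827) comes from the text; the other IVs, the pixel values, and the helper names `order_by_iv` and `pixel_to_sequence` are hypothetical.

```python
def order_by_iv(iv_by_factor):
    """Return factor names sorted by information value, highest first."""
    return sorted(iv_by_factor, key=iv_by_factor.get, reverse=True)

def pixel_to_sequence(pixel, order):
    """Turn one pixel's factor values into a sequence following the IV ranking."""
    return [pixel[name] for name in order]

# TRI value from the text; the slope and NDVI values are made up for illustration
iv_by_factor = {"slope": 0.35, "TRI": 0.8827, "NDVI": 0.12}
order = order_by_iv(iv_by_factor)

# Hypothetical factor values for a single pixel
pixel = {"NDVI": 0.41, "slope": 18.0, "TRI": 2.7}
sequence = pixel_to_sequence(pixel, order)
print(order, sequence)
```

Each element of `sequence` would then be fed to the recurrent model as one time step, most informative factor first.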

Selection of Negative Sample
Landslides are geological events that occur infrequently but are hazardous to our society, and we can further define landslides as being rare events [50]. Identifying classes of rare events and representing them from a large quantity of data are challenging due to the insufficient number of positive samples and the absence of negative samples [51]. The lack of positive samples has been improved by adding the risk points above, and in this section, negative samples are selected by the weight of the evidence (WOE) method.
The WOE is calculated by Equation (1). It compares two ratios: the ratio of the number of landslides contained in the current class to the number of all landslide occurrences, and the ratio of the number of non-landslide samples contained in the current class to the number of all non-landslide samples in this study. The WOE is the logarithm of the quotient of these two ratios. The larger the WOE is, the greater the probability of landslide events happening for the pixels belonging to this interval; conversely, the smaller the WOE, the smaller the probability of landslides.
To obtain the area for selecting the negative samples, the WOEs of the 15 influencing factors for all pixels were summed and averaged in order to obtain a WOE map of the study area, and then the region was divided into two areas: positive WOE and negative WOE (Figure 6). To verify the effectiveness of this method, two groups of negative samples were selected: one group was randomly selected in the negative WOE region, and the other group was randomly selected directly in the study area. The number of negative samples in both groups was the same as the number of positive samples.
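The selection step can be sketched with NumPy, assuming each factor has already been converted into a raster of per-pixel WOE values. The function name `sample_negatives`, the toy rasters, and the exclusion of known landslide pixels are assumptions made for illustration and are not details confirmed by the study.

```python
import numpy as np

def sample_negatives(woe_stack, landslide_mask, n_samples, seed=0):
    """Draw negative samples from pixels whose mean WOE is below zero.

    woe_stack      : array (n_factors, rows, cols) of per-pixel WOE values
    landslide_mask : boolean array (rows, cols), True at known landslide pixels
    """
    mean_woe = woe_stack.mean(axis=0)            # sum over factors, then average
    # Assumption: known landslide pixels are never used as negatives.
    candidates = np.argwhere((mean_woe < 0) & ~landslide_mask)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(candidates), size=n_samples, replace=False)
    return candidates[idx], mean_woe

# Toy 20 x 20 study area with 15 hypothetical WOE rasters
rng = np.random.default_rng(42)
woe_stack = rng.normal(size=(15, 20, 20))
landslide_mask = np.zeros((20, 20), dtype=bool)
landslide_mask[0, :5] = True

picked, mean_woe = sample_negatives(woe_stack, landslide_mask, n_samples=10)
```

For the comparison group described in the text, the same random draw would be made over the whole study area instead of the negative WOE region.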

Evaluation and Comparison of Models
The validation of model strength or weakness is a key condition for assessing model performance. The fitting accuracy is considered a significant feature and is obtained by comparing the model predictions with the true values in the training dataset. The analysis and evaluation of models using receiver operating characteristic (ROC) curves are common in many related studies. The ROC curve plots the true-positive rate against the false-positive rate. The area under the ROC curve (AUC) summarizes the model's predictive performance. The AUC values range between 0.5 and 1.0, with larger areas indicating a better spatial prediction performance of the model [52]. Statistical indicators such as accuracy (ACC), Matthews correlation coefficient (MCC), F1 score, and recall are added to evaluate the predictive ability of the model, and these are calculated as follows [53][54][55]:

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{10}$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{11}$$

$$\mathrm{F1} = \frac{2TP}{2TP + FP + FN} \tag{12}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{13}$$

where TP and TN represent true positives and true negatives, and FP and FN denote false positives and false negatives, respectively. The values of ACC, recall, and F1 score range between 0 and 1. MCC ranges between −1 and 1. The higher the ACC, F1, and MCC values, the better the predictive ability of the model.
The process of constructing the training and testing datasets is as follows: both of our datasets include 178 positive samples and 178 negative samples in order to construct the training and validation sets for the ML process; 70% of the positive samples (124) and negative samples (124) are used for training, and the remaining 30% (54 and 54) are used for testing. After training and testing, the four machine learning models were evaluated using five criteria: AUC, ACC, MCC, F1 score, and recall. Table 4 shows the performance of the models. To verify that the method generalizes across data splits, we used five-fold cross-validation, and Table 5 shows the averages of the statistical metrics of the five-fold cross-validation.
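The four confusion-matrix indicators can be computed as in the sketch below, which uses their standard definitions. The function name `classification_metrics` is hypothetical, and the counts are illustrative values that are not reported in the study; they were chosen so that the rounded results coincide with the RNN values quoted in the text.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Compute ACC, recall, F1 score, and MCC from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0   # guard against 0/0
    return {"acc": acc, "recall": recall, "f1": f1, "mcc": mcc}

# Illustrative counts only (an assumption, not taken from the study)
m = classification_metrics(tp=44, tn=48, fp=6, fn=9)
print({name: round(value, 4) for name, value in m.items()})
```

Unlike ACC, the MCC uses all four cells of the confusion matrix in both numerator and denominator, so it stays informative when one class dominates.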
The results show that the RNN model and the SRU model outperform the RNN_random model and the SRU_random model in all four statistical metrics, indicating that models fitted on the dataset whose negative samples were selected by information value analysis perform significantly better than models fitted on the dataset with randomly selected negative samples. Regarding the ACC, the RNN model performs the best and achieves the highest ACC of 0.8598, which is 0.0748 higher than that of the SRU model (0.7850). The RNN model also achieves the highest MCC and F1 score (0.7206 and 0.8544), which are 0.1257 and 0.0445 higher than those of the SRU model. In addition, for both the RNN and the SRU, the ML models trained with the IV analysis dataset outperform the ML models trained with the randomly selected negative samples dataset: all statistical indicators are greater by more than 0.2. Figure 7 plots the ROC curves of the four models. It can be seen that the AUC values of both the RNN model and the SRU model are above 0.90. In contrast, the AUC values of both the RNN_random model and the SRU_random model are low, indicating that the RNN and SRU techniques combined with the information value analysis show excellent predictive power for LSM. In addition, the RNN model achieves the highest AUC value (0.928), which is superior to the other models.
Figure 8 shows the accuracy and loss curves of the four models, which are used to check the robustness of the results. When a model is optimized to the most stable level, the curves behave as follows: as the epoch increases, the two accuracy curves gradually increase and level off, and the two loss curves gradually decrease and level off. By contrast, a training loss that decreases while the test loss increases would indicate that the model may have an overfitting problem. All four models are optimized to the most robust level without overfitting problems.
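The AUC values discussed above can be understood through the rank interpretation of the AUC: it equals the probability that a randomly chosen landslide sample receives a higher score than a randomly chosen non-landslide sample. The sketch below uses that interpretation directly; the function name `auc_score` and the toy labels and scores are hypothetical.

```python
def auc_score(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: one landslide sample is scored below one non-landslide sample
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = auc_score(labels, scores)
print(auc)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a few hundred test points but would be replaced by a sort-based computation for large rasters.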

Landslide Susceptibility Maps
When LSM results are compared, the maps should be classified using quantitative methods [56]. The model output was analyzed and processed in ArcGIS. Using the Jenks natural breaks classification method, the maps were divided into five groups (very high, high, medium, low, and very low) to obtain the final landslide susceptibility maps (Figure 9). Among the four maps, most of the historical landslide and high-risk sites in Figure 9a-c fall in the high landslide susceptibility areas, which are mainly located in the north, southwest, and southeast due to the mountainous terrain in the northwest and southwest of the study area and the strong human engineering activities in the northeast. According to the statistical indicators, the map in Figure 9a, constructed by the RNN model, is better than the map in Figure 9b, constructed by the SRU model. Figure 9a does not contain an excessive number of high susceptibility areas and does not predict low susceptibility areas, such as the rivers in the study area (Figure 2), as high susceptibility areas. Figure 9c,d also predict some river areas as moderate and high susceptibility areas, which is not in accordance with the geomorphological conditions of the study area. Therefore, the map shown in Figure 9a is believed to be the best portrayal of the real-world conditions.
The visual analysis gives an initial indication that the RNN model has excellent spatial predictive ability for the LSM of the study area. The model evaluation results can also be described using mathematical-statistical methods (Table 6). LSM produces a model that focuses on high-susceptibility areas and models them simply and efficiently [57]. The evaluation of the practicability of the models therefore focuses on two groups, those rated high and very high. First, we introduce the concept of landslide density (LD), which is the frequency ratio: the ratio of the percentage of landslides in groups IV + V to the percentage of the study area occupied by groups IV + V in Table 6. It can be seen that the RNN model is more practical than the SRU model: although the RNN model covers fewer landslide and high-risk points than the SRU model (3.37% lower), its high susceptibility regions are much smaller than those of the SRU model (16.31% lower). The low LD value of the high susceptibility regions of the SRU model also reflects its narrower range of real-world applications compared with the RNN model. The RNN_random and SRU_random models cover too few landslide and high-risk points, indicating that these two models have poor practical applicability.
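The LD calculation described above can be illustrated with a minimal Python sketch. It assumes that groups IV and V correspond to the high and very high classes and that each class is summarized by its share of the study area and its share of the landslide points. The function names, class labels, and percentages are hypothetical and are not taken from Table 6.

```python
def landslide_density(landslide_pct, area_pct):
    """Frequency ratio: share of landslide points divided by share of area."""
    if area_pct <= 0:
        raise ValueError("area_pct must be positive")
    return landslide_pct / area_pct


def high_class_density(class_stats, classes=("IV", "V")):
    """LD of the combined high (IV) and very high (V) susceptibility classes."""
    landslide_pct = sum(class_stats[c]["landslide_pct"] for c in classes)
    area_pct = sum(class_stats[c]["area_pct"] for c in classes)
    return landslide_density(landslide_pct, area_pct)


# Hypothetical percentages for one model (illustrative only, not from Table 6).
example_stats = {
    "I": {"area_pct": 30.0, "landslide_pct": 2.0},
    "II": {"area_pct": 25.0, "landslide_pct": 5.0},
    "III": {"area_pct": 20.0, "landslide_pct": 13.0},
    "IV": {"area_pct": 15.0, "landslide_pct": 30.0},
    "V": {"area_pct": 10.0, "landslide_pct": 50.0},
}

ld_high = high_class_density(example_stats)
print(f"LD of classes IV + V: {ld_high:.2f}")
```

In this toy case, 80% of the points fall in 25% of the area. A model that captures a similar share of points in a smaller high-susceptibility area obtains a higher LD, which is the sense in which the RNN map is described as more practical.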

Uniqueness of the Study Area
Although Xinhui District is neither a seismically active area nor an area with an extremely fragile geological environment, and its climate is unremarkable, its geographic location gives it a unique economic position and research value, as shown in Figure 1. As a new growth pole in the Guangdong Coastal Economic Belt and a destination for industrial transfer from the east to the west of the Guangdong-Hong Kong-Macao Greater Bay Area, the Xinhui District has become an important node district at the strategic intersection of the Guangdong Coastal Economic Belt and the Guangdong-Hong Kong-Macao Greater Bay Area in China, which is both an enormous opportunity and a great challenge. Human activities in the Xinhui District will continue to increase, posing a major challenge to future economic development and land use. Reasonable land planning cannot be separated from reliable geological hazard investigation and evaluation. Therefore, assessing the landslide susceptibility and the potential impacts of landslides on the economic environment can lay the foundation for optimizing land use patterns and reducing geological risk in the future.

Optimization of Non-Landslide Sample Selection
A variety of ML methods have been applied to LSM in recent years, with good results. However, previous studies have mostly focused on applying and comparing various ML methods to improve model performance, even though the selection of the negative samples used to construct the models also affects how the models are built. Randomly selecting locations with no recorded landslides as negative samples introduces considerable pollution, and selecting negative samples through unsupervised cluster analysis still involves specifying them artificially, which also introduces a great deal of uncertainty into the resulting model performance. Therefore, we use the IV analysis to calculate the influencing factors based on historical landslide points and thereby obtain less polluted negative samples for producing the landslide susceptibility maps.
The data in this study do not present the positive and negative sample problem that occurs in standard supervised learning. Instead, they present a positive and unlabeled (PU) problem, in which there are only definite positive samples and unlabeled samples. The unlabeled samples can only be assumed, without certainty, to be negative samples. Information value analysis was used to obtain the WOE for the entire study area as a basis for selecting the negative samples. The final comparison of results shows that this method works well and that the pollution of the negative data is effectively limited. The groups of influencing factors within each pixel carry information about both their positive and negative influences on landslides, but a negative value used as a measure of importance is not in accordance with everyday logic. Therefore, we use the WOE with a proportional correction IV as the indicator of factor importance, which determines the order in which the data are input into the RNN model. The results indicate that the two slope-related factors, the TRI and profile curvature, were the most important factors in determining whether a landslide was likely to occur at a given pixel location.
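The idea of selecting negative samples from low-information-value areas can be sketched as follows. This is a minimal illustration, not the procedure used in this study: it assumes the commonly used information value formula ln((N_i / N) / (S_i / S)), where N_i and S_i are the number of landslide points and pixels in class i of a factor and N and S are the corresponding totals, and it simply picks the unlabeled pixels with the lowest summed value. All function names and data are hypothetical.

```python
import math


def class_information_values(landslide_counts, pixel_counts):
    """Information value of each class of one influencing factor.

    Assumed formula: ln((N_i / N) / (S_i / S)).
    """
    total_landslides = sum(landslide_counts.values())
    total_pixels = sum(pixel_counts.values())
    values = {}
    for cls, n_pixels in pixel_counts.items():
        n_landslides = landslide_counts.get(cls, 0)
        if n_landslides == 0:
            # No recorded landslide in this class: rank it as least susceptible.
            values[cls] = float("-inf")
        else:
            values[cls] = math.log(
                (n_landslides / total_landslides) / (n_pixels / total_pixels)
            )
    return values


def pixel_score(pixel_classes, iv_tables):
    """Sum the class information values over all factors for one pixel."""
    return sum(iv_tables[factor][cls] for factor, cls in pixel_classes.items())


def select_negative_candidates(pixels, iv_tables, n):
    """Return the n unlabeled pixels with the lowest total information value."""
    unlabeled = [p for p in pixels if not p["is_landslide"]]
    unlabeled.sort(key=lambda p: pixel_score(p["classes"], iv_tables))
    return unlabeled[:n]


# Toy data (hypothetical): a single factor with two classes.
iv_tables = {
    "slope": class_information_values(
        landslide_counts={"gentle": 2, "steep": 8},
        pixel_counts={"gentle": 60, "steep": 40},
    )
}
pixels = [
    {"id": 1, "is_landslide": False, "classes": {"slope": "gentle"}},
    {"id": 2, "is_landslide": False, "classes": {"slope": "steep"}},
    {"id": 3, "is_landslide": True, "classes": {"slope": "steep"}},
]
negatives = select_negative_candidates(pixels, iv_tables, n=1)
print([p["id"] for p in negatives])
```

In the toy data, the steep class holds a larger share of the landslide points than of the pixels, so its value is positive, and the gentle unlabeled pixel is preferred as a negative sample. Drawing negatives from such low-value areas is one way to reduce the chance that an unlabeled pixel is in fact landslide-prone.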
The problem of non-landslide sample selection has received attention, and many methods have been proposed recently. These include determining the proportion of non-landslide to landslide samples (because the value of negative samples is weaker than that of landslide samples, more non-landslide samples should be selected to improve accuracy, while avoiding the imbalance between positive and negative samples caused by too many non-landslide samples), selecting non-landslide sample sets several times to find the best set, and using semi-supervised learning models. This study obtains less polluted negative samples through the IV analysis. Overall, various studies on optimizing non-landslide sample selection have achieved satisfactory results. However, due to the differences in study areas and in the logic and mechanisms behind different algorithms, there is no universally accepted method for optimizing non-landslide sample selection. A comparative study of different methods for selecting non-landslide samples under the same conditions should be considered in the future.

Comprehensive Comparison of the Various Methods
Four datasets were input into the models, and Figure 9 shows that the datasets with less noisy negative data perform significantly better than the datasets with noisier negative data in terms of their ROC, ACC, MCC, recall, and F1 values. The traditional RNN model was then compared with the newly proposed SRU model (both using the datasets with less noisy negative data) to produce two landslide susceptibility maps. Both models have excellent accuracy (AUC > 0.900), but Tables 4 and 5 show that the RNN model generates a more reasonable area of high susceptibility for landslide events, within which the historical points are more densely concentrated. Therefore, the resulting map can help regional managers make effective decisions, and this study improves the prediction performance of deep learning techniques, represented by RNNs, in LSM.
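For reference, the threshold-based measures named above (ACC, recall, F1, and MCC) can all be derived from a binary confusion matrix using their standard definitions. The following Python sketch is illustrative only: the function name and the counts are hypothetical and do not reproduce the values reported in this study.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, recall, F1 score, and MCC from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when a row or column of the matrix is empty.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts: 50 landslide and 50 non-landslide test samples.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, which makes it informative when the negative samples may be noisy or the classes are imbalanced.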

Conclusions
This paper focuses on landslide susceptibility mapping (LSM) in the Xinhui District based on the RNN and SRU methods. Using the information value (IV) analysis, 15 landslide influencing factors were calculated, and their order of input into the recurrent neural network was determined. The negative data were then selected by the IV analysis. The 178 historical landslide and high-risk points were randomly divided into a training set and a test set for the model calculation, and the final landslide susceptibility maps were produced by the RNN and SRU for comparison. The results led to the following conclusions: (1) the IV analysis method can improve the performance of machine learning methods in LSM by optimizing the selection of negative samples; (2) both the RNN and SRU models obtain excellent results in LSM (AUC > 0.900), but the SRU, a newly proposed variant of RNNs, performs worse than the traditional RNN model; and (3) the RNN can produce accurate landslide susceptibility maps in areas with geography similar to that of the Xinhui District.
However, some limitations remain to be addressed in further studies. For example, the existing geomechanical properties are not well considered. Moreover, in addition to the characteristics of the non-landslide sample itself, it remains to be determined whether the surrounding environment of the non-landslide area also influences the performance of the model. Future work will focus on selecting non-landslide samples more scientifically by increasing the number of influencing factors and analyzing the mutual influence of the surrounding environment, in order to ensure the accuracy of the LSM results.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author, upon reasonable request.