Debris flow susceptibility mapping in mountainous area based on multi-source data fusion and CNN model – taking Nujiang Prefecture, China as an example

ABSTRACT Efforts to evaluate the susceptibility of debris flows in large areas, especially in mountainous regions, are often hampered by the alpine and canyon terrain. This paper proposes a convolutional neural network (CNN) model named dense residual shuffle net (DRSNet). It is applied to Nujiang Prefecture in Yunnan Province, China, a typical alpine area with frequent debris flows. DRSNet uses digital elevation model, remote sensing, lithology, soil type and precipitation data as input. First, dense connections and residual structures are used to extract the shallow features of the various data. Next, channel shuffle, fuse blocks and fully connected layers are applied to strengthen the correlation between different shallow features and to give inner danger scores. Finally, precipitation is introduced as the activation factor to give the susceptibility of each valley. To verify the feasibility of DRSNet, comparative tests were conducted against 7 CNN models and 3 other machine learning (ML) methods. Experimental results show that DRSNet achieves 78.6% accuracy in debris flow valley classification, which is at least 7.4% higher than common CNN models and 15.2% higher than other ML methods. This article provides new ideas for debris flow susceptibility evaluation.


Introduction
Debris flow is a kind of natural disaster mainly caused by heavy rainfall (Takahashi 1981;Coe, Kinner, and Godt 2008), mostly in mountainous areas (Perov et al. 2017;Cabral et al. 2021), with a sudden and unpredictable outbreak. It poses a significant threat to the personal and property safety of mountain residents (Hürlimann et al. 2019;Hou et al. 2021). This has prompted a growing number of researchers to use various methods to evaluate the debris flow susceptibility (DFS) of valleys in mountainous regions in order to better carry out disaster prevention and control (Musumeci et al. 2021). The DFS evaluation methods can be roughly divided into three categories: traditional statistical, numerical simulations and machine learning (ML) methods.
The traditional statistical methods assume that the triggering factors of debris flows are generally the same. These methods use some specific functions to fit these factors. The fitted function is used to evaluate the susceptibility of valleys. There are several such methods, including the analytic hierarchy process (Kazakis, Kougias, and Patsialis 2015), gray relational analysis (Wei et al. 2018), evidence weight method (Sharma et al. 2022), rough set (Li et al. 2018), fuzzy evaluation method (FEM) (Wang et al. 2017;Yang et al. 2018), entropy method (Lombardo et al. 2016), etc. Almost all statistical-based methods require pre-set factors to evaluate the DFS. However, the selected factors vary from area to area due to the different geological conditions of different regions. Moreover, even the same method applied to the same area may use different factors. For example, Wang et al. (2017) and Yang et al. (2018) chose the longitudinal slope and bending coefficient of the gully bed, respectively in the DFS assessment of Wenchuan by using FEM. As a result, some factors can only achieve good results in specific areas, and these methods do not have good generalization performance.
The methods based on numerical simulation use outcomes (velocity, flushing distance, area of alluvial fan, etc.) modeled by computers to evaluate DFS. Such methods can be roughly divided into three categories in terms of solution methods. Eulerian methods include FLO-2D (Choi 2018), Flow-3D (Huang, Zhang, and Xiang 2022), Debris2D (Hsu and Liu 2019), MassMov2D (Iannacone et al. 2011), etc. Lagrange methods include DAN (Vagnon and Ferrero 2018), 3dDMM (Shen et al. 2020), etc. Smooth particle hydrodynamics includes DAN3D (Choi et al. 2019), PAS-TOR model (Pastor et al. 2018) and the limit equilibrium method (Rahman, Tabassum, and Islam 2021). Other methods including cell models (Gregoretti et al. 2019), cellular automata models (Tiranti et al. 2018), and topographic gradient methods (Gruber 2007) are also used. To obtain better simulation results, these methods always require specific detailed data. For example, using 3dDMM needs to predefine debris flow paths. Cellular automata model needs detailed topographic data acquired through fieldwork. Thus, it is not very convenient to use these methods to evaluate hundreds of valleys due to the difficulty in data collection.
In recent years, with improvements in GIS (Jamali, Naeeni, and Zarei 2020;Jamali, Tabatabaee, and Randhir 2021) and the development of ML techniques, more and more researchers have applied ML methods to DFS calculation (Li et al. 2021). Widely used methods include the back propagation (BP) neural network, support vector machine (SVM) and random forest (RF). Although these methods perform better than traditional statistical and numerical simulation methods owing to their superior non-linear fitting ability, the research ideas have not changed. Notably, the inputs of these models are still statistical factors such as aspect, curvature and the normalized difference vegetation index (NDVI). Thus, subjectivity in factor selection still exists.
Although the CNN is one of the most important ML models, few studies have applied it to DFS assessment. To the best of our knowledge, the work closest to ours is two papers (Qin et al. 2022). Both used a CNN model based on LeNet (LeCun et al. 1998) to conduct DFS calculation, but they share two significant problems. First, LeNet is a relatively old model, proposed in 1998, whereas CNN models have developed considerably in recent years and many state-of-the-art structures have been proposed. Second, the inputs of the CNN models in the two articles are still selected factors instead of the data itself. In these papers, vegetation cover, aspect, slope, etc. extracted from remote sensing (RS) or digital elevation model (DEM) data were used to form 2D matrices as the input of the CNN model. However, what makes a CNN superior is its ability to select features automatically from raw data. Thus, the 2D matrices constructed from manually selected features in these two papers may not fully exploit the capabilities of CNN models.
To solve the problems mentioned above, we propose a convolutional neural network that uses dense and residual structures as well as channel shuffle, named dense residual shuffle net (DRSNet), to conduct DFS assessment. This is the first paper that combines raw data (i.e. DEM data instead of the slope and aspect derived from it) with a CNN model to conduct DFS mapping on a regional scale. Besides, a CNN model can accept a lower input resolution than numerical simulation methods (Boreggio, Bernard, and Gregoretti 2022), which allows a trade-off between the time cost of data collection and the scope of DFS evaluation. This study examined 4 ML models (DRSNet, BP, SVM, RF) and 7 typical CNN models, and compared their performance by cross-validation. Additionally, mid-feature visualization was used to show the features extracted by the CNN model.
In this article, Nujiang Prefecture, a region typical of those hit hardest by debris flows, is selected as the study area. Relatively few regional-scale DFS studies have been conducted in this area. Moreover, its climate and geological conditions are similar to those of other high-incidence areas of debris flow, such as Nepal and Pakistan. Hence, the proposed method has the potential to be applied in these areas.
The specific objectives include: (1) to validate the effectiveness of DRSNet in debris flow valleys classification and (2) to conduct the DFS mapping in the study area. The hypothesis that the CNN model is feasible for DFS assessment is tested.
The structure of this paper is organized as follows. Section 2 describes the characteristics of the study area, including geology, climate, etc. Section 3 introduces the proposed network structure and the specific implementation steps. Experiment results are given in Section 4. Discussion is conducted in Section 5, and conclusions are drawn in Section 6.

Study area
Nujiang Lisu Autonomous Prefecture (Nujiang Prefecture for short) is located on the southwest border of China (at about 98°08'55''E to 99°38'38''E and 25°33'09''N to 28°23'51''N). Ravines and mountains crisscross the area, and the Dulong River, Nu River and Lancang River flow through it. The total area of Nujiang Prefecture is about 14,585 km². The location map of the study area is illustrated in Figure 1.
Nujiang Prefecture is one of the areas hit hardest by debris flows. The frequent occurrence of debris flows has much to do with its geographical location and topography. Nujiang Prefecture is located at the junction of two major plates, the Eurasian plate and the Indian Ocean plate, where neotectonic movement is active (Li et al. 2019). As a typical mountainous area, more than 98% of the landscape of Nujiang Prefecture is mountainous, and terrain with a slope greater than 25° accounts for 76.6% of the total area. The maximum height difference is about 4.4 km. In terms of climate, there are distinct wet and dry seasons. There are two rainy seasons in the study area, from February to April and from May to October. The rainy seasons account for more than 80% of the gross annual precipitation (Zhang et al. 2007). Rainfall can be as high as 600 mm in a single month, and most of it falls as heavy rain, which is the trigger of most debris flows.
Although the number of casualties has decreased in recent years with the improvement of the disaster prevention capability and awareness of government and residents, it is still difficult to avoid substantial economic losses, which hinders local economic development.

Materials and methods
The experiment process involves four parts: (1) constructing the debris flow dataset based on watersheds; (2) classifying the valleys by K-means according to their elevation difference, main groove length and catchment area; (3) constructing the DRSNet model; and (4) evaluating the performance of the model, generating the DFS map for the study area and analyzing the features extracted by the CNN model. The flowchart of the experiment process is shown in Figure 2.

3.1. Debris flows dataset construction
DEM, RS, lithology, soil and precipitation data are used in the experiment. The DEM data come from NASA with a resolution of 12.5 m. The RS data come from the Gaofen-1 satellite (GF-1), which has 4 bands with a resolution of 16 m. The lithology and soil data come from ISRIC (International Soil Reference and Information Center) with a resolution of 90 m. The daily precipitation data come from TPDC (National Tibetan Plateau Data Center). The data overview of the study area is presented in Figure 3.
Debris flows in mountainous areas always occur in the range of valleys. Valley unit, also called watershed unit, has been successfully applied in previous studies (Shi et al. 2016;Zêzere et al. 2017). The data extraction process is stated as follows. First, the watershed units were generated by Strahler stream order (Strahler 1957). The watershed units were used to extract DEM data of each valley. Then, the extracted DEM of the valley was used as the mask to extract other data. Finally, 1 layer of DEM data, 4 layers of RS data, 1 layer of lithology data, 1 layer of soil data and max daily precipitation of the valley were extracted. During the extraction of RS data, orthorectification is needed to eliminate the geometric distortion. Figure 4 shows the detailed process of data extraction.
These extracted data can be divided into two groups: data related to internal factors and data related to external factor. The internal factors refer to the inherent property of a valley, such as aspect, slope which can be derived from the DEM or NDVI that can be derived from RS data. The external factor refers to the activation condition of debris flows, mainly precipitation. It is under specific internal conditions and external condition that a debris flow may occur.
DEM, RS, lithology, soil type and precipitation data of 672 valleys in the study area were extracted. Among them, 164 valleys were selected as the dataset used for training and testing. Of these 164 valleys, 82 debris flow valleys were used as positive samples, and the other 82 non-debris flow valleys were used as negative samples. The selection process of positive samples is as follows. In the Yunnan Disaster Reduction Yearbook and related news reports, a total of 132 debris flow records were retrieved. Unfortunately, only 82 valleys could be located precisely. These 82 valleys affected by debris flows were selected as the positive samples. As for the 82 negative samples, since there were no specific records of where debris flows did not occur, we set the following criteria to pick negative samples manually: (1) there was farmland or a residential area nearby; (2) the nearest village to the valley had no record of debris flow. In areas without human activity, it is difficult to determine whether a debris flow has occurred, because disasters may have occurred without being observed. Therefore, selecting a valley with human activity and no record of debris flow as a negative sample is more reasonable.
On the one hand, studies find that although many major debris flows occurred in large valleys (Horton et al. 2019), small valleys (catchment area ≈1 km²) may also induce severe disasters (Bertolo and Wieczorek 2005). On the other hand, many previous studies (Yang et al. 2011;Banihabib, Tanhapour, and Roozbahani 2020) have pointed out that the shape factors of a valley (form factor, catchment area, main groove length, etc.) have a strong relationship with the DFS. Thus, to better distinguish the relationship between DFS and valleys with various shapes, we divided the positive and negative samples into three categories by K-means according to their elevation difference, main groove length and catchment area. The classification results are shown in Table 1.
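The grouping step described above can be sketched with a plain K-means over the three shape descriptors. This is only an illustration under stated assumptions: the valley values are hypothetical, the z-score standardization and the farthest-first seeding are choices made here for a deterministic example, and the paper does not state which K-means implementation or preprocessing was used.

```python
import math

def standardize(rows):
    """Z-score each column so that metres, kilometres and km^2 are comparable."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stds))
        for row in rows
    ]

def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from all seeds."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

# Hypothetical valleys: (elevation difference in m, main groove length in km,
# catchment area in km^2). Illustrative numbers only, not from the study.
valleys = [
    (300, 1.0, 0.8),
    (350, 1.2, 1.0),
    (1500, 6.0, 12.0),
    (1600, 6.5, 14.0),
    (2800, 14.0, 45.0),
    (2900, 15.0, 48.0),
]
labels = kmeans(standardize(valleys), k=3)
```

In this toy input the small, medium and large valleys end up in three separate groups. Applying the same procedure separately to positive and negative samples would give the six valley classes that the text refers to.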
[Figure caption] Data source used in the study. (f) is a precipitation schematic diagram of any day. The extent of each map is the study area (from 98°07'55"E to 99°38'38"E and from 25°33'09"N to 28°23'51"N).
After data classification came two problems. First, the data of different classes are unbalanced, which is not suitable for training the CNN model. To solve this problem, we adopted data enhancement methods, including rotating and flipping. The second problem is that the total amount of data is still too small for CNN models, which generally need thousands of images for training. Although data enhancement can alleviate this problem, the model also needs a dedicated design to address it. Techniques in model design are detailed in Section 3.3. Figure 5 shows the process of data classification and enhancement.
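The rotating and flipping mentioned above can be illustrated on a single raster layer stored as a list of rows. This is a minimal sketch; which rotation angles and flips the authors actually used is not stated, so generating all eight right-angle rotations and mirror images is an assumption made here.

```python
def rotate90(grid):
    """Rotate a 2D grid (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def flip_horizontal(grid):
    """Mirror a 2D grid left to right."""
    return [row[::-1] for row in grid]

def augment(grid):
    """Return the 8 rotated/flipped variants of a grid (original included)."""
    variants = []
    current = [list(row) for row in grid]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants

# Hypothetical 2x2 elevation patch, used only to show the transforms.
tile = [[1, 2],
        [3, 4]]
variants = augment(tile)
```

For a multi-layer sample (DEM, RS bands, lithology, soil), the same transform would have to be applied to every layer so that the layers stay spatially aligned.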

CNN model
This section gives a brief introduction to CNN models. A typical CNN model contains three parts: the convolution layer, the pooling layer and the fully connected layer.
The most important part is the convolution layer, whose function is to extract features from the given data. The output of this layer is a set of feature maps. The values of the feature maps are controlled by the convolutional kernels, which are optimized by back propagation. There are two main parameters related to the kernels: kernel size and stride. When extracting features from DEM and remote sensing data, the kernel size determines the spatial range of each feature extraction, and the stride determines how far the interconnections between neighboring ranges are considered. Thus, it is better to set the stride smaller than the kernel size so that the relationship between neighboring ranges is fully analyzed.
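The relationship between kernel size and stride can be made concrete with the standard output-size formula for a convolution. The sketch below is illustrative only; the helper names are hypothetical and the sizes are not taken from DRSNet.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Number of positions a kernel visits along one axis."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

def window_starts(input_size, kernel_size, stride):
    """Start index of every window along one axis (no padding)."""
    count = conv_output_size(input_size, kernel_size, stride)
    return [i * stride for i in range(count)]

def overlap(kernel_size, stride):
    """Cells shared by two neighboring windows along one axis."""
    return max(kernel_size - stride, 0)
```

With an 8-cell axis and a kernel of size 3, a stride of 1 yields six windows that each share two cells with their neighbor, whereas a stride of 3 yields two windows with no shared cells. Only in the first case does every window see part of its neighbor's range.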
In DFS assessment, the pooling layer simplifies the calculation and extracts more fine-grained features. In this paper, features refer to factors related to debris flows and non-debris flows. The usage of the pooling layer is detailed in Section 3.3.1. The fully connected layer acts as the classifier that gives the DFS. A typical convolutional layer and max pooling layer are shown in Figure 6.

DRSNet
DRSNet is a CNN model designed specifically for DFS assessment. We argue that a debris flow occurs when the valley's specific conditions (e.g. geometric structure, material conditions) are activated by an external trigger. Thus, the overall structure of DRSNet can be divided into two phases: inner danger calculation and susceptibility under activation (rainfall). The structure of DRSNet is given in Figure 7. Phase 1 of DRSNet calculates the inner danger of a valley. This phase first extracts shallow features from DEM, RS, lithology and soil data. Then, feature fusion is conducted by channel shuffle and fuse blocks. Finally, a vector of length 6 is taken as the inner danger features. The 6 numbers are also similarity scores, which indicate the similarity between a valley and 6 different types of valleys. The idea here is that the more similar a given valley is to valleys of classes 0, 1 and 2, in which debris flows have occurred, the higher the inner danger of the valley. DRBlock and GeoBlock are detailed in Section 3.3.1. Channel shuffle and fuse block are detailed in Section 3.3.2.
Phase 2 of DRSNet calculates the susceptibility of a valley. The insight here is that DFS is the result of inner danger factors being activated by external triggers. Thus, we combine the inner danger score of Phase 1 and the max daily precipitation of the valley to give the DFS of a valley.
3.3.1. DRBlock and GeoBlock for feature extraction
DRBlock (Dense-Residual Block) and GeoBlock are designed to extract shallow features from raw data. The raw data can be divided into two groups: DEM/RS data, which have more abundant information, and lithology/soil data, which have relatively simple information.
Since abundant information (aspect, slope, elevation difference, NDVI, etc.) can be derived from DEM/RS data, a deeper structure is needed to make full use of these data. However, deeper structures can easily lead to overfitting when the amount of data is insufficient. Thus, the dense connection from DenseNet (Huang et al. 2017) and the residual structure from ResNet (He et al. 2016), both of which perform well on small sample problems, are introduced into the construction of DRBlock.
Lithology and soil data are relatively simple, so we constructed a relatively simple structure, namely GeoBlock, to get rock and soil information. The structure of DRBlock and GeoBlock is shown in Figure 8.
The residual connection used in both the dense connection and the residual structure was modified by max pooling in this paper. A standard residual structure can be described as follows. Let $F_i(\cdot)$ denote the convolution operation of the i-th layer and $o_i$ denote the output of the i-th layer; then the output of the i-th layer for a standard residual structure is:
$$o_i = F_i(o_{i-1}) + o_{i-1}$$
However, we adjusted the original residual structure in DRBlock. Let $P(\cdot)$ denote max pooling; the output of the modified residual structure in this paper is:
$$o_i = F_i(o_{i-1}) + P(o_{i-1})$$
Compared to the original structure, which adds $F_i(o_{i-1})$ and $o_{i-1}$ directly, the modified structure adds $F_i(o_{i-1})$ and the max pooling result of the previous layer. As the residual structure is achieved by addition, the feature maps involved in the addition must have the same size (same height and same width). However, convolution makes the feature map smaller. As a result, padding is used in the original residual structure to make $F_i(o_{i-1})$ and $o_{i-1}$ the same size. The drawback of padding is that it may perturb the extracted features. Feature maps, as the output of the previous convolution layers, already contain a lot of semantic information. When meaningless numbers (i.e. 0) are used to enlarge the feature maps, this information is polluted to some extent. The polluted features are harmful to the subsequent feature extraction and fusion. By contrast, this paper uses max pooling instead of padding to make the sizes match. In this operation, no meaningless numbers are introduced. Therefore, max pooling can not only reduce the model size to accelerate computation but also balance the feature information and reduce the feature size. Figure 9 shows the diagram of the original and modified residual structure.
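The size-matching idea behind the modified residual structure can be sketched on a single-channel feature map. This is a simplified illustration, not the authors' implementation: it assumes a square kernel, stride 1, a pooling window equal to the kernel size, and it omits activation functions and normalization, none of which are specified in the text.

```python
def conv2d_valid(feature, kernel):
    """Unpadded convolution (cross-correlation) with stride 1."""
    k = len(kernel)
    out_h = len(feature) - k + 1
    out_w = len(feature[0]) - k + 1
    return [
        [
            sum(feature[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool(feature, size):
    """Unpadded max pooling with stride 1 (assumed pooling settings)."""
    out_h = len(feature) - size + 1
    out_w = len(feature[0]) - size + 1
    return [
        [
            max(feature[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def modified_residual(feature, kernel):
    """Add the convolution output to the max-pooled input.

    Both branches shrink the map by the same amount, so they can be
    added without zero padding.
    """
    conv = conv2d_valid(feature, kernel)
    shortcut = max_pool(feature, len(kernel))
    return [
        [a + b for a, b in zip(conv_row, short_row)]
        for conv_row, short_row in zip(conv, shortcut)
    ]

# Hypothetical 4x4 feature map and a 3x3 kernel that picks the window centre.
feature = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernel = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
output = modified_residual(feature, kernel)
```

Here both branches produce a 2x2 map, so the addition is well defined and every value entering the sum comes from the original feature map rather than from padding.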
3.3.2. Channel shuffle and fuse block for feature fusion
Shallow features are insufficient to evaluate the DFS of valleys, as the susceptibility depends on multifactor interactions. As a result, a feature fusion procedure is put forward to further characterize the interaction between the extracted features. This procedure has two main substructures: channel shuffle and fuse block. The diagram of the channel shuffle and the structure of the fuse block are shown in Figure 10.
Channel shuffle was first proposed in ShuffleNet. This operation helps information flow across different channels. It can enhance the correlation between different shallow features, which is useful for the inner danger calculation of DFS. Specifically, the shallow features of DEM, RS, lithology and soil are each split evenly into 4 groups, and these groups are recombined to form 4 new feature groups, as shown in Figure 10.
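The regrouping described above can be sketched as the usual channel-shuffle permutation: view the channels as a groups-by-channels table, transpose it, and flatten it again. The channel labels and the count of four channels per source are hypothetical and serve only to make the permutation visible.

```python
def channel_shuffle(channels, groups):
    """Interleave channels so that each new group mixes all old groups.

    `channels` is a flat list ordered group by group; the result takes the
    first channel of every group, then the second of every group, and so on.
    """
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by the group count")
    per_group = len(channels) // groups
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]

# Hypothetical shallow-feature channels, four per data source.
channels = [
    f"{source}{i}"
    for source in ("DEM", "RS", "LITH", "SOIL")
    for i in range(4)
]
shuffled = channel_shuffle(channels, groups=4)
new_groups = [shuffled[i:i + 4] for i in range(0, len(shuffled), 4)]
```

After the shuffle, each of the four new groups contains one channel from DEM, RS, lithology and soil, which is what allows the following fuse block to relate features from different sources.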
After channel shuffling, the modified residual structure is used again to further fuse the features. Since the input of every fuse block is the result of channel shuffling, which is a combination of different features, it is necessary to use the residual structure to make full use of both shallow and high-level features.

Calculation of inner danger and DFS
DRSNet is a two-phase model and thus has two main outputs: first, the inner danger score of the valley, which can also be viewed as the potential DFS; second, the DFS of the valley. The output of phase 1 is the similarity score between a given valley and 6 different kinds of valleys. Let $x^{(1)}_i$ denote the similarity score between a given valley and the i-th class of valley, and let $S^{(1)}$ denote the potential DFS. To convert the similarity scores to the potential DFS (i.e. a probability), softmax is used:
$$\mathrm{softmax}\left(x^{(1)}\right)_i = \frac{\exp\left(x^{(1)}_i\right)}{\sum_{j} \exp\left(x^{(1)}_j\right)}$$
According to the value of $S^{(1)}$, the potential DFS of the valley is divided into five categories, as given in (5). Let $S^{(2)}$ denote the output of phase 2, which is the similarity score between the input valley and the debris-flow valley. The DFS is decided by the value of $S^{(2)}$. The value ranges of $S^{(2)}$ and the corresponding levels of DFS are the same as those of $S^{(1)}$ in (5).
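The conversion from six similarity scores to a probability can be sketched as follows. The softmax itself is standard; how the six class probabilities are combined into a single potential DFS value is not recoverable from the text, so summing the probabilities of the debris-flow classes (assumed here to be classes 0, 1 and 2) is an assumption of this sketch, and the input scores are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def potential_dfs(scores, debris_classes=(0, 1, 2)):
    """Assumed aggregation: total probability of the debris-flow classes."""
    probabilities = softmax(scores)
    return sum(probabilities[i] for i in debris_classes)

# Hypothetical phase-1 outputs for two valleys.
neutral = potential_dfs([0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
dangerous = potential_dfs([5.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

With equal scores the valley is equally similar to all six classes, so the assumed potential DFS is 0.5. A score that strongly favors a debris-flow class pushes the value toward 1.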

Model performance evaluation
This part introduces the metrics for evaluating the CNN and ML models. Some notation needs to be explained first: TP (true positive) is the number of correct positive predictions; FN (false negative) is the number of incorrect negative predictions; FP (false positive) is the number of incorrect positive predictions; and TN (true negative) is the number of correct negative predictions.

Evaluating CNN models
Overall accuracy (OA), precision, recall rate, F1-score and Kappa coefficient were used to compare model performance.
1. OA is the ratio of correctly predicted samples to all test samples: $\mathrm{OA} = (TP + TN) / (TP + TN + FP + FN)$.
2. Precision is the ratio of correct positive predictions to all samples that are predicted as positive: $\mathrm{Precision} = TP / (TP + FP)$. For DFS assessment, a higher precision means the model is more capable of distinguishing debris flow valleys from non-debris flow valleys.
3. Recall rate is the ratio of correct positive predictions to all positive samples: $\mathrm{Recall} = TP / (TP + FN)$. If a model has a high recall rate, the model captures specific features related to debris flow valleys.
4. F1-score is the comprehensive result of precision and recall:
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (9)$$
5. Kappa coefficient is used for the estimation of inter-observer agreement. The kappa coefficient is at most 1, and a value of 0 corresponds to agreement no better than chance. This coefficient is relatively conservative because it eliminates the random factors in prediction. Thus, a higher kappa coefficient gives us more confidence in the assessment result of DFS.
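The five metrics can be computed directly from the four confusion-matrix counts defined earlier. The sketch below uses the standard definitions, including Cohen's kappa with its chance-agreement term; the counts in the example are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fn, fp, tn):
    """Overall accuracy, precision, recall, F1 and Cohen's kappa.

    Assumes every denominator is non-zero (at least one positive
    prediction, one positive sample, and non-degenerate marginals).
    """
    total = tp + fn + fp + tn
    oa = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Agreement expected by chance from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (oa - expected) / (1 - expected)
    return {
        "oa": oa,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }

# Hypothetical counts for 100 test valleys.
metrics = classification_metrics(tp=40, fn=10, fp=20, tn=30)
```

In this example the accuracy is 0.7 but kappa is only 0.4, because half of the observed agreement would be expected by chance. This is the sense in which kappa is the more conservative measure.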

Evaluating ML models
The receiver operating characteristic (ROC) curve expresses the trade-off between the TP rate and the FP rate as the discrimination threshold varies. The area under the curve (AUC) is used to evaluate the predictive ability of a model. For a model that performs no worse than random guessing, the AUC ranges from 0.5 to 1.0, and a higher AUC indicates better predictive ability.
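The AUC can be computed without drawing the curve, because the area under the ROC curve equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses this pairwise form; the labels and scores are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    `labels` holds 1 for debris flow valleys and 0 for the others;
    `scores` holds the predicted susceptibility of each valley.
    """
    positives = [s for lab, s in zip(labels, scores) if lab == 1]
    negatives = [s for lab, s in zip(labels, scores) if lab == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical predictions for four valleys.
perfect = auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
mixed = auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

The pairwise form is quadratic in the number of samples, which is acceptable for a dataset of a few hundred valleys.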

Performance metrics evaluation
The parameters and environment settings are listed in Table 2.
To make a fair comparison, the number of layers and hidden units of the BP net were determined on the basis of Hecht-Nielsen (1987). The parameters of SVM and RF were determined by the grid search method. The parameters of BP, SVM and RF are listed in Table 3. The factors used for training and testing the BP net, SVM and RF were all derived from DEM, RS, lithology and soil data. A total of 14 factors were used in the experiment: (1) watershed area; (2) longitudinal slope; (3) difference of elevation; (4) main groove length; (5) curvature; (6) percentage of area with slope greater than 25°; (7) stream power index (SPI); (8) sediment transport index (STI); (9) topographic wetness index (TWI); (10) watershed cutting density; (11) NDVI; (12) lithology; (13) soil type; (14) max daily precipitation.
In the first experiment, we compare the performance of DRSNet with that of 7 other common CNN models and 3 ML models. The results were obtained by 10-fold cross-validation. In each fold, 90% of the data were randomly selected for training and the remaining 10% were used for testing. The results are shown in Table 4.
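The 10-fold split can be sketched as a generator of training and testing indices. This is a plain K-fold under assumptions made here (a fixed random seed, no stratification by class); the paper does not state whether the split was stratified or how augmented copies of a valley were handled across folds.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [
            idx
            for j, fold in enumerate(folds)
            if j != i
            for idx in fold
        ]
        yield train, test

# 164 labelled valleys, as in the dataset described in the text.
splits = list(k_fold_indices(164, k=10))
```

Each valley appears in exactly one test fold, so every sample is used for testing once and for training nine times. If augmented copies are generated, keeping all copies of a valley in the same fold avoids leaking test samples into training.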
Notably, DRSNet has the best results by all metrics. The precision of the proposed model is at least 7.4% higher than that of the other CNN models and 15.2% higher than that of any ML method, which indicates that DRSNet is more capable of distinguishing the features of debris flows from those of non-debris flows. An F1-score of up to 0.811 indicates the robustness of DRSNet. Comparing the CNN models with the other ML methods, almost all CNN models perform better.
To further demonstrate the predictive ability of DRSNet, the ROC curves of DRSNet, BP net, SVM and RF, as well as their AUC values, are given in Figure 11. The AUC of DRSNet is clearly the highest, which indicates that it has the best predictive ability.

Potential DFS and DFS mapping
Since DRSNet has a remarkable ability to recognize the features related to debris flows, it is suitable to use this model to conduct DFS mapping. Two maps, the potential DFS map and the DFS map are drawn according to the output of phase 1 and phase 2 of the model. Maps are shown in Figure 12. The number of valleys of different danger levels is given in Table 5.
In phase 1, 104 of 132 historical debris flow valleys were evaluated as high or very high susceptibility. After adding the rainfall factor, 118 of 132 historical debris flow valleys were judged as high or very high susceptibility. This indicates that, on the one hand, the internal factors can evaluate the DFS to some extent. On the other hand, the precipitation factor can further improve the predictive ability of the model. It is worth noting that there are still 14 historical valleys that were evaluated as mid/low/very low susceptibility. We suspect that there are two possible reasons. First, the precipitation factor is concatenated with the other inner danger features in phase 2. These inner features are expressed in the form of a vector and are already the result of interactions between factors. Thus, the interaction between precipitation and the other factors may not be fully captured. The second reason is that the amount of data is still limited despite the data enhancement and careful structure design. Some valleys do have debris flows, but the number of valleys similar to them is too small for the model to capture the features related to this small portion of valleys. Nevertheless, compared with a previous study of the same area (Yimin, Lei, and Suhang 2019), although the evaluation results for the Nu River basin and Lancang River basin are similar, the susceptibility map generated by DRSNet is much better. In the study of Yimin, Lei, and Suhang (2019), most of the area to the east, where numerous debris flows have occurred, was judged as low/very low. By contrast, our model gave a more reasonable evaluation of DFS in this part of the area.

Model performance
Clearly, the proposed DRSNet achieves state-of-the-art performance. Hence, it is necessary to explore why the model performs so well.
Comparing CNN models with other ML methods, it is not surprising that the performance of CNN models is generally higher. This is due to the differences in input data. The inputs of BP net, SVM and RF are still statistical values, and these input factors are mutually independent. However, the input of CNN models is comprehensive data. Taking DEM as an example, the input of CNN models is the DEM data of the valley, but the inputs of other ML methods are the slope, aspect and elevation difference derived from the DEM. Thus, data that should be treated as a whole are fragmented, which hampers the subsequent fitting of inter-factor relationships.
Comparing DRSNet with other CNN models, there is no doubt that DRSNet performs best. A significant difference lies in how the different data are organized. Among all the CNN models in this paper, only DRSNet first sends different data to different branches and then fuses the features of the different data. The other CNN models pack all the data together and feed them to the model indiscriminately. Owing to the differences between the raw data, we presume that the features may influence and even contaminate each other at an early stage. We assume that when a CNN is used to deal with heterogeneous data, it is necessary to process the various types of data separately and then fuse them.
It is worth mentioning that although deeper models, including ResNet18, ResNet34, ResNext and DenseNet have more residual structures, which are also intensively used in DRSNet, they fail to achieve excellent results. We infer that the reason is partly because of the unique structure design for different data and partly because of the difference in the number of parameters. Other residual structure-based models have more parameters to be trained due to their deeper structure, while DRSNet has a relatively small number of parameters. It is generally accepted that the deeper a CNN model is, the better it performs (Mahmood et al. 2017). However, we argue that this view is based on the premise that the training data are sufficient. For conventional tasks, there are massive data of tens of thousands or even millions. Thus, deeper structures may lead to better results. However, for special tasks like DFS assessment, there are not enough data to train so many parameters. A deeper structure with more parameters leads to a higher risk of overfitting, which means the loss of generalization ability. We suggest that the size of the model should match the amount of data to achieve better results. The number of parameters of residual-based models is shown in Table 6.
Another point worth noting is that the kappa of all models is lower than 0.8. Even the best model, DRSNet, only reached a kappa of 0.749. We speculate that the reason is that the model prefers to predict input samples as positive. This tendency can be seen from the precision and recall rates in Table 4, as well as the DFS mapping in Figure 12. The tendency may result from the low resolution of the raw data, which slightly weakened the model's ability to distinguish debris flow valleys from non-debris flow valleys. Thus, in cross-validation, these wrongly classified negative samples resulted in a low kappa. However, because the manual selection of the negative samples may not be very precise, a kappa of 0.749 is still acceptable for conducting DFS mapping.

Case study
In this section, a severe debris flow event in Dongyuege Valley is analyzed in detail, and mid-feature visualization is used to demonstrate the feature extraction ability of DRSNet. The disaster caused 39 deaths and left 53 people missing. The direct economic loss reached 0.14 billion yuan. Figure 13 shows the valley, the alluvial fan and the scene after the disaster.
Dongyuege Valley (98°43′57″E, 27°38′14″N) is located at Puladi Village. The main groove of the valley is 14.7 km long. The watershed area is about 45 km², while the average width is only 3.3 km. The elevation difference is up to 2854 m. About 80% of the valley area has a slope of over 25°, most of which is located in the upper half of the valley. The longitudinal slope of the valley bed is 212‰. The valley is long and narrow, and its upstream part is fan-shaped. The valley profiles on both sides are V-shaped. All these conditions are conducive to the convergence and acceleration of water flow (Simoni et al. 2020).
Figure 14 shows some crucial features captured by DRSNet. Features calculated in ArcGIS are also given for comparison. All features have good interpretability. (a2) and (a3) capture the different aspects that control the confluence direction of water flow. The red part in (a2) corresponds to the north aspect, and the red part in (a3) is mainly concentrated on the south aspect. (a2) and (a3) not only have an apparent complementary relationship but also maintain a high degree of consistency with (a1), which is generated from DEM data in ArcGIS. (b2) shows the curvature features captured by DRSNet, which are highly coherent with (b1), also generated directly from the DEM. (c1) and (c2) are the SPI and TWI, respectively, which are calculated as follows:
$$\mathrm{SPI} = A \tan S$$
$$\mathrm{TWI} = \ln\left(\frac{A}{\tan S}\right)$$
where A denotes the area of the upstream watershed and S denotes the slope. The SPI measures the erosive force of running water, while the TWI reflects the cumulative effect of local topography on runoff flow direction. Since the SPI and TWI are calculated in similar ways, their images are similar too. Nonetheless, the similarity between (c3) and (c1), (c2) remains striking, which indicates that the model captures related features. (d1) is the NDVI generated from RS data. Feature map (d2) is almost identical to (d1).
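The two terrain indices can be sketched in a few lines. The sketch assumes the commonly used definitions SPI = A · tan S and TWI = ln(A / tan S), with A the upstream area and S the slope; the function names and the `min_slope_deg` guard for flat cells are illustrative choices and are not part of the ArcGIS workflow used in the paper.

```python
import math


def spi(area, slope_deg, min_slope_deg=0.01):
    """Stream power index, assuming SPI = A * tan(S)."""
    s = math.radians(max(slope_deg, min_slope_deg))
    return area * math.tan(s)


def twi(area, slope_deg, min_slope_deg=0.01):
    """Topographic wetness index, assuming TWI = ln(A / tan(S)).

    The slope is clamped to min_slope_deg (a hypothetical guard) so that
    perfectly flat cells do not cause a division by zero.
    """
    s = math.radians(max(slope_deg, min_slope_deg))
    return math.log(area / math.tan(s))


# Illustrative cells: (upstream area, slope in degrees); values are made up.
for area, slope in [(100.0, 10.0), (100.0, 45.0), (10000.0, 45.0)]:
    print(area, slope, round(spi(area, slope), 3), round(twi(area, slope), 3))
```

Under these definitions ln(SPI) + TWI = 2 ln A, so both indices rise with the upstream area, which is one way to see why their maps share much of the same drainage pattern.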
From the above discussion and figures, it is notable that the CNN model does extract many deep features that are closely related to debris flows. That means the model 'learned' the features by itself. For example, the input data are just the raw DEM, which only contains the elevation of each point, yet the aspect is calculated and captured by DRSNet. Furthermore, features such as curvature, which cannot be derived from DEM data in a single step because curvature is calculated from the slope, are also captured. Thus, the CNN model must have learned the slope features first and captured the curvature afterwards. Besides, some interacting features like NDVI are also captured. The conventional calculation method of NDVI is as follows:
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}$$
where NIR (near infrared) and Red are Band 4 and Band 3 of the RS data, respectively. This feature is also captured by DRSNet, as can be seen in Figure 14(d2). In terms of material source conditions, Figure 13(c) shows that glaciers are distributed at higher altitudes. DRSNet also captured the extent of glacier distribution in Figure 13(a). The moraines formed by glacial erosion can become the material source under high temperatures or rain washouts. Other material sources are mainly broken rocks, including quartzite, granite and marble. The mingled hard and soft rocks are severely weathered due to the subtropical mountainous humid monsoon climate. These features are reflected in the RS and lithology data, which DRSNet can also capture.
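A minimal sketch may help show why such features are within reach of a convolutional model. Slope and a simple curvature proxy are both local stencil operations on the elevation grid, of the same kind as a 3 × 3 convolution kernel, and NDVI is a per-pixel combination of two bands. The central-difference slope and the Laplacian used here are simplifications chosen for illustration; they are not the ArcGIS formulas, and the toy grids are hypothetical.

```python
import math


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); returns 0.0 when both bands are 0."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def slope_and_curvature(dem, i, j, cell=1.0):
    """Slope (degrees) and a Laplacian curvature proxy at interior cell (i, j).

    The slope uses first-order central differences; the Laplacian is a
    second-order stencil, i.e. a derivative of the slope field.
    """
    dz_dx = (dem[i][j + 1] - dem[i][j - 1]) / (2 * cell)
    dz_dy = (dem[i + 1][j] - dem[i - 1][j]) / (2 * cell)
    slope_deg = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    laplacian = (dem[i][j + 1] + dem[i][j - 1] + dem[i + 1][j]
                 + dem[i - 1][j] - 4 * dem[i][j]) / cell ** 2
    return slope_deg, laplacian


# Toy elevation grids (hypothetical): a tilted plane and a bowl.
plane = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
bowl = [[2, 1, 2], [1, 0, 1], [2, 1, 2]]
print(slope_and_curvature(plane, 1, 1))  # steep but not curved
print(slope_and_curvature(bowl, 1, 1))   # locally flat but concave
print(ndvi(0.5, 0.1))
```

The tilted plane has a nonzero slope and zero curvature, while the bowl has zero slope at its centre and a positive Laplacian, which illustrates that curvature carries information beyond the slope itself.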
Combining the above analysis, it can be seen that Dongyuege Valley has excellent potential conditions for debris flow formation. These conditions include a geometric structure conducive to water convergence and abundant material sources. According to historical records, the average temperature before the disaster was 28.6°C, exceeding the historical highest temperature of 27.7°C. The high temperature caused some glaciers to melt, destabilizing some moraines. In addition, the maximum rainfall reached 11 mm/h before the disaster, which was much higher than the historical average. The high-intensity rainfall acted as the trigger factor and exceeded the shear strength limit of the soil. Glacial meltwater and precipitation, carrying moraines and broken rocks, formed a powerful debris flow. Under the favorable topographic conditions, the debris flow continued to accelerate and eventually rushed out of the valley, causing a great disaster. The potential DFS and the DFS of Dongyuege Valley are 1.0 and 0.989, respectively, both of which are very high. The assessment result is shown in Table 7.

Conclusion
In this paper, DRSNet, a CNN model for the DFS assessment in the valley unit, is proposed. This is the first study that directly combines the CNN model with raw data on DFS assessment. We carried out comparative tests on other CNN models and ML methods to demonstrate the superiority of DRSNet. The result of DFS mapping of the study area is highly consistent with historical disaster records. The following are the main conclusions of the study.
… complicated geological factors such as curvature and NDVI from given data, which provides a new angle for investigating the debris flow-causing factors.
3. When dealing with small sample problems using CNN, the model size should be adapted to the amount of data. Otherwise, the model will likely overfit. When the raw data are heterogeneous, separate processing of different data benefits feature extraction. In addition, the effectiveness of the residual structure was once again confirmed. This structure improves the data reusability, which is suitable for small sample problems.
In summary, the CNN model is well suited for DFS assessment in large areas. We expect that the CNN model can become a new approach to DFS assessment and that its results can serve as a foundation for debris flow prevention and control. In future studies, incorporating the precipitation factor into the CNN model more naturally may lead to better predictive ability. Finding better ways to interpret the mid-feature maps is also a research focus.

Disclosure statement
No potential conflict of interest was reported by the author(s).