Determination of Important Topographic Factors for Landslide Mapping Analysis Using MLP Network

Landslides are among the natural disasters that occur in Malaysia. Topographic factors such as elevation, slope angle, slope aspect, general curvature, plan curvature, and profile curvature are considered the main causes of landslides. To determine the dominant topographic factors in landslide mapping analysis, a study was conducted, and it is presented in this paper. The study involves three main stages. The first stage is the extraction of additional topographic factors: previous landslide studies mainly used six topographic factors, and seven new factors are proposed in this study, namely, longitude curvature, tangential curvature, cross section curvature, surface area, diagonal line length, surface roughness, and rugosity. The second stage is the specification of the weight of each factor using two methods: multilayer perceptron (MLP) network classification accuracy and Zhou's algorithm. In the third stage, the factors with higher weights were used to improve the MLP performance. Out of the thirteen factors, eight were considered important: surface area, longitude curvature, diagonal length, slope angle, elevation, slope aspect, rugosity, and profile curvature. The classification accuracy of the MLP neural network increased by about 3 percentage points (from 82.00% to 85.3%) after the five less important factors were eliminated.


Introduction
Landslides are among the most destructive natural disasters, causing loss of life and billions of dollars' worth of damage worldwide every year [1]. Landslides are also a frequent problem throughout most of Malaysia following heavy rainfall. The total economic loss due to landslides in Malaysia reported from 1973 until 2007 is estimated at about one billion US dollars [2]. A considerable amount of research has been conducted over the past years to identify the most important factors that cause slope instability [3]. However, many different factors, such as geological, topographical, and human causes (disregard for sustainable development), contribute to landslide occurrence [4,5].
A literature review of landslide-causing factors shows that topographic factors are strongly linked with landslide occurrence [6][7][8][9][10][11][12][13][14]. Slope angle, slope aspect, plan curvature, profile curvature, and general curvature are the conventional topographic factors, which are extracted from a digital elevation model (DEM) [15]. DEMs have recently found widespread application in geographic information systems [16] and in landslide hazard mapping, and some studies have incorporated the DEM into landslide hazard mapping [3,6,7,17]. Neural networks have gained popularity because of their simplicity, generality, and ease of application. They have shown good performance in landslide prediction and in determining the weights of landslide causative factors [18,19]. One of the most popular neural networks is the multilayer perceptron (MLP) network. Many training and learning algorithms have been proposed to improve the performance of the MLP; the most popular one is the back-propagation algorithm. In 1999, Zhou introduced an algorithm to determine the weight of each input factor through neural network training.
2 The Scientific World Journal
The study in this paper makes several contributions. First, a very high resolution digital elevation model of 5 meters/pixel is used, whereas previous studies used resolutions of 10 to 20 meters/pixel. Second, this study extracts new topographic factors, which had not previously been done on Penang Island or elsewhere in Malaysia. These seven new factors are cross curvature, tangent curvature, longitude curvature, surface area, surface roughness, rugosity, and diagonal length. Third, the importance of the factors is determined using two methods, namely, the MLP network layer weights (Zhou method) and the output classification accuracy, and the dominant factors, which have a higher influence on landslides, are identified from both.
The dominant factors are then used in the landslide hazard analysis to improve its accuracy. Figure 1 shows the work methodology for this study.

Study Area
Penang consists of Penang Island and a coastal strip on the mainland known as Province Wellesley. Figure 2 shows the study area of Penang Island and the landslide location map with a hill-shaded map [20]. The island lies between 5°15′ and 5°30′ N latitude and between 100°10′ and 100°20′ E longitude, and the North Channel separates it from the mainland. Penang Island occupies an area of 285 km². Penang is one of the 13 states of Malaysia and is located in the northwest of the Malaysian Peninsula. Topographic elevations vary between 0 m and 820 m above sea level. The geological data of the study area show that Ferringhi granite, Batu Maung granite, clay, and sand granite represent more than 72% of the study area's geology. Rainfall plays a major role in triggering landslides in the study area, with annual amounts varying approximately between 2254 mm and 2903 mm. The slope angle ranges from 0° to 87°, and 43.28% of Penang Island is flat. This research work focuses only on the island, where frequent landslides have threatened lives and damaged property. Landslides on Penang Island have been analyzed with different methods, such as statistical, fuzzy, and neural network methods [21]. These previous studies used geological and topographic factors, together with other factors, to produce landslide hazard maps. In this research work, only the topographic factors are studied.

MLP with Back-Propagation Algorithm
The multilayer perceptron (MLP) is one of the most widely used tools for solving classification and prediction problems because of its computational simplicity, finite parameterization, stability, and smaller structure size for a particular problem compared with other neural network structures [22]. An MLP consists of a set of layers, namely, an input layer, one or more hidden layers, and an output layer (Figure 3).
Each layer in the MLP consists of independent processing units called neurons, which are linked to neurons in other layers through weights. The network determines the relationship between pairs of input (factor) and output (response) vectors by adjusting the weight and bias values. Because adjusting the weights between neurons without a learning algorithm is difficult, the back-propagation learning algorithm with momentum was used in this study to reduce the error between the actual output and the neural network output. The algorithm was also used to build up the weights of the input factors [23]. Each input is multiplied by its corresponding weight (starting from initial values), the products are summed, and the sum is passed through an activation function to produce the neuron's output. For one hidden layer of the MLP network, as shown in Figure 3, the input and output of the $j$-th neuron in the hidden layer are given by (1) and (2), respectively:
$$\mathrm{net}_j = \sum_{i=1}^{n} w_{ji}\, x_i, \qquad (1)$$
where $\mathrm{net}_j$ is the input of hidden neuron $j$, $i$ and $j$ are indices of neurons in the network, $n$ is the size of the input vector, $w_{ji}$ is the weight connecting input $i$ to neuron $j$, and $x_i$ is the input element. Each hidden neuron uses $\mathrm{net}_j$ as the argument of an activation function $f$ and produces the output
$$o_j = f(\mathrm{net}_j). \qquad (2)$$
The function $f$ is usually a nonlinear sigmoid function, applied to the weighted sum of inputs before the signal propagates to the next layer. One advantage of the logistic sigmoid, $f(\mathrm{net}) = 1/(1 + e^{-\mathrm{net}})$, is that its derivative can be expressed in terms of the function itself:
$$f'(\mathrm{net}_j) = f(\mathrm{net}_j)\,\bigl(1 - f(\mathrm{net}_j)\bigr). \qquad (3)$$
The error for a training input pattern is defined as the difference between the target output value $d_k$ and the network output $o_k$:
$$e_k = d_k - o_k. \qquad (4)$$
The sum of squared errors is then
$$E = \frac{1}{2}\sum_{k=1}^{m} e_k^{2}, \qquad (5)$$
where $m$ is the number of neurons in the output layer. The error is propagated back through the neural network and minimized by adjusting the weights between layers:
$$\Delta w_{ji}(t+1) = -\eta\,\frac{\partial E}{\partial w_{ji}} + \alpha\,\Delta w_{ji}(t), \qquad (6)$$
where $\Delta w_{ji}(t+1)$ and $\Delta w_{ji}(t)$ are the weight changes in epochs $t+1$ and $t$, respectively, $\eta$ is the learning rate, $\partial E/\partial w_{ji}$ is the rate of change of the error with respect to the weight, and $\alpha$ is the momentum coefficient. This process of forward signal propagation and backward error propagation is repeated iteratively until the network error is minimized or reaches an acceptable value, which is 0.1 for this study.
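A minimal NumPy sketch of (1) to (6) follows. It assumes a logistic sigmoid in both layers, full-batch updates, uniform random initial weights, and bias terms; the class name `MomentumMLP` and its methods are hypothetical and are not the study's Matlab implementation. The default learning rate, momentum, MSE goal, and epoch cap mirror the values reported later in this paper (0.01, 0.9, 0.1, and 1000).

```python
import numpy as np


def sigmoid(net):
    """Logistic sigmoid f(net) = 1 / (1 + exp(-net))."""
    return 1.0 / (1.0 + np.exp(-net))


def sigmoid_grad(out):
    """Equation (3): f'(net) = f(net) * (1 - f(net)), given out = f(net)."""
    return out * (1.0 - out)


class MomentumMLP:
    """One-hidden-layer MLP trained by full-batch back-propagation with momentum (sketch)."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.01, momentum=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.uniform(-0.5, 0.5, (n_hidden, n_in))   # w_ji: input -> hidden
        self.W2 = rng.uniform(-0.5, 0.5, (n_out, n_hidden))  # w_kj: hidden -> output
        self.b1, self.b2 = np.zeros(n_hidden), np.zeros(n_out)
        self.lr, self.momentum = lr, momentum
        # Previous weight changes Δw(t), needed by the momentum term of (6).
        self.prev = [np.zeros_like(a) for a in (self.W1, self.b1, self.W2, self.b2)]

    def forward(self, X):
        hidden = sigmoid(X @ self.W1.T + self.b1)        # (1) and (2)
        output = sigmoid(hidden @ self.W2.T + self.b2)   # same rule, next layer
        return hidden, output

    def train_epoch(self, X, D):
        """One full-batch update; returns E of (5) measured before the update."""
        hidden, output = self.forward(X)
        error = D - output                                # (4)
        sse = 0.5 * np.sum(error ** 2)                    # (5)
        delta_out = -error * sigmoid_grad(output)         # dE/dnet_k
        delta_hid = (delta_out @ self.W2) * sigmoid_grad(hidden)  # dE/dnet_j
        grads = [delta_hid.T @ X, delta_hid.sum(axis=0),
                 delta_out.T @ hidden, delta_out.sum(axis=0)]
        params = [self.W1, self.b1, self.W2, self.b2]
        for n, (param, grad) in enumerate(zip(params, grads)):
            step = -self.lr * grad + self.momentum * self.prev[n]   # (6)
            param += step                                # in-place update
            self.prev[n] = step
        return sse

    def fit(self, X, D, goal_mse=0.1, max_epochs=1000):
        """Train until the MSE goal is met or max_epochs is reached."""
        epoch, mse = 0, float("inf")
        for epoch in range(1, max_epochs + 1):
            mse = 2.0 * self.train_epoch(X, D) / D.size   # mean of e_k^2
            if mse <= goal_mse:
                break
        return epoch, mse
```

For the 13 × 29 × 2 network used later, one would construct `MomentumMLP(13, 29, 2)` and call `fit` with the normalized factor values as `X` and one-hot landslide/no-landslide targets as `D`.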

Methodology
In this research, twelve topographic factors relevant to landslide analysis were extracted from the DEM using Matlab software; together with elevation itself, they form the thirteen factors analysed in this study. The importance of the factors was rated using two different methods, namely, the MLP network layer weights (Zhou method) and the output classification accuracy. Finally, the important factors selected by the two methods were analysed and compared. The DEM of the study area represents the elevation of Penang Island (Figure 5(a)). The DEM was used to extract the maps of the twelve topographic factors: slope angle, slope aspect, general curvature, plan curvature, profile curvature, cross curvature, tangent curvature, longitude curvature, surface area, surface roughness, rugosity, and diagonal length. Landslide locations in the study area were collected from various government agencies. Figure 4 shows the 3 × 3 moving window, where $g$ denotes the grid resolution, which is 5 m in this study. Let $P = (x, y)$ be a given point on the DEM surface, and let $z_i$ ($1 \le i \le 9$) denote the elevation of each cell of the 3 × 3 moving window. Extraction algorithms were developed for the twelve topographic factors based on the equations given in the following sections.
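The 3 × 3 moving window can be sketched in NumPy as follows. The function name `window_elevations`, the row-by-row numbering of z1 to z9 (with row 0 of the DEM array taken as north), and the edge padding at the island's border are assumptions for illustration, since Figure 4 is not reproduced here; the study itself used Matlab.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_elevations(dem):
    """Return z1..z9 of the 3 x 3 moving window for every DEM cell.

    Output shape is (rows, cols, 9); z[..., 0] is z1 (top-left),
    z[..., 4] is z5 (the centre cell) and z[..., 8] is z9 (bottom-right).
    Border cells are handled by repeating the edge values (an assumption).
    """
    dem = np.asarray(dem, dtype=float)
    padded = np.pad(dem, 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))  # (rows, cols, 3, 3) view
    return windows.reshape(dem.shape[0], dem.shape[1], 9)
```

Every per-cell factor can then be written as a vectorized function of the last axis, which avoids an explicit loop over the roughly 5 m cells of the DEM.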

Topographic Factors Extraction
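Slope Angle. The Simple Difference method [24] was applied to extract the slope angles of Penang Island. Figure 5(b) shows the extracted slope angle map. The slope angle is computed as
$$\text{slope angle} = \arctan\sqrt{p^{2} + q^{2}},$$
where $p = \partial z/\partial x$ and $q = \partial z/\partial y$ are the elevation gradients in the $x$ and $y$ directions, estimated from the 3 × 3 moving window.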
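A sketch of the slope-angle formula follows. It estimates p and q with central differences across the window, (z6 − z4)/2g and (z2 − z8)/2g. This is a swapped-in choice: the exact Simple Difference formula of [24] is not reproduced in the text. The function name and window layout (z1 to z9 row by row, north at the top) are also assumptions.

```python
import numpy as np


def slope_angle_deg(z, g=5.0):
    """Slope angle in degrees from windows z[..., 0:9] = z1..z9 with cell size g (m).

    p and q use central differences (an assumption, not necessarily [24]);
    x points east and y points north, with z1..z3 the northern row.
    """
    z = np.asarray(z, dtype=float)
    p = (z[..., 5] - z[..., 3]) / (2.0 * g)  # dz/dx = (z6 - z4) / 2g
    q = (z[..., 1] - z[..., 7]) / (2.0 * g)  # dz/dy = (z2 - z8) / 2g
    return np.degrees(np.arctan(np.sqrt(p ** 2 + q ** 2)))
```

For example, a surface rising 5 m per 5 m cell toward the east has p = 1 and q = 0, so its slope angle is arctan(1) = 45°.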
Slope Aspect. Slope aspect is defined as the direction that a slope faces. Previous research has shown a link between slope aspect and landslide susceptibility, and in some landslide cases researchers have agreed that slope aspect is one of the main reasons for landslide occurrence [25,26]. In this study, the slope aspects of Penang Island were extracted from the DEM by applying (9) [27]. The slope aspect has been divided into nine classes (Figure 5(c)), namely, North, North East, East, South East, South, South West, West, North West, and Flat.
Curvature. Surface curvature at a point is the curvature of the line formed by the intersection of the surface with a plane of specific orientation passing through that point [28][29][30]. Six types of curvature are considered in this paper [31]: plan curvature, profile curvature, tangential curvature, longitudinal curvature, cross section curvature, and general curvature.
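Returning to the nine aspect classes listed above, they can be assigned as in the sketch below. Because (9) is not reproduced here, the azimuth convention is an assumption: x points east, y points north, the slope faces the downhill direction (−p, −q), the azimuth is measured clockwise from north, and each of the eight directional classes spans 45° centred on its compass direction. The function name `aspect_class` and the flatness tolerance `flat_tol` are hypothetical.

```python
import math

ASPECT_CLASSES = ["North", "North East", "East", "South East",
                  "South", "South West", "West", "North West"]


def aspect_class(p, q, flat_tol=1e-9):
    """Nine-class aspect from p = dz/dx (east) and q = dz/dy (north)."""
    if math.hypot(p, q) <= flat_tol:
        return "Flat"
    # The slope faces down-gradient, along (-p, -q); azimuth clockwise from north.
    azimuth = math.degrees(math.atan2(-p, -q)) % 360.0
    # Eight 45-degree sectors centred on N, NE, E, ..., NW.
    return ASPECT_CLASSES[int(((azimuth + 22.5) % 360.0) // 45.0)]
```

For instance, terrain rising toward the east (p > 0, q = 0) faces west, and `aspect_class(1.0, 0.0)` returns "West".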
The value of the curvature can be above, below, or equal to zero, representing convex, concave, or flat curvature shapes, respectively, as given in (15). Some equations and definitions are given in (10) to (15) before the extraction process. Curvature maps are shown in Figure 5. More details about the derivation of $p$, $q$, $r$, $s$, and $t$ are found in [28].
Figure 3: The conventional MLP networks.
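One standard way to obtain p, q, r, s, and t from the 3 × 3 window is the Evans-Young finite-difference scheme. The sketch below uses it as an assumption, since (10) to (14) are not reproduced here, together with the sign rule stated for (15). The window layout (z1 to z3 as the northern row, x east, y north) and the function names are hypothetical. Profile curvature is described below with the opposite sign convention (convex is negative), so the rule should be applied to each curvature in its own convention.

```python
import numpy as np


def partial_derivatives(z, g=5.0):
    """Evans-Young estimates of p, q, r, s, t from windows z[..., 0:9] = z1..z9.

    Assumed layout: z1 z2 z3 (north row) / z4 z5 z6 / z7 z8 z9; x east, y north.
    """
    z = np.asarray(z, dtype=float)
    z1, z2, z3, z4, z5, z6, z7, z8, z9 = np.moveaxis(z, -1, 0)
    p = (z3 + z6 + z9 - z1 - z4 - z7) / (6 * g)                          # dz/dx
    q = (z1 + z2 + z3 - z7 - z8 - z9) / (6 * g)                          # dz/dy
    r = (z1 + z3 + z4 + z6 + z7 + z9 - 2 * (z2 + z5 + z8)) / (3 * g**2)  # d2z/dx2
    t = (z1 + z2 + z3 + z7 + z8 + z9 - 2 * (z4 + z5 + z6)) / (3 * g**2)  # d2z/dy2
    s = (z3 + z7 - z1 - z9) / (4 * g**2)                                 # d2z/dxdy
    return p, q, r, s, t


def curvature_shape(k, tol=1e-12):
    """Sign rule described for (15): > 0 convex, < 0 concave, otherwise flat."""
    k = np.asarray(k, dtype=float)
    return np.where(k > tol, "convex", np.where(k < -tol, "concave", "flat"))
```

On an exact quadratic surface these estimates are exact: for z = x², the sketch gives r = 2 and t = 0.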
Plan Curvature. Plan curvature is defined as curvature in a horizontal plane, that is, the curvature of the contour line that crosses a specific cell. Plan curvature is derived using (16).
Profile Curvature. Profile curvature is the curvature of the surface in the direction of steepest slope (with respect to the vertical plane of a flow line). It affects the flow velocity of water draining the surface and influences erosion and deposition: erosion prevails in locations with convex (negative) profile curvature, while deposition occurs in locations with concave (positive) profile curvature [31]. Profile curvature was calculated using (17).
General Curvature. As identified by Wood [32,33], general curvature (also called total curvature) is the curvature of the surface itself, not the curvature of a line formed by the intersection of the surface with a plane. General curvature can be positive or convex (indicating peaks), negative or concave (indicating valleys), or zero (indicating a flat surface or a saddle) [31]. It is related to the two previous curvatures by
$$\text{general curvature} = \text{profile curvature} + \text{plan curvature}. \qquad (18)$$
Tangential Curvature. Tangential curvature was identified by Wilson and Gallant [29]. It is the curvature along the line orthogonal to the line of steepest gradient. As with plan curvature, its value indicates whether flowing substances will converge or diverge as they flow over a point [33,34]. Tangential curvature is given by (19).
Longitudinal Curvature. Identified by Wood [32,33], longitudinal curvature is conceptually similar to the curvature of the line of intersection between the surface and the plane defined by the slope and aspect direction. It is interpreted in the same manner as profile curvature, in that it indicates whether a flowing substance will accelerate or decelerate as it passes over a point.
Longitudinal curvature is calculated using (20).
Cross Section Curvature. Cross section curvature was identified in [32,35]; it is conceptually similar to the curvature of the line of intersection between the surface and the plane defined by the slope normal and the aspect direction. It is interpreted in the same way as plan curvature, in that it indicates whether a flowing substance will converge or diverge as it passes over the point, and it is calculated using (21). Each type of curvature can therefore be convex, concave, or flat, and the curvature value is the cornerstone of the curvature shape estimation using (15).
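Longitudinal and cross-sectional curvature can be sketched as negated second directional derivatives of elevation, taken along the gradient direction and along the contour direction, respectively. The expressions below are one commonly quoted form of Wood's measures [32] written in terms of p, q, r, s, and t. They are an assumption here because (20) and (21) are not reproduced; the function name and the choice to return 0 on flat cells (where the direction is undefined) are hypothetical.

```python
import numpy as np


def long_cross_curvature(p, q, r, s, t, eps=1e-12):
    """Return (longitudinal, cross-sectional) curvature; 0 where the cell is flat."""
    p, q, r, s, t = (np.asarray(v, dtype=float) for v in (p, q, r, s, t))
    grad2 = p ** 2 + q ** 2
    safe = np.where(grad2 > eps, grad2, 1.0)  # avoid dividing by zero on flat cells
    # Second derivative along the unit gradient (p, q)/|grad|, negated.
    longc = -(r * p ** 2 + 2.0 * s * p * q + t * q ** 2) / safe
    # Second derivative along the contour direction (-q, p)/|grad|, negated.
    crossc = -(r * q ** 2 - 2.0 * s * p * q + t * p ** 2) / safe
    flat = grad2 <= eps
    return np.where(flat, 0.0, longc), np.where(flat, 0.0, crossc)
```

A useful consistency check on this form is that the two measures always sum to −(r + t), the negated Laplacian of elevation, whatever the slope direction.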
Diagonal Line Lengths. The diagonal line is the line that passes through the central cell and connects two opposite corner neighbor cells (Figure 4). The horizontal and vertical deltas, $\Delta x$ and $\Delta y$, are calculated from the neighboring cells, and the diagonal length at the center cell is obtained as the hypotenuse using the Pythagorean theorem, as shown in (22). Figure 5(j) shows the length of the diagonal line for Penang Island.
Surface Area. A variety of methods for measuring surface irregularity from DEM data exist in the literature [36,37]. An estimate of the surface area can also be derived from the slope and slope aspect within a cell [38]. For this study, the Berry method was used, in which the surface area reflects the topographic surface area within a cell: for a flat cell it equals the planimetric area, and otherwise the planimetric area is adjusted for slope. These two conditions are given in (23) and (24), where the adjustment factor $F = 1/\cos(\text{slope angle})$ and $A$ is the cell area. The surface area map is shown in Figure 5.
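The two measures can be sketched as follows. The surface-area function applies the adjustment factor F = 1/cos(slope angle) to the cell area A = g², which reduces to A on flat cells. The diagonal-length function is a hypothetical reading of (22), which is not reproduced here: it takes the horizontal delta as the corner-to-corner distance 2√2·g between z1 and z9 and the vertical delta as their elevation difference. Both function names are hypothetical.

```python
import numpy as np


def diagonal_length(z, g=5.0):
    """Hypothetical (22): 3-D length of the diagonal z1 -> z5 -> z9 (m)."""
    z = np.asarray(z, dtype=float)
    horizontal = 2.0 * np.sqrt(2.0) * g      # corner-to-corner plan distance
    vertical = z[..., 0] - z[..., 8]         # elevation difference z1 - z9
    return np.hypot(horizontal, vertical)    # Pythagorean hypotenuse


def surface_area(slope_deg, g=5.0):
    """Slope-adjusted cell area A * F with F = 1 / cos(slope); equals A when flat."""
    cell_area = g * g
    return cell_area / np.cos(np.radians(slope_deg))
```

With g = 5 m, a flat cell has a surface area of 25 m², while a cell sloping at 60° has twice that, since cos 60° = 0.5.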
Surface Roughness. Surface roughness is useful because it reflects numerous geophysical parameters, such as landform characteristics, the distribution of crenulations, and the degree of erosivity. A number of methods have been proposed for the definition, calculation, and application of surface roughness, depending on the parameters required for various analyses [39][40][41]. For this study, (25) was used to extract the surface roughness, and Figure 5(l) shows the surface roughness map.
Rugosity. Rugosity is the ratio of the surface area to the planar area across the neighbourhood of the central pixel $z_5$ (Figure 4). With this measure, flat areas have rugosity values close to 1, while high-relief areas exhibit higher values [42], as shown in (26).
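Because (25) and (26) are not reproduced here, the sketch below rests on two assumptions. Roughness is taken as the standard deviation of the nine window elevations (one common definition, not necessarily the study's). Rugosity is taken as the summed slope-adjusted area A/cos(slope) of the nine cells divided by their planar area 9A, a simplification that keeps the stated property that flat areas give 1. The function names are hypothetical.

```python
import numpy as np


def roughness_std(z):
    """Hypothetical roughness: standard deviation of the nine elevations z1..z9."""
    return np.std(np.asarray(z, dtype=float), axis=-1)


def rugosity(slope_window_deg):
    """Surface-to-planar area ratio over the 3 x 3 neighbourhood (simplified).

    slope_window_deg[..., 0:9] holds the slope angle of each of the nine cells;
    each cell's surface area is approximated as A / cos(slope), so the ratio
    sum(A / cos) / (9 * A) is simply the mean of 1 / cos(slope).
    """
    s = np.radians(np.asarray(slope_window_deg, dtype=float))
    return np.mean(1.0 / np.cos(s), axis=-1)
```

A uniformly flat neighbourhood therefore has rugosity 1, and a neighbourhood sloping uniformly at 60° has rugosity 2.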

Determination of Important Factors
Two methods are implemented to determine the important factors for landslide analysis: weight determination using the Zhou method and the classification accuracy method.

Weight Determination and Factor Rating Using Zhou Method
As stated previously, the back-propagation approach can be used in landslide applications to determine the weight of each input factor. Zhou (1999) described a method for weight determination using back-propagation, and the same method is adopted in this study. The effect of the output $o_j$ of hidden node $j$ on the output $o_k$ of output node $k$ can be represented by the partial derivative
$$\frac{\partial o_k}{\partial o_j} = f'(\mathrm{net}_k)\, w_{kj}. \qquad (27)$$
Equation (27) can take both negative and positive values. The importance of hidden node $j$ relative to another hidden node $j'$ may be calculated as the ratio of the absolute values
$$\frac{\left|\partial o_k/\partial o_j\right|}{\left|\partial o_k/\partial o_{j'}\right|} = \frac{|w_{kj}|}{|w_{kj'}|}, \qquad (28)$$
where $w_{kj'}$ is a weight into node $k$ other than $w_{kj}$. Equation (28) shows that, with respect to a particular output node $k$, the relative importance of hidden node $j$ is proportional to the absolute value of the weight on its connection to node $k$. Equation (28) can be used to compute the importance of the hidden nodes when the output layer has a single output.
For a neural network with more than one output, (28) is extended in several steps. First, the normalized importance of hidden node $j$ with respect to output node $k$ is computed, and the total importance of all hidden nodes with respect to the same output node is obtained by summation. The importance of each hidden node with respect to all of the output nodes is then calculated. In the same way, with respect to hidden node $j$, the normalized importance of input node $i$ is defined; combining it over the hidden layer gives the overall importance of input node $i$ with respect to the hidden layer and, correspondingly, with respect to output node $k$.
For this study, the MLP structure, with back-propagation as the training algorithm (Figure 6), was selected to be 13 × 29 × 2. This neural network consists of three layers: the input layer, the hidden layer, and the output layer. Each input-layer neuron represents one input factor, the hidden layer has 29 neurons, and the output layer has two neurons representing landslide and no landslide. The back-propagation algorithm adjusts the weights connecting the three layers to minimize the error between the predicted and actual outputs, as in (5). The training algorithm, learning rate, momentum, and number of epochs control the performance of the neural network. The momentum, learning rate, and performance goal were set to 0.9, 0.01, and 0.1, respectively, and the maximum number of epochs was set to 1000: if the network could not reach the mean square error (MSE) goal of 0.1, training stopped after 1000 epochs. This epoch limit was used to guard against overfitting during training.
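The chaining of normalized importances described above can be sketched with NumPy. This is a Garson-style stand-in under stated assumptions, not a transcription of Zhou's equations beyond (28), which are not reproduced here. Absolute weights are normalized over the inputs of each hidden node and over the hidden nodes of each output node, and the two are multiplied through the hidden layer. Bias weights are ignored, and the function name `input_importance` is hypothetical.

```python
import numpy as np


def input_importance(W1, W2):
    """Relative importance of each input factor for each output neuron.

    W1: (n_hidden, n_in) input-to-hidden weights w_ji.
    W2: (n_out, n_hidden) hidden-to-output weights w_kj.
    Returns an (n_out, n_in) array whose rows each sum to 1.
    """
    A1, A2 = np.abs(W1), np.abs(W2)
    # Importance of input i w.r.t. hidden node j, normalised over the inputs.
    in_wrt_hidden = A1 / A1.sum(axis=1, keepdims=True)     # (n_hidden, n_in)
    # Importance of hidden node j w.r.t. output k, normalised over hidden nodes,
    # following (28): proportional to |w_kj|.
    hidden_wrt_out = A2 / A2.sum(axis=1, keepdims=True)    # (n_out, n_hidden)
    # Chain the two through the hidden layer.
    return hidden_wrt_out @ in_wrt_hidden                  # (n_out, n_in)
```

For the 13 × 29 × 2 network, `W1` has shape 29 × 13 and `W2` has shape 2 × 29, so the result is a 2 × 13 array with one row per output neuron.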
All thirteen topographic factors were entered into the neural network at the same time. The network was trained 10 times on the same data sets, and the mean weight of each factor over the 10 training runs was computed. The weights were then normalized by dividing each factor's mean weight by the minimum mean weight among the thirteen factors, so that the least important factor has a rating of 1. The factors with the highest ratings are taken as the most important factors.
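The rating step just described (mean over repeated runs, divided by the minimum mean) can be sketched as below. The function name and the input layout, one row per training run and one column per factor, are hypothetical.

```python
import numpy as np


def rate_factors(importance_runs, names):
    """Rank factors by mean importance over repeated runs, scaled so the minimum is 1.

    importance_runs: array-like of shape (n_runs, n_factors), e.g. 10 x 13.
    """
    mean = np.asarray(importance_runs, dtype=float).mean(axis=0)
    rating = mean / mean.min()           # least important factor gets 1.0
    order = np.argsort(-rating)          # highest rating first
    return [(names[i], float(rating[i])) for i in order]
```

Dividing by the minimum rather than the maximum makes every rating at least 1, which matches the scale of the ratings reported in the results (1.0 to 1.42).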

Factors Rating Using Classification Accuracy
Rating factor importance by neural network classification accuracy is applied here for the first time in landslide hazard analysis. It is carried out in the second stage of training by entering one of the thirteen factors into the neural network at a time and checking the classification accuracy. This step was repeated 10 times for every single factor and data set. The accuracy of the neural network for each factor was then calculated as the average of the highest and lowest accuracies from the 10 sets of training data, and these accuracies were used for factor rate normalization. Each factor with an accuracy of 70% or more is considered a good factor and is used in the second stage of classification; otherwise, the factor is considered not good enough and is ignored. Including these ignored factors reduces the overall classification accuracy of the neural network.
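The per-factor accuracy score and the 70% screening rule can be sketched as follows. Following the text, each factor's score is the average of its highest and lowest accuracy over the repeated runs. The function name, input layout, and expression of accuracies as fractions in [0, 1] are assumptions.

```python
import numpy as np


def screen_by_accuracy(acc_runs, names, threshold=0.70):
    """Single-factor accuracy rating and 70% screening.

    acc_runs: array-like (n_runs, n_factors) of test accuracies in [0, 1],
    one column per factor trained on its own. Following the text, a factor's
    accuracy is the average of its highest and lowest run.
    """
    acc = np.asarray(acc_runs, dtype=float)
    score = (acc.max(axis=0) + acc.min(axis=0)) / 2.0
    kept = [name for name, a in zip(names, score) if a >= threshold]
    return score, kept
```

The returned `kept` list is what would be passed on as the reduced input set for the second classification stage.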

Data Preparation and Classification Performance
An effective intelligent system requires a comprehensive data set. Therefore, 137570 pixels were selected from each factor for this analysis, of which 68786 pixels represent landslides and 68786 pixels represent no landslides. The data of each factor were normalized individually to the range 0 to 1 using
$$x_{\text{norm}}(i) = \frac{\text{pixel}(i) - \min}{\max - \min},$$
where pixel($i$) is the pixel to be normalized, and min and max are the minimum and maximum pixel values of that factor. The target of the intelligent system (landslide history) is 1 for landslides and 0 for no landslides.
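Applied column by column, with one column per topographic factor, the min-max normalization above can be sketched as below. The function name and the handling of a constant factor (mapped to all zeros to avoid division by zero) are assumptions.

```python
import numpy as np


def minmax_normalise(X):
    """Scale each column (one topographic factor) of X to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant factor -> all zeros (assumption)
    return (X - lo) / span
```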
To evaluate the intelligent system's performance, this study employs 10-fold cross validation to divide the data into training and testing sets [43]. In this method, the data are partitioned into 10 equally sized segments, or folds. Each fold consists of 13757 pixels, split approximately equally between landslide and no-landslide pixels. Ten iterations of training and testing are performed; in each iteration, one fold is held out for testing while the remaining 9 folds are used for training. The MLP layer weights and output accuracy were recorded for each iteration.
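A class-balanced 10-fold split of the kind described above can be sketched as below. The random shuffling, the seed, and the function names are assumptions; `np.array_split` spreads any remainder over the first folds, which is why folds can differ by one pixel per class when the counts do not divide evenly.

```python
import numpy as np


def stratified_folds(labels, n_folds=10, seed=0):
    """Split sample indices into n_folds folds with a near-equal share of each class."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        for f, chunk in enumerate(np.array_split(idx, n_folds)):
            folds[f].extend(chunk.tolist())
    return [np.array(sorted(f)) for f in folds]


def train_test_splits(folds):
    """Yield (train, test) index arrays: each fold is held out once for testing."""
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        yield train_idx, test_idx
```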

Results and Discussion
Thirteen factors were involved in this study. They were extracted from the Penang DEM using (7) to (26). The extracted topographic factor maps were then verified by comparing them with maps produced by other software, such as ArcView, IDRISI, and MapInfo, and satisfactory agreement was achieved. Table 1 shows the weights produced by the Zhou method, whereas the weights of all topographic factors obtained using classification accuracy are presented in Table 2. Training 10 times with Zhou's method produced very similar weights in each run, with only small differences, as observed in Table 1; the standard deviations range from 0.006 to 0.027. In the final analysis, the seven factors with high ratings are diagonal length, longitude curvature, cross section curvature, general curvature, surface area, slope angle, and slope aspect. Of the 13 factors, tangential curvature has the minimum rating of 1.0 and slope angle has the maximum rating of 1.42.
The result of the classification accuracy method (Table 2) gives the accuracy of every single factor for 10 different data sets. The standard deviations are low, ranging from 0.0341 to 0.1829, so the ten data sets produced weights with only small changes among them, and there were no large changes in the factors' importance (weights) across the training data. The factors' importance was determined in the training stage. Diagonal length and tangential curvature were the most and least important factors, respectively, during the training and testing stages.
Eight factors achieved more than 70% classification accuracy. Coincidentally, they are the same factors that received the highest ratings with Zhou's method. Therefore, they were chosen for further neural network training and testing. At this stage, a combination of the good factors was used to improve the performance of the neural network. The number of inputs to the neural network was equal to the number of important factors, while the number of outputs was two, representing landslide-prone and non-landslide-prone. Table 3 shows the classification performance for all of the factors and for the eight important factors. The results show an improvement in classification: the average classification accuracy is 82.00% using all 13 factors and improves to 85.3% when only the 8 important factors are used. The factors with low ratings are roughness, plan curvature, tangent curvature, cross curvature, and general curvature; they have a negative effect on the classification accuracy and can therefore be ignored.
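As a quick check on the size of the improvement reported above, the difference between the two average accuracies in Table 3 works out as follows:
$$85.3\% - 82.00\% = 3.3 \text{ percentage points}, \qquad \frac{3.3}{82.00} \approx 0.040,$$
that is, an absolute gain of about 3.3 percentage points, or a relative gain of about 4%.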

Conclusion
Landslide classification can be improved by choosing suitable factors. The Zhou method and the classification accuracy method proved effective in selecting the important factors in this study area. By ignoring the 5 unimportant factors, that is, roughness, plan curvature, tangent curvature, cross curvature, and general curvature, the classification performance improved and the complexity of the network decreased. Both methods showed consistent agreement on the eight significant factors: profile curvature, rugosity, slope aspect, elevation, slope angle, diagonal length, longitude curvature, and surface area. In descending order, the ratings of the important factors for Zhou's method were slope angle, slope aspect, profile curvature, diagonal length, elevation, surface area, rugosity, and longitude curvature, and the least significant factor is roughness. In the classification accuracy method, slope angle has the highest rating and roughness has the lowest, with the remaining factors rated in between.