Quantifying Topographic Ruggedness Using Principal Component Analysis

(e development of geospatial technologies has opened a new era in terms of data collection techniques and analysis procedures. Digital elevation models as 3D visualization of the Earth’s surface have many mapping and spatial analysis applications. (e primary terrain factors derived from the raster dataset are usually less critical than secondary ones, e.g., ruggedness index, which plays a vital role in engineering, hydrological information derivation, and geomorphological processes. Surface ruggedness is a significant predictor of topographic heterogeneity by calculating the absolute value of elevation differences within a specified neighborhood surrounding a central pixel. (e current study investigates the impacts of various topographic metrics obtained from a digital elevation model on characterizing terrain ruggedness utilizing stepwise principal component analysis. (is popular multivariate statistical technique is applied to conduct a comprehensive assessment and treat the information redundancy of terrain parameters. Simultaneously, the standard deviation of elevation is also proposed as an alternative approach to quantifying topographic ruggedness. Besides, quantitative and qualitative method is espoused to validate the algorithms and compare their capabilities to the previously introduced models in the literature. (e findings have shown that principal component analysis provides superior performance against other models. Furthermore, they indicated that the standard deviation of elevation could be used instead of the available ones.


Introduction
Geomorphometry, which is defined as the science of quantitative Earth's terrain analysis [1], has mainly concentrated on studying terrestrial landscapes and derivation of a large spectrum of environmental variables [2,3]. e essential data source for performing such spatial analysis is digital elevation models [4]. e digital elevation model is a three-dimensional raster representation of the land surface to identify geographical features [5,6] by interpolating the landform using modern geospatial techniques [7][8][9]. It can be established through conventional ground topographic surveys or remote sensing methods [10][11][12][13]. In general, DEM is high-quality spatial data that is applied to extract terrain factors such as slopes, curvatures, aspects, and drainage network [14][15][16][17][18].
Generally, irregularity in elevation makes cultivation difficult and costly to traverse [19,20], but useful for keeping newborn reindeer calves against weather and windchill [21]. Relief ruggedness is a valuable attribute of wildlife habitat models and one of the prime components affecting vegetation diversity, snowmelt patterns, and water drainage that are key elements of nutrient obtainability [22][23][24][25] and the book [26]. Many previous studies have addressed quantifying surface ruggedness, in which some of these investigations were adopted on the statistical dispersion of elevations, slopes, aspects, and vectors orthogonal to planar facets on the landscape. Beasom et al. [27] described an approach for characterizing land surface ruggedness as a function of the total length of contour lines in a study area. More recently, Riley et al. [28] proposed a methodology to determine topographic heterogeneity by calculating the terrain ruggedness index value for each grid cell from the sum elevation changes within a specified neighborhood surrounding a central pixel. Sappington et al. [29] developed and tested a vector ruggedness measure algorithm to estimate landscape ruggedness as the variation in the 3D orientation of grid cells in a moving window by decomposing slope and aspect into 3D vectors. In contrast, Popit and Verbovšek [30] identified the ruggedness index by computing the differences between the lowest and highest elevation for each cell location on the output raster.
In a multivariate statistical analysis, the dataset's dimension plays a critical role in determining the problem's complexity. One of the most common techniques for reducing the data dimensions to increase its interpretability is the principal component analysis that was introduced by Pearson [31]. Presently, PCA is considered a powerful statistical approach in image processing due to its ability to decrease the number of correlated image bands resulting in a smaller dataset capable of representing most of the original variabilities [32]. Over the last few decades, PCA was used in geosciences and remote sensing [33][34][35][36][37]. For instance, Petrişor et al. [38] highlight the potential of integrating geographic information systems and PCA as a decisionsupport tool for underdeveloped countries. Huang et al. [39] concluded that PCA could provide an accurate representation of terrain complexity distribution features compared to k-means clustering. However, this method has not yet been introduced to the process of modeling terrain ruggedness.
As stated before, no optimal measure would provide a powerful tool for defining habitat availability [40,41]. Hence, selecting an appropriate method to evaluate this indicator is an essential task. In contrast, GIS software such as ArcGIS provides a variety of geoprocessing tools, but still has some gaps that must be addressed. e article's main objective is to investigate the variability of quantifying topographic ruggedness characteristics, relying on PCA, a standard deviation of elevation, and traditional indices. Additionally, the spatial tools using ArcGIS Model Builder and Python script will be developed for automating the processes. As the basis for terrain ruggedness analysis, the DEM is generated using synthetic data acquired by a scanned topographic map with varying coefficients to emulate Earth's physical land-surface features. e comprehensive assessment of several terrain metrics extracted from DEM is examined to describe topographic ruggedness using PCA. After that, the surface ruggedness obtained from the PCA and STD methods was compared against the ones from two different literature models by adopting some statistical approaches and visual depictions.

Materials and Methods
Topography relates to the shape and characteristics of the Earth's land surface and is a central controlling element in a wide range of natural processes [42,43]. It must be quantitatively evaluated [44] by building accurate and reliable geospatial data [45]. e terrain analysis is concerned with extracting topographic attributes from DEM raster in GIS as a convenient environment for handling data [46]. In general, DEM quality and resolution have a critical influence on the derived factors [47][48][49]. e analysis scale and chosen approach for determining different terrain variables will also play a role in the final results. Surface ruggedness is a significant geomorphological parameter applied in geoscience and environmental studies for indicating the relative change in elevation between adjacent grid cells [50]. Many approaches have been applied to verify which algorithm is the most appropriate for depicting topographic ruggedness. e principal component analysis was utilized in this study to evaluate the correlations between terrain factors and reduce the dimension of the original parameters to characterize surface ruggedness. At the same time, the standard deviation of elevation was also tested to be another ruggedness model. Figure 1 describes a methodology flowchart adopted in the current study for quantifying relief ruggedness.

Case Study Description.
e chosen case study is the Wadi Musa area (Valley of Moses) as one of Ma'an governorate towns in Jordan. It is characterized by gorgeous natural scenery and a heterogeneous landscape with mountains in the eastern part ( Figure 2). e geographic position of Wadi Musa has situated about 200 km to the south of Amman, between 30°15′ 00″ N and 30°30′ 00″ N and 35°10′ 00″ E and 35°30′ 00″ E, as shown in Figure 3. e elevation of the terrain surface in this region varies from 49 m to 1725 m above the mean sea level (AMSL). is area has several types of features to assess the topographic ruggedness map under different landform conditions. Indeed, characteristics of terrain relief, such as flat, hilly, and mountainous, have a significant influence on the reliability of DEM [51] and extracted terrain factors. e digital elevation model of the site was generated using a topographic map with a scale of 1/25000. e map was scanned and digitized to capture contour lines with an interval of 10 m. After that, the ANUDEM method, which is entailed in ArcGIS 10.2 under the name of Topo to Raster, is applied to derive the DEM model with 10 m accuracy from CLs. is algorithm was developed by Hutchinson [52] to fit the shape and drainage structure of the land surface. e dataset's quality control was performed by superimposing and inspecting the CLs extracted from the built DEM with the original ones. Figure 4 illustrates the flowchart of data acquisition.

Surface Ruggedness.
Geomorphologic analysis as a measure of landforms geometry was broadly applied to understand various natural processes that impact humans and the environment [53,54]. Generally, the terrain characteristics have tremendous leverage over the habitat and socioeconomic activities. e emergence of modern geospatial techniques has brought new ways to represent and quantify the bare-earth topography using grid-based DEMs and develop novel algorithms for extracting geomorphometric attributes [55,56]. Neighborhood tools are common to carry out DEM analysis workflows by computing a specified statistic for a focal cell within a defined moving window that is designated in relation to the central point [57]. In this study, a 3 × 3 sampling window is effectively shifted across the DEM surface to identify terrain 2 Advances in Civil Engineering information, as indicated in Figure 5.
e main reason behind this selection is to limit the effects of the cell size on the results of the investigated models since finer windows require highly accurate DEMs [58].
Terrain ruggedness index is assigned for each raster cell using the algorithm described by Riley et al. [28] by calculating the square root of the sum of the squared elevation differences between the central pixel and the adjacent ones within a specified neighborhood, as presented in equation (1), in which z o is the elevation of the central cell, z is the elevation of each neighbor grid cell to the focal one, and N is the width of the roving window (number of pixels).

Advances in Civil Engineering
Hence, according to Figure 5, equation (2) is written. e source code was developed in Python programming in the context of ArcPy to calculate the TRI algorithm, as indicated in Figure 6.
Vector ruggedness measure provides another procedure to quantify terrain ruggedness as the variation in cells' 3D orientation within a neighborhood, as presented in Figure 7. It was initially developed by Hobson [59] and later espoused by Sappington et al. [29]. e vector ruggedness measure can be calculated from the slope and aspect using equation (3).

Advances in Civil Engineering
where n is the sample size (number of pixels in the moving window) and r is the resultant vector of central cells in a given neighborhood.

Ruggedness Conditioning Parameters.
e terrain metrics shown in Table 1 are tested to evaluate and analyze ruggedness mapping to determine the most influential predictive factors. ese parameters include six topographic features: slope, aspect, total curvature, surface roughness, topographic relief, and topographic position index. e relevant hydrological factors were the stream power index, sediment transport index, and topographic wetness index. Figure 9 demonstrates an automated workflow in ArcGIS Model Builder that uses standard tools to extract morphometric variables from DEM.

Standard Deviation of Elevation.
e standard deviation of elevation is determined by analyzing a DEM at a pixel level via sliding window scanning using equation (5). e standard deviation of elevation is examined in this study as an alternative ruggedness measure.
e key point about choosing this method is to ease calculating its parameters using focal statistics in ArcGIS with a reasonable processing time for a large spatial dataset.
where N is the width of the roving window, z is the mean elevation in the window, and z is cell elevation. Hence, according to Figure 5, it can be written as follows:

Factor
Definition Formula Slope (°) Angle between a tangent plane and a horizontal one at a certain point [56] β � tan −1 ( ������ Aspect (°) Clockwise angle from north to the horizontal projection of an external normal vector at a specific point [56] α � tan −1 (p/q) Ground curvature (m −1 ) Degree of bending variation at a given point [60,61] p and q are the rate of elevation change in x and y direction, respectively, and r and t are curvature radii of point 2.5. Principal Component Analysis. e simplified algorithm for conducted PCA is seen in Figure 10. In general, four main steps are used to perform this method based on terrain factors. e correlation and covariance matrices are computed, and then the principal components are defined, and finally the dataset is shifted and rotated to the new axis. Usually, to improve the performance, avoid the numerical conflicting, and training stability of the ruggedness values, the following equation is used to normalize the obtained results [66].
normalized values � Raster value − Raster min Raster max − Raster min .
2.6. Model's Performance Assessment. As mentioned previously, the obtained ruggedness grids from each method were classified into five different classes. ese classes were defined as level (1), slight (2), moderate (3), high (4), and very high (5). In line with this study's aim, the identified classes from each model were compared by adopting the principle of confusion matrix that is always used for evaluating the performance of classification algorithms in machine learning. Each matrix's row represents the instances in a true class, whereas each column accounts for the predicted class cases. For a specific class in a multiple classes matrix such as the one shown in Figure 11, it can be divided into four main categories to indicate the number of times that the model: (1) true positive (TP) has rightly predicted the positive class, (2) true negative (TN) has precisely estimated the negative one, (3) false positive (FP) has incorrectly predicted the positive class, and (4) false negative (FN) has incorrectly estimated the negative class. us, using this matrix provides an easy and direct way for investigating the matching between two different approaches of the selected ruggedness modeling methods.
In general, matching any two models in this study is evaluated by computing the confusion matrix's overall accuracy. e accuracy is defined as the number of corresponding points in all cases divided by the total number of points selected for the investigation. Furthermore, the models' performance at the class level is assessed using two different approaches: precision and recall, as demonstrated in Figure 12

Results and Discussion
Digital elevation models are fundamental to many applications in Earth sciences such as geomorphology, geology, ecology, and engineering [67][68][69]. Hence, the accuracy of DEM-derived elements must be deemed to reduce the inherent errors. e terrain ruggedness is extracted from DEM to measure local elevation variability within a defined window.
is study's main objective is to carry out a comparative analysis to assess the quality of surface ruggedness generated by the PCA and STD algorithm. is analysis includes representing topographic ruggedness indices and verifying their performance. e standard deviation of elevation was determined using focal statistics over a 3 × 3 cell neighborhood of the DEM dataset. e chosen conditioning parameters ( Figure 13) were normalized, and then the PCA approach was applied to describe surface ruggedness. On the other hand, the geoprocessing workflow in ArcGIS 10.2 software was automated by Python Script and Model Builder tool to calculate TRI and VRM indices, respectively. e terrain variables in Table 1 were all used to develop an initial PCA model and choose the most suitable ones through its correlation matrix and eigenvalues. Indeed, the curvature, TWI, TPI, and aspect have indicated very low correlations than other metrics. Pearson's correlation coefficient of each pair for a selected dataset (sample size 64635 points) and descriptive statistics were computed to investigate the impact of various terrain parameters on the ruggedness indices, as shown in Tables 2 and 3. Hence, the influencing parameters, which highly correlate with the ruggedness index, were selected for building the PCA model. e Pearson correlation coefficient varies between −1 and 1, with values close to the limits reflect high significance. In general, the blue color refers to have a positive correlation in which the ruggedness index is directly proportional to the terrain feature, while the orange color represents having a negative correlation means that the ruggedness index is inversely proportional to the terrain feature. Accordingly, the slope, TR, and SR have revealed the highest correlations; besides, SPI and STI gave a good correlation with different ruggedness methods. Hence, it is vital to use them to generate an accurate PCA map. Although TWI shows a significant relationship with all approaches, it minimized the model's capabilities when utilized in the PCA due to its negative interaction. Additional analysis was conducted to identify the bivariate correlation of factors with the PCA model. Table 3 indicates that the trends in PCA results follow those of the other approaches. In contrast, the correlation coefficients between the examined parameters and VRM are significantly lower than the other models. For instance, the correlation coefficient between the slope and the ruggedness index reduced from about 0.99 to 0.291 when using the VRM Defining dataset, its dimensions, and calculating an ellipse that bounds the points in the scatterplot.     Figure 14 illustrates the correlation plots to measure the degree of agreement between the models and chosen topographic metrics. e x-axis of all diagrams represents the ruggedness index for each of the investigated methods (i.e., PCA, TRI, STD, and VRM) stated on the top, and the y-axis is the topographic parameters. In general, the TRI and STD algorithms display nearly the same, and TRI is highly correlated with the slope and the SR. In contrast, the PCA is related to several variables while there is no correlation between the VRM and comparative factors. Figure 15 represents topographic ruggedness maps generated by TRI, VRM, PCA, and STD algorithms examined in the research area. ese datasets are split into five homogeneous classes (level, slight, moderate, high, and very high) using the Jenks natural breaks optimization [70,71]. Indeed, the smoothest areas represented by shades of a gray tone are entirely related to plain regions. Moreover, the more rugged zones, clarified by cyan color, are fully seen around the mountains' topmost, as illustrated in Figure 16. e graphical representations of the ruggedness index indicate that the texture and colors/tones of the STD model closely match the corresponding TRI one. At the same time, these two surfaces show a slight difference from that of PCA, especially in places with very high ruggedness. Also, the VRM index reveals a considerable variation in the tone color patches from other algorithms. As a result, visual inspection has highlighted the PCA approach as practically the most superior model among the tested ones. Nevertheless, further analysis will be performed to evaluate tested models using statistical methods. e PCA model was developed using selected topographic parameters in the case study, and covariance and correlation matrices are shown in Tables 4 and 5. Furthermore, the eigenvectors and eigenvalues are provided in Tables 6 and 7, respectively. Typically, the first component of the updated PCA cumulative contribution reached about         Figure 17: Confusion matrix for each pair of the investigated models. 82.3%, while it was possible to exceed 95% using the first and second components only. However, the surface ruggedness was created using all the components to reach the best accuracy. e confusion matrix was defined for each pair of methods to evaluate their capabilities and variations, as clarified in Figure 17. It is observed that both TRI and STD are almost identical in which they provided the same zonal classification with an overall accuracy (matching) of 99.7%. is means that the STD model can replace the TRI without any negative influence. e difference between PCA and STD and TRI can be represented by an overall variation of about 20%. e PCA classified more points in the very high category compared to the TRI one that was capable of correctly identifying about 45% of the points only.
is means that the PCA model is more sensitive for rugged areas in comparison with other methods. Pairwise comparison between the PCA and VRM models showed an overall difference of about 70%; besides, the ratio of points classified in the first class by VRM was almost 97% of the testing population. is observation is in line with the previous findings in Figure 14 and means that the VRM method is incapable of detecting moderate, high, and very high roughness areas. Finally, both STD and TRI results indicated about 44% correlation with the ones extracted from the VRM due to having a majority of the points classified in the level class of the VRM model. e following comparisons and postanalysis of the results will omit the VRM model and consider PCA, TRI, and STD ones as most VRM points concentrated in the level category. Figure 18 provides a class-based disparity between the PCA, TRI, and STD results through box plots. In general, it can be seen that the points' distribution in each PCA class takes a larger range as compared to the TRI and STD models that shrink the classes into minimal extents of propagation. Furthermore, it can be noticed that the limits of the level class in the PCA model are equal to that of the level, slight, and moderate of the TRI and STD ones combined.
is means that the PCA method increases the ruggedness index values of the entire area as compared to the TRI and STD results even though their maps look very similar. On the other hand, the datasets' normal distribution in each of the investigated models is shown in Figure 19. It can be observed from the plots that the only significant difference between the PCA and the other algorithm is defined in the first class (level), where the PCA follows the normal distribution perfectly. Also, the spread of the other classes looks pretty similar in all of the indices. In general, Figure 20 shows a strong relationship between the results of PCA and those of TRI and STD.
us, the regression equation (8) can be used to transfer TRI and STD models to the PCA one in Wadi Musa were proposed in this paper. e correlation coefficient of these equations is 0.9925 and 0.9926 in TRI and STD, respectively. ese models can find application in many instances to overcome the need for high computational efforts.

Conclusion
Topographic ruggedness is an essential factor of terrain analysis for many applications in engineering and geomorphology. Several algorithms were introduced in the literature to show the spatial distribution of surface ruggedness but did not necessarily provide an optimal solution. is study presents a substitutional approach to generate terrain ruggedness using PCA and STD algorithms. Firstly, the effects of various DEM-derived topographic metrics on the ruggedness model were investigated using PCA. Secondly, the proposed methods are compared with the most common ones, namely, TRI and VRM. Indeed, the performance analysis of the ruggedness indices was conducted relying on visual inspection and statistical methods. e findings indicated the feasibility of applying PCA to generate an accurate ruggedness map consistent with the actual topographic features. STD model is simple to implement, available in ArcGIS software, and produces nearly identical results to TRI, the more complex method, while VRM showed the lowest performance. According to the statistical analysis, the ruggedness index's most influential variables are slope, TR, and SR. Finally, additional research is required to understand PCA's capabilities for describing terrain characteristics across various landscapes [72] and at different scales.

TRI:
Terrain ruggedness index DEM: Digital elevation model VRM: Vector ruggedness measure PCA: Principal component analysis GIS: Geographic information systems STD: Standard deviation of elevation CLs: Contour lines AMSL: Above mean sea level α: Aspect β: Slope C: Curvature SR: Surface roughness TR: Topographic relief TPI: Topographic position index TWI: Topographic wetness index SPI: Stream power index STI: Sediment transport index.

Data Availability
Some or all data, models, or codes that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest
e author declares no conflicts of interest.  Figure 20: Regression models to transfer TRI and STD to PCA method. 18 Advances in Civil Engineering