Using and comparing two nonparametric methods (CART and MARS) to model the potential distribution of gullies
Introduction
Gully erosion is one of the most important soil degradation phenomena in several environments around the world and has long been neglected because it is difficult to study and to predict (Valentin et al., 2005). Nevertheless, the negative on-site and off-site effects of gullying are well known: soil loss, reduction of soil water retention (Moeyersons, 2000; Esteves and Lapetite, 2003), landscape dissection (hampering the movement of vehicles, farm machinery and animals), water contamination and sedimentation of channels and reservoirs (Poesen et al., 2003).
Most soil erosion models focus on rill and interrill erosion at the hillslope scale and cannot be applied to gully erosion. The few that have been developed for gully erosion (CREAMS, Knisel, 1980; EGEM, Merkel et al., 1988; GLEAMS, Knisel, 1993; WEPP, Flanagan and Nearing, 1995; Sidorchuk, 1999) deal with the estimation of erosion rates or with the prediction of gully growth (Poesen et al., 1998). However, soil conservation plans and strategies can only be designed properly if the areas of the landscape at risk of gully formation are identified (Millington, 1986). Several studies have tried to model the location of gullies in the landscape based on the concept of topographical thresholds (Patton and Schumm, 1975). However, gully erosion usually has a multi-causal genesis, and models based exclusively on topographical thresholds are difficult to apply successfully at the regional scale.
Numerous techniques can generate models that predict the potential distribution of a phenomenon from a set of independent variables, such as Logistic Multiple Regression (LMR), Generalized Additive Models (GAM), Classification And Regression Trees (CART), Artificial Neural Networks (ANN) and Multivariate Adaptive Regression Splines (MARS). These techniques are not commonly used in gully erosion research. Meyers and Martínez-Casasnovas (1999) used a logistic modelling approach to predict areas at the field scale with a high probability of gully erosion. Later, Hughes et al. (2001) used CART to construct gully density maps for large areas of Australia. Martínez-Casasnovas et al. (2003) used a logistic regression model to assess sidewall gully erosion in large gullies. Recently, two studies used CART to model the location of gullies in Lebanon (Bou Kheir et al., 2007) and in tropical areas of Mexico (Geissen et al., 2007).
Along these lines, Gómez Gutiérrez et al. (in press) are developing a multivariable statistical model using Multivariate Adaptive Regression Splines (MARS; Friedman, 1991) to predict the potential distribution of gullies in rangelands of southwest Spain. The main objective of this paper is to compare the results of that previous work with those of another statistical model based on the Classification And Regression Trees (CART; Breiman et al., 1984) algorithm.
Further objectives of this work are to analyze the role of prevalence (the proportion of presence to absence data in the dataset) in the gully erosion model and to evaluate the importance of the different factors involved. Finally, results are implemented and mapped with the help of a Geographical Information System (GIS), using the results of the Area Under the ROC Curve (AUC) validation to weight the models.
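As a reminder of what the AUC measures, the following minimal sketch computes it as the probability that a randomly chosen presence site receives a higher model score than a randomly chosen absence site, with ties counting one half. It also computes prevalence as the share of presence records. The scores, labels and function names are hypothetical and are not data or code from this study; the pairwise loop is chosen for clarity rather than speed.

```python
from typing import List


def prevalence(labels: List[int]) -> float:
    """Share of presence records (label 1) in a presence/absence dataset."""
    return sum(labels) / len(labels)


def auc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    Each (presence, absence) pair counts 1 if the presence site scores
    higher, 0.5 if the scores tie, and 0 otherwise.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for three gullied and three non-gullied sites.
example_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
example_labels = [1, 1, 1, 0, 0, 0]

example_auc = auc(example_scores, example_labels)
example_prevalence = prevalence(example_labels)
```

In this toy case one presence site (score 0.35) is outranked by one absence site (score 0.6), so eight of the nine pairs are ordered correctly. A value of 0.5 would indicate a model no better than chance and 1.0 a perfect ranking.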
Among the set of available techniques, CART and MARS were selected because non-parametric approaches (such as ANN, CART and MARS) usually produce better results when modelling very complex phenomena, particularly MARS (Moisen and Frescino, 2002; Yang et al., 2004). In addition, CART and MARS seem to be faster and easier to interpret than ANN (De Veaux et al., 1993).
Study area
The study was carried out on 54 farms representative of dehesas and pasturelands in the Iberian Peninsula (Fig. 1). Dehesas commonly have a savannah-like vegetation structure and agrosylvopastoral land use, with farm sizes in excess of 100 ha. Several studies have provided data highlighting gully and sheet erosion as the most important processes of soil degradation in these environments (Schnabel and Gómez Amelia, 1993; Schnabel, 1997; Schnabel et al., 1999; Gómez Gutiérrez et al., 2008a
CART and MARS
Classification And Regression Trees (CART; Breiman et al., 1984) is a popular data mining technique based on recursive binary partitioning. The result of CART (1) is a hierarchical binary tree which subdivides the prediction space into regions (Rm) where the values of the response variable are similar (≅am):
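The equation itself is not reproduced in this excerpt. The partitioning idea can still be illustrated with a minimal sketch, which is not the authors' implementation: it assumes a single predictor, a binary presence/absence response and Gini impurity as the split criterion. The slope values, labels and function names are hypothetical. It performs one step of binary partitioning and then takes the mean response in each resulting region as that region's constant.

```python
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels (0.0 for a pure or empty node)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(xs: List[float], ys: List[int]) -> Tuple[Optional[float], float]:
    """Find the threshold on one variable that minimises weighted Gini impurity.

    Returns (threshold, impurity). The threshold is None when no candidate
    split improves on the impurity of the unsplit node.
    """
    best_t: Optional[float] = None
    best_imp = gini(ys)
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if imp < best_imp:
            best_t, best_imp = t, imp
    return best_t, best_imp


def region_means(xs: List[float], ys: List[int], t: float) -> Tuple[float, float]:
    """Mean response in the two regions created by the split x <= t / x > t."""
    left = [y for x, y in zip(xs, ys) if x <= t]
    right = [y for x, y in zip(xs, ys) if x > t]
    return sum(left) / len(left), sum(right) / len(right)


# Hypothetical slope gradients (%) and gully presence (1) / absence (0).
slope = [2.0, 4.0, 6.0, 10.0, 12.0, 15.0]
gully = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_split(slope, gully)
means = region_means(slope, gully, threshold)
```

A full tree would apply `best_split` recursively to each region and over every predictor, then prune the result; this sketch stops after the first split to keep the mechanism visible.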
The main drawbacks of CART are the possible complexity of the resulting model, the hierarchical dependence between nodes at different levels and the difficulties that
CART
The result of CART was a tree with 30 non-terminal nodes and 30 terminal nodes. Of the 32 independent variables, CART used only 12 to generate the optimal model. This model showed an AUC of 0.97 for the training dataset. The AUC values for the validation datasets were quite different: while Guadalperalón presented a high value (0.96), the AUC of the Monroy dataset was only 0.66. It is difficult to explain this difference in AUC values since the Guadalperalón and Monroy datasets present
Conclusions
Both CART and MARS produced large and complex models with a large number of independent variables, reflecting the complexity of the gullying processes. The two models showed acceptable to good performance, with AUC values from 0.66 to 0.98. MARS produced better results for the two validation datasets. Some differences between the two models were found regarding the importance of the variables. Variables considered by MARS as the most important (lithology and vegetation structure) were located in
Acknowledgements
This work has been supported by the Junta de Extremadura and European Union funded project Montado/Dehesa (INTERREG IIIA, SP4.R13) and by the Spanish Ministry of Science and Innovation (CGL2004-04919-C02-02).
References (66)
- et al. Hydrological behaviour of a small catchment in the dehesa land use system (Extremadura, SW Spain). Journal of Hydrology (1998).
- et al. A comparison of two nonparametric estimation schemes: MARS and neural networks. Computers & Chemical Engineering (1993).
- et al. Importance of slope gradient and contributing area for optimal prediction of the initiation and trajectory of ephemeral gullies. Catena (1999).
- et al. An objective method to rank the importance of the factors predisposing to landslides with the GIS methodology: application to an area of the Apennines (Valnerina; Perugia Italy). Engineering Geology (2002).
- et al. A multi-scale approach of runoff generation in a Sahelian gully catchment: a case study in Niger. Catena (2003).
- et al. Superficial and subterranean soil erosion in Tabasco, tropical Mexico: development of a decision tree modeling approach. Geoderma (2007).
- et al. Effects of sample size on the accuracy of geomorphological models. Geomorphology (2008).
- et al. Threshold criteria for conversion of probability of species presence to either-or presence–absence. Acta Oecologica (2007).
- The topographic thresholds of hillslope incisions in southwestern Rwanda. Catena (2003).
- et al. Comparing five modelling techniques for predicting forest characteristics. Ecological Modelling (2002).
- Threshold conditions for initiation of valley-side gullies in the Middle veld of Swaziland. Catena.
- Recent gully erosion in El Cautivo badlands (Tabernas SE Spain). Catena.
- Gully erosion and environmental change: importance and research needs. Catena.
- Dynamic and static models of gully erosion. Catena.
- Gully erosion: impacts, factors and control. Catena.
- Geomorphic threshold conditions for ephemeral gully incision. Geomorphology.
- Topographical threshold for ephemeral gully initiation in intensively cultivated areas of the Mediterranean. Catena.
- Validation and evaluation of predictive models in hazard assessment and risk management. Natural Hazards.
- Hawth's Analysis Tools For ArcGis.
- Use of terrain variables for mapping gully erosion susceptibility in Lebanon. Earth Surface Processes and Landforms.
- Classification and Regression Trees.
- Smoothing noisy data with spline functions. Numerische Mathematik.
- Analysis of Topographic Factors for Predicting Ephemeral Gully Erosion.
- Receiver operating characteristic laboratory (ROCLAB): software for developing decision strategies that account for uncertainty.
- Mapa de vegetación y recursos forestales de Extremadura a partir del Mapa Forestal de España.
- A review of methods for the assessment of prediction errors in conservation presence/absence models. Environmental Conservation.
- Multivariate adaptive regression splines. Annals of Statistics.
- Badland systems in the Mediterranean.
- Estudio del acarcavamiento en explotaciones adehesadas.
- Gestión ambiental y económica del ecosistema dehesa en la Península Ibérica.
- Análisis del acarcavamiento y su relación con el uso del suelo en una pequeña cuenca en el SO de España.
- Gully erosion and land use during the last 60 years in a small rangeland catchment in southwest Spain.