Elsevier

Ecological Modelling

Volume 220, Issue 24, 24 December 2009, Pages 3630-3637

Using and comparing two nonparametric methods (CART and MARS) to model the potential distribution of gullies

https://doi.org/10.1016/j.ecolmodel.2009.06.020

Abstract

Gully erosion is an important soil degradation process in rangelands. To take preventive or control measures and to reduce its environmental damage and economic costs, it is useful to locate the points in the landscape where gullying takes place and to determine the importance of the different factors involved. The study was carried out in Extremadura, southwest Spain. The main objectives of this work are: (a) to compare two nonparametric methods for modelling the potential distribution of gullies, (b) to evaluate the importance of the different factors involved in gullying processes, (c) to analyze the role of prevalence in the success of the model and, finally, (d) to implement and map the results with the help of a Geographical Information System (GIS). Two methods were used to model the response of a dependent variable (gullying) from a set of independent variables: Classification And Regression Trees (CART) and Multivariate Adaptive Regression Splines (MARS). Three different datasets were used: the first for constructing the model (training dataset) and the other two for validating it (external datasets). Each dataset consists of a target variable (presence or absence of gullies) and a set of independent variables. The dependent variable was obtained by mapping the locations of gullies with the help of a GPS and high-resolution aerial orthophotographs. A set of 32 independent variables reflecting topography, lithology, soil type, climate, land use and vegetation cover of each area was used. The performance of the models was evaluated using a threshold-independent method: the Receiver Operating Characteristic (ROC) curve. The results showed that MARS performed better at predicting gullying, with areas under the ROC curve of 0.98 and 0.97 for the validation datasets, while CART presented values of 0.96 and 0.66.

Introduction

Gully erosion is one of the most important soil degradation phenomena in several environments around the world, yet it has long been neglected because it is difficult to study and to predict (Valentin et al., 2005). Nevertheless, the negative on-site and off-site effects of gullying are well known: soil loss, reduction of soil water retention (Moeyersons, 2000; Esteves and Lapetite, 2003), landscape dissection (hampering the movement of vehicles, farm machinery and animals), water contamination and sedimentation of channels and reservoirs (Poesen et al., 2003).

Most soil erosion models focus on rill and interrill erosion at the hillslope scale and cannot be applied to gully erosion. The few that have been developed for gully erosion (CREAMS, Knisel, 1980; EGEM, Merkel et al., 1988; GLEAMS, Knisel, 1993; WEPP, Flanagan and Nearing, 1995; Sidorchuk, 1999) deal with the estimation of erosion rates or with the prediction of gully growth (Poesen et al., 1998). However, it is necessary to determine the areas in the landscape at risk of gully formation in order to properly design soil conservation plans and strategies (Millington, 1986). Several studies have tried to model the location of gullies in the landscape based on the concept of topographical thresholds (Patton and Schumm, 1975). However, gully erosion usually has a multi-causal genesis, and models based exclusively on topographical thresholds are difficult to apply successfully at the regional scale.

Numerous techniques can generate models that predict the potential distribution of a phenomenon from a set of independent variables, such as Logistic Multiple Regression (LMR), Generalized Additive Models (GAM), Classification And Regression Trees (CART), Artificial Neural Networks (ANN) and Multivariate Adaptive Regression Splines (MARS). These techniques are not commonly used in gully erosion research. Meyers and Martínez-Casasnovas (1999) used a logistic modelling approach to predict areas at the field scale with a high probability of gully erosion. Later, Hughes et al. (2001) used CART to construct gully density maps for large areas of Australia. Martínez-Casasnovas et al. (2003) used a logistic regression model to assess sidewall gully erosion in large gullies. Recently, two studies used CART to model the location of gullies in Lebanon (Bou Kheir et al., 2007) and in tropical areas of Mexico (Geissen et al., 2007).

Along these lines, Gómez Gutiérrez et al. (in press) are developing a multivariable statistical model using Multivariate Adaptive Regression Splines (MARS; Friedman, 1991) to predict the potential distribution of gullies in rangelands of southwest Spain. The main objective of this paper is to compare the results of that previous work with those of another statistical model based on Classification And Regression Trees (CART; Breiman et al., 1984).

Further objectives of this work are to analyze the role of prevalence (different proportions of presence/absence data in the dataset) in the gully erosion model and to evaluate the importance of the different factors involved. Finally, results are implemented and mapped with the help of a Geographical Information System (GIS), using the Area Under the ROC Curve (AUC) obtained in validation to weight the models.
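The AUC-based validation and weighting mentioned above can be illustrated with a minimal sketch. It computes the AUC as the probability that a randomly chosen presence point scores higher than a randomly chosen absence point (the Mann-Whitney formulation), and then combines two models' predictions in proportion to their validation AUC. The function names (`auc`, `auc_weighted_mean`), the toy data and the linear weighting rule are assumptions made for illustration; the text does not specify how the authors applied the AUC values.

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen presence (label 1)
    scores higher than a randomly chosen absence (label 0); ties count 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_weighted_mean(pred_a, pred_b, auc_a, auc_b):
    """Hypothetical weighting rule: average two models' predictions,
    each weighted by its validation AUC (an assumption, not the paper's rule)."""
    total = auc_a + auc_b
    return [(auc_a * a + auc_b * b) / total for a, b in zip(pred_a, pred_b)]


# Toy example: 1 = gully present, 0 = gully absent (made-up values).
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.1]
toy_auc = auc(toy_labels, toy_scores)
```

Because the AUC depends only on how presences rank relative to absences, no classification threshold has to be chosen before computing it.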

Among the set of available techniques, CART and MARS were selected because non-parametric approaches (such as ANN, CART and MARS) usually produce better results when modelling very complex phenomena, particularly MARS (Moisen and Frescino, 2002; Yang et al., 2004). In addition, CART and MARS seem to be faster and easier to interpret than ANN (De Veaux et al., 1993).

Study area

The study was carried out on 54 farms, representative of dehesas and pasturelands in the Iberian Peninsula (Fig. 1). Dehesas commonly have a savannah-like vegetation structure and agrosylvopastoral land use, with farm sizes in excess of 100 ha. Several studies have provided data highlighting gully and sheet erosion as the most important processes of soil degradation in these environments (Schnabel and Gómez Amelia, 1993; Schnabel, 1997; Schnabel et al., 1999; Gómez Gutiérrez et al., 2008a

CART and MARS

Classification And Regression Trees (CART; Breiman et al., 1984) is a popular data mining technique based on recursive binary partitioning. The result of CART (Eq. 1) is a hierarchical binary tree which subdivides the prediction space into regions $R_m$ where the values of the response variable are similar ($\approx a_m$):

$$f(x) = a_m, \quad x \in R_m \tag{1}$$
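The recursive binary partitioning described above can be sketched in a few lines. This is a minimal illustration, not the software used in the study: it assumes a 0/1 response, uses Gini impurity as the splitting criterion, stops at a fixed depth instead of pruning, and all names (`gini`, `best_split`, `grow`, `predict`) and the toy data are hypothetical. Each leaf stores the constant $a_m$ for its region $R_m$.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (j, t), score
    return best


def grow(rows, labels, depth=0, max_depth=3):
    """Recursively partition the data; leaves store a_m, the mean response in R_m."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return {"a_m": sum(labels) / len(labels)}
    j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": grow([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict(tree, x):
    """f(x) = a_m for the region R_m that contains x."""
    while "a_m" not in tree:
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["a_m"]


# Toy data: two made-up predictors per point, response 1 = gully present.
toy_rows = [[2.0, 80.0], [3.0, 70.0], [10.0, 20.0], [12.0, 10.0]]
toy_labels = [0, 0, 1, 1]
toy_tree = grow(toy_rows, toy_labels)
```

Calling `predict(toy_tree, x)` walks from the root to the leaf whose region contains `x` and returns that leaf's constant.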

The principal drawbacks of CART are the possible complexity of the resulting model, the hierarchical dependence between nodes at different levels and the difficulties that

CART

The result of CART was a tree with 30 non-terminal nodes and 30 terminal nodes. Of the 32 independent variables, CART used only 12 to generate the optimal model. This model showed an AUC of 0.97 for the training dataset. The AUC values for the validation datasets were quite different: while Guadalperalón presented a high value (0.96), the AUC of the Monroy dataset was only 0.66. It is difficult to explain this difference in AUC values since the Guadalperalón and Monroy datasets present

Conclusions

Both CART and MARS produced large and complex models with a large number of independent variables, reflecting the complexity of the gullying processes. The two models showed acceptable to good performance, with AUC values from 0.66 to 0.98. MARS produced better results for the two validation datasets. Some differences between the two models were found regarding the importance of the variables. Variables considered by MARS as the most important (lithology and vegetation structure) were located in

Acknowledgements

This work has been supported by the Junta de Extremadura and European Union funded project Montado/Dehesa (INTERREG IIIA, SP4.R13) and by the Spanish Ministry of Science and Innovation (CGL2004-04919-C02-02).

References (66)

  • R.P.C. Morgan et al. Threshold conditions for initiation of valley-side gullies in the Middle veld of Swaziland. Catena (2003)
  • P. Nogueras et al. Recent gully erosion in El Cautivo badlands (Tabernas SE Spain). Catena (2000)
  • J. Poesen et al. Gully erosion and environmental change: importance and research needs. Catena (2003)
  • A. Sidorchuk. Dynamic and static models of gully erosion. Catena (1999)
  • C. Valentin et al. Gully erosion: impacts, factors and control. Catena (2005)
  • K. Vandaele et al. Geomorphic threshold conditions for ephemeral gully incision. Geomorphology (1996)
  • L. Vandekerckhove et al. Topographical threshold for ephemeral gully initiation in intensively cultivated areas of the Mediterranean. Catena (1998)
  • S. Beguería. Validation and evaluation of predictive models in hazard assessment and risk management. Natural Hazards (2006)
  • H.L. Beyer. Hawth's Analysis Tools For ArcGis (2002)
  • R. Bou Kheir et al. Use of terrain variables for mapping gully erosion susceptibility in Lebanon. Earth Surface Processes and Landforms (2007)
  • L. Breiman et al. Classification and Regression Trees (1984)
  • P. Craven et al. Smoothing noisy data with spline functions. Numerische Mathematik (1979)
  • L.M. De Santisteban. Analysis of Topographic Factors for Predicting Ephemeral Gully Erosion (2003)
  • J.M. Deleo. Receiver operating characteristic laboratory (ROCLAB): software for developing decision strategies that account for uncertainty
  • EGMASA. Mapa de vegetación y recursos forestales de Extremadura a partir del Mapa Forestal de España
  • A.H. Fielding et al. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environmental Conservation (1997)
  • Flanagan, D.C., Nearing, M., 1995. USDA-Water Erosion Prediction Project Hillslope Profile and watershed Model...
  • J.H. Friedman. Multivariate adaptive regression splines. Annals of Statistics (1991)
  • F. Gallart et al. Badland systems in the Mediterranean
  • Á. Gómez Gutiérrez et al. Estudio del acarcavamiento en explotaciones adehesadas. Gestión ambiental y económica del ecosistema dehesa en la Península Ibérica (2006)
  • Á. Gómez Gutiérrez et al. Análisis del acarcavamiento y su relación con el uso del suelo en una pequeña cuenca en el SO de España
  • Á. Gómez Gutiérrez et al. Gully erosion and land use during the last 60 years in a small rangeland catchment in southwest Spain
  • Gómez Gutiérrez, Á., Schnabel, S., Felicísimo, A., in press. Modelling the ocurrence of gullies in rangelands of SW...