Elsevier

Computers & Geosciences

Volume 30, Issue 7, August 2004, Pages 683-691
Computers & Geosciences

Multivariable geostatistics in S: the gstat package

https://doi.org/10.1016/j.cageo.2004.03.012Get rights and content

Abstract

This paper discusses advantages and shortcomings of the S environment for multivariable geostatistics, in particular when extended with the gstat package, an extension package for the S environments (R, S-Plus). The gstat S package provides multivariable geostatistical modelling, prediction and simulation, as well as several visualisation functions. In particular, it makes the calculation, simultaneous fitting, and visualisation of a large number of direct and cross (residual) variograms very easy. Gstat was started 10 years ago and was released under the GPL in 1996; gstat.org was started in 1998. Gstat was not initially written for teaching purposes, but for research purposes, emphasising flexibility, scalability and portability. It can deal with a large number of practical issues in geostatistics, including change of support (block kriging), simple/ordinary/universal (co)kriging, fast local neighbourhood selection, flexible trend modelling, variables with different sampling configurations, and efficient simulation of large spatially correlated random fields, indicator kriging and simulation, and (directional) variogram and cross variogram modelling. The formula/models interface of the S language is used to define multivariable geostatistical models. This paper introduces the gstat S package, and discusses a number of design and implementation issues. It also draws attention to a number of papers on integration of spatial statistics software, GIS and the S environment that were presented on the spatial statistics workshop and sessions during the conference Distributed Statistical Computing 2003.

Introduction

S is a high-level language for data analysis and graphics. Currently, it has one commercial implementation, S-Plus (S-Plus home page: http://www.insightful.com/), Becker et al., 1988; Chambers, 1998 and an open-source implementation, called “R” (Ihaka and Gentleman, 1996; Bivand, 2000; R home page: http://www.r-project.org/Comprehensive R archive network: http://cran.r-project.org/ and mirrors). Geostatistics (Isaaks and Srivastava, 1989) is not a new subject to the S community, and several S packages or libraries are available. Some of these were developed for teaching purposes, and some have very advanced functionality. Still, all of the currently available S packages lack features that are commonly used in applied geostatistics, notably block kriging, kriging in a local neighbourhood, multivariable variogram modelling, cokriging and cosimulation. This paper introduces that gstat S package, which fills this gap.

Gstat (Pebesma and Wesseling, 1998; gstat home page: http://www.gstat.org/) used to be a stand-alone computer program that provides all these features, but with no graphics capabilities of its own: it has an interactive user interface for variogram modelling, but uses the gnuplot graphics program for visualising variograms. The gstat stand-alone program works well with several GIS systems, as it can read and write point and/or grid map data to and from more than 20 GIS formats. Graphical user interfaces that use gstat as a back-end have been developed within PCRaster, Idrisi32 and ArcGIS environments.

The S (R/S-Plus) environment has much to offer for multivariable geostatistical analysis. The Trellis/Lattice graphics functions allow visualising high-dimensional data by creating structured, composite graphs. The gstat S package now offers the major geostatistical functionality of the gstat stand-alone program to S users, provides new functions for fast modelling of arbitrarily many cross and direct variograms, and provides a number of useful functions for plotting spatial point data, multiple grid maps, and multivariable or directional variograms. In the following, “gstat” will refer to “the gstat S package”.

Section snippets

DSC2003 and spatial statistics in S

During the conference distributed statistical computing 2003 (DSC2003) held in Vienna on March 19–22, 2003, a 1-day workshop and three paper sessions were devoted to spatial statistics, and the handling of spatial data in S environments, R in particular. The overview given by Bivand (2003) shows that at least six other R packages deal with geostatistics; three packages deal with point pattern analysis; one package deals with lattice (polygon) data and 10 packages with interfacing R to GIS

Multivariable geostatistics

Multivariable geostatistics involves the simultaneous prediction (or simulation) of multiple variables based on single or multiple predictors, as well as the modelling of all necessary direct and cross variograms. This section is meant to introduce notation for the multivariable geostatistical model as implemented in gstat, as briefly as possible, but necessary for the explanation of the functionality of the gstat package for S. Further theory is also found in various papers and text books,

The S environment

S is a functional language: functions are called with data and specifications as the function arguments. The gstat S package provides a set of functions, most of which are listed in Table 1. These functions consist of about 1000 lines of S code, and for a part they hide calls to the underlying 40,000 lines of C code in the gstat package. Data in S are typically stored in data frames, tables which contain in each column one variable, of categorical (factor) or numerical mode.

Examples

In the following,

Relation to other geostatistics packages

Ripley (2001) gives a short overview of available R packages for spatial statistics. Geostatistics packages on CRAN (R home page: http://www.r-project.org/ Comprehensive R archive network: http://cran.r-project.org/ and mirrors) include spatial, sgeostat, geoR/geoRglm (geoRhome page: http://www.est.ufpr.br/geoR/), fields and RandomFields. Most of these packages provide variogram modelling, trend surface analysis and/or universal kriging. None of them provides kriging in a local neighbourhood,

Code availability

For R, the gstat package can be installed from CRAN (R home page: http://www.r-project.org/Comprehensive R archive network: http://cran.r-project.org/ and mirrors), which means that a single mouse click on Windows version or a single command for Unix versions is sufficient to install the package on computers with an internet connection. For S-Plus, the gstat library is available in binary form for Windows versions of S-Plus, and in source code form for Unix/Linux versions of S-Plus from the

Conclusions

The gstat package provides a robust and flexible suite of univariable or multivariable geostatistical methods. From the following five items:

  • One-, two- or three-dimensional,

  • point, regular block, or irregular block,

  • univariable, multiple (uncorrelated), or multivariable (correlated) cokriging,

  • (co)kriging, unconditional or conditional (co)simulation,

  • using a global or a local neighbourhood,

any combination (e.g. three-dimensional universal irregular block cosimulation) can be obtained by the gstat

S visualisation

One major reason why S is a suitable environment for doing multivariable geostatistics with gstat is its graphics capabilities. The gstat package gratefully uses the Trellis/Lattice functions to visualise its results, notably

  • xyplot for visualising directional variograms and multivariable (direct and cross) variograms (e.g. Fig. 2), and to visualise spatial data and cross-validation residuals;

  • levelplot for visualising (multiple) grid maps, using the aspect argument to make them geographically

Acknowledgements

The development of the gstat S package was supported financially by the Dutch National Institute for Coastal and Marine Management (RIKZ); Richard Duin (RIKZ) played a very stimulating role in the work presented here. Roger Bivand not only cotriggered the writing of the package presented here, but also organ ised the workshop and three sessions on spatial statistics during the DSC2003 conference (DSC2003: http://www.ci.tuwien.ac.at/Conferences/DSC-2003/). One anonymous reviewer provided

References (26)

  • R.S Bivand

    Using the R statistical data analysis language on GRASS 5.0 GIS data base files

    Computers & Geosciences

    (2000)
  • E.J Pebesma et al.

    Gstat, a program for geostatistical modelling, prediction and simulation

    Computers & Geosciences

    (1998)
  • P Abrahamsen et al.

    Kriging with inequality constraints

    Mathematical Geology

    (2001)
  • R.A Becker et al.

    The New S Language

    (1988)
  • Bivand, R.S., 2003. Approaches to classes for spatial data in R. In: Hornik, K., Leisch, F. (Eds.), Proceedings of the...
  • P.A Burrough et al.

    Principles of Geographical Information Systems

    (1998)
  • J.M Chambers

    Programming with Data

    (1998)
  • J.M Chambers et al.

    Statistical Models in S

    (1992)
  • Cressie, N.A.C., 1993. Statistics for Spatial Data, Revised Edn. Wiley, New York. 900...
  • P.J Diggle et al.

    Model-based geostatistics

    Applied Statistics

    (1998)
  • P Goovaerts

    Geostatistics for Natural Resources Evaluation

    (1997)
  • J.J Gómez-Hernández et al.

    Joint sequential simulation of multiGaussian fields

  • C.A Gotway et al.

    A generalized linear model approach to spatial data analysis and prediction

    Journal of Agricultural, Biological and Environmental Statistics

    (1997)
  • Cited by (2200)

    View all citing articles on Scopus

    Code available from server at http://www.iamg.org/CGEditor/index.htm.

    View full text