Multivariable geostatistics in S: the gstat package☆
Introduction
S is a high-level language for data analysis and graphics. Currently, it has one commercial implementation, S-Plus (S-Plus home page: http://www.insightful.com/), Becker et al., 1988; Chambers, 1998 and an open-source implementation, called “R” (Ihaka and Gentleman, 1996; Bivand, 2000; R home page: http://www.r-project.org/Comprehensive R archive network: http://cran.r-project.org/ and mirrors). Geostatistics (Isaaks and Srivastava, 1989) is not a new subject to the S community, and several S packages or libraries are available. Some of these were developed for teaching purposes, and some have very advanced functionality. Still, all of the currently available S packages lack features that are commonly used in applied geostatistics, notably block kriging, kriging in a local neighbourhood, multivariable variogram modelling, cokriging and cosimulation. This paper introduces that gstat S package, which fills this gap.
Gstat (Pebesma and Wesseling, 1998; gstat home page: http://www.gstat.org/) used to be a stand-alone computer program that provides all these features, but with no graphics capabilities of its own: it has an interactive user interface for variogram modelling, but uses the gnuplot graphics program for visualising variograms. The gstat stand-alone program works well with several GIS systems, as it can read and write point and/or grid map data to and from more than 20 GIS formats. Graphical user interfaces that use gstat as a back-end have been developed within PCRaster, Idrisi32 and ArcGIS environments.
The S (R/S-Plus) environment has much to offer for multivariable geostatistical analysis. The Trellis/Lattice graphics functions allow visualising high-dimensional data by creating structured, composite graphs. The gstat S package now offers the major geostatistical functionality of the gstat stand-alone program to S users, provides new functions for fast modelling of arbitrarily many cross and direct variograms, and provides a number of useful functions for plotting spatial point data, multiple grid maps, and multivariable or directional variograms. In the following, “gstat” will refer to “the gstat S package”.
Section snippets
DSC2003 and spatial statistics in S
During the conference distributed statistical computing 2003 (DSC2003) held in Vienna on March 19–22, 2003, a 1-day workshop and three paper sessions were devoted to spatial statistics, and the handling of spatial data in S environments, R in particular. The overview given by Bivand (2003) shows that at least six other R packages deal with geostatistics; three packages deal with point pattern analysis; one package deals with lattice (polygon) data and 10 packages with interfacing R to GIS
Multivariable geostatistics
Multivariable geostatistics involves the simultaneous prediction (or simulation) of multiple variables based on single or multiple predictors, as well as the modelling of all necessary direct and cross variograms. This section is meant to introduce notation for the multivariable geostatistical model as implemented in gstat, as briefly as possible, but necessary for the explanation of the functionality of the gstat package for S. Further theory is also found in various papers and text books,
The S environment
S is a functional language: functions are called with data and specifications as the function arguments. The gstat S package provides a set of functions, most of which are listed in Table 1. These functions consist of about 1000 lines of S code, and for a part they hide calls to the underlying 40,000 lines of C code in the gstat package. Data in S are typically stored in data frames, tables which contain in each column one variable, of categorical (factor) or numerical mode.
Examples
In the following,
Relation to other geostatistics packages
Ripley (2001) gives a short overview of available R packages for spatial statistics. Geostatistics packages on CRAN (R home page: http://www.r-project.org/ Comprehensive R archive network: http://cran.r-project.org/ and mirrors) include spatial, sgeostat, geoR/geoRglm (geoRhome page: http://www.est.ufpr.br/geoR/), fields and RandomFields. Most of these packages provide variogram modelling, trend surface analysis and/or universal kriging. None of them provides kriging in a local neighbourhood,
Code availability
For R, the gstat package can be installed from CRAN (R home page: http://www.r-project.org/Comprehensive R archive network: http://cran.r-project.org/ and mirrors), which means that a single mouse click on Windows version or a single command for Unix versions is sufficient to install the package on computers with an internet connection. For S-Plus, the gstat library is available in binary form for Windows versions of S-Plus, and in source code form for Unix/Linux versions of S-Plus from the
Conclusions
The gstat package provides a robust and flexible suite of univariable or multivariable geostatistical methods. From the following five items:
- •
One-, two- or three-dimensional,
- •
point, regular block, or irregular block,
- •
univariable, multiple (uncorrelated), or multivariable (correlated) cokriging,
- •
(co)kriging, unconditional or conditional (co)simulation,
- •
using a global or a local neighbourhood,
S visualisation
One major reason why S is a suitable environment for doing multivariable geostatistics with gstat is its graphics capabilities. The gstat package gratefully uses the Trellis/Lattice functions to visualise its results, notably
- •
xyplot for visualising directional variograms and multivariable (direct and cross) variograms (e.g. Fig. 2), and to visualise spatial data and cross-validation residuals;
- •
levelplot for visualising (multiple) grid maps, using the aspect argument to make them geographically
Acknowledgements
The development of the gstat S package was supported financially by the Dutch National Institute for Coastal and Marine Management (RIKZ); Richard Duin (RIKZ) played a very stimulating role in the work presented here. Roger Bivand not only cotriggered the writing of the package presented here, but also organ ised the workshop and three sessions on spatial statistics during the DSC2003 conference (DSC2003: http://www.ci.tuwien.ac.at/Conferences/DSC-2003/). One anonymous reviewer provided
References (26)
Using the R statistical data analysis language on GRASS 5.0 GIS data base files
Computers & Geosciences
(2000)- et al.
Gstat, a program for geostatistical modelling, prediction and simulation
Computers & Geosciences
(1998) - et al.
Kriging with inequality constraints
Mathematical Geology
(2001) - et al.
The New S Language
(1988) - Bivand, R.S., 2003. Approaches to classes for spatial data in R. In: Hornik, K., Leisch, F. (Eds.), Proceedings of the...
- et al.
Principles of Geographical Information Systems
(1998) Programming with Data
(1998)- et al.
Statistical Models in S
(1992) - Cressie, N.A.C., 1993. Statistics for Spatial Data, Revised Edn. Wiley, New York. 900...
- et al.
Model-based geostatistics
Applied Statistics
(1998)
Geostatistics for Natural Resources Evaluation
Joint sequential simulation of multiGaussian fields
A generalized linear model approach to spatial data analysis and prediction
Journal of Agricultural, Biological and Environmental Statistics
Cited by (2200)
LUR modeling of long-term average hourly concentrations of NO<inf>2</inf> using hyperlocal mobile monitoring data
2024, Science of the Total EnvironmentAnalysis and modelling of profiles to understand fractionation processes for contaminations with polychlorinated biphenyls observed in fish
2024, Science of the Total EnvironmentNonlinear blind source separation exploiting spatial nonstationarity
2024, Information SciencesModelling of potential vegetation identifies diverging expectable outcomes of river floodplain widening
2024, Global Ecology and Conservation
- ☆
Code available from server at http://www.iamg.org/CGEditor/index.htm.