spacetime: Spatio-Temporal Data in R

This document describes classes and methods designed to deal with different types of spatio-temporal data in R implemented in the R package spacetime, and provides examples for analyzing them. It builds upon the classes and methods for spatial data from package sp, and for time series data from package xts. The goal is to cover a number of useful representations for spatio-temporal sensor data, and results from predicting (spatial and/or temporal interpolation or smoothing), aggregating, or subsetting them, and to represent trajectories. The goals of this paper are to explore how spatio-temporal data can be sensibly represented in classes, and to find out which analysis and visualisation methods are useful and feasible. We discuss the time series convention of representing time intervals by their starting time only. This vignette is the main reference for the R package spacetime; it has been published as Pebesma (2012), but is kept up-to-date with the software.


Introduction
Spatio-temporal data are abundant, and easily obtained. Examples are satellite images of parts of the earth, temperature readings for a number of nearby stations, election results for voting districts and a number of consecutive elections, trajectories for people or animals possibly with additional sensor readings, disease outbreaks or volcano eruptions. Schabenberger and Gotway (2004) argue that analysis of spatio-temporal data often happens conditionally, meaning that either first the spatial aspect is analysed, after which the temporal aspects are analysed, or reversed, but not in a joint, integral modelling approach, where space and time are not separated. As a possible reason they mention the lack of good software.
Spatio-temporal data often come as single tables. Alternatively, they may be stored in different, related tables, which is more typical for relational databases, or in tree structures, which is typical for XML files. We will now illustrate the different single-table formats with simple examples.

Time-wide format
Spatio-temporal data for which each location has data for each time can be provided in two so-called wide formats. An example where a single column refers to a single moment or period in time is found in the North Carolina Sudden Infant Death Syndrome (sids) data set (?) available from package sf, which is in the time-wide format:
R> if (require(foreign, quietly = TRUE) && require(sf, quietly = TRUE))
+     read.dbf(system.file("shape/nc.dbf", package = "sf"))[1:5, c(5, 9:14)]
where columns refer to a particular time: SID74 contains the sudden infant death syndrome cases for each county for a particular time period (1974-1984).

Space-wide format
The Irish wind data (Haslett and Raftery 1989), available from package gstat (Pebesma 2004), are in space-wide format: each column refers to another wind measurement location (stations are identified by abbreviations), and the rows reflect a single time period; wind was reported as daily average wind speed in knots (1 knot = 0.5144 m/s).

Long format
Finally, panel data are shown in long form, where the full spatio-temporal information is held in a single column, and other columns denote location and time. An example is the Produc data set (Baltagi 2001), a panel of 48 observations from 1970 to 1986 available from package plm (Croissant and Millo 2008), where the first two columns denote space and time (the default assumption for package plm), and, e.g., pcap reflects public capital stock.
None of these examples has strongly referenced spatial or temporal information: it is not clear from the data alone that the number 1970 refers to a year, or that ALABAMA refers to a state, and where this state is. Section 7 shows for each of these three cases how the data can be converted into classes with strongly referenced space and time information.
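The three single-table formats carry the same information and can be converted into one another. The following is a minimal base R sketch on a made-up table; the station names, days and values are invented for illustration and are not taken from any of the data sets above.

```r
# Toy long table: 2 stations observed on 3 days (all values invented).
long <- data.frame(
  station = rep(c("A", "B"), times = 3),
  day     = rep(1:3, each = 2),
  value   = c(10, 20, 11, 21, 12, 22)
)

# Space-wide: one row per time, one column per location.
space_wide <- with(long, tapply(value, list(day = day, station = station), sum))

# Time-wide: one row per location, one column per time.
time_wide <- t(space_wide)

# Back to long: one row per (time, location) combination.
long_again <- as.data.frame(as.table(space_wide), responseName = "value")
```

As in the examples above, none of these tables knows that day is a time or that station is a location; that link is what the spatio-temporal classes add.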

Space-time layouts
In the following we will use the word spatial feature (Herring 2011) to denote a spatial entity. This can be a particular spatial point (location), a line or set of lines, a polygon or set of polygons, or a pixel (grid or raster cell). For a particular feature, one or more measurements are registered at particular moments in time.
Four layouts of space-time data will be discussed next. Two of them reflect lattice layouts, one that is efficient when a particular spatial feature has data values for more than one time point, and one that is most efficient when all spatial features have data values at each time point. Two others reflect irregular layouts, one of which specializes to trajectories (moving objects).

Spatio-temporal full grids
A full space-time grid of observations for spatial features (points, lines, polygons, grid cells) $s_i$, $i = 1, ..., n$, and observation times $t_j$, $j = 1, ..., m$, is obtained when the full set of $n \times m$ observations $z_k$ is stored, with $k = 1, ..., nm$. We choose to cycle spatial features first, so observation $k$ corresponds to feature $s_i$, with $i = ((k - 1) \% n) + 1$, and to time moment $t_j$, with $j = ((k - 1) / n) + 1$, where $/$ denotes integer division and $\%$ the integer division remainder (modulo). The $t_j$ are assumed to be in time order.

Figure 1: Four space-time layouts: (i) top left: the full grid (STF) layout stores all space-time combinations; (ii) top right: the sparse grid (STS) layout stores only the non-missing space-time combinations on a lattice; (iii) bottom left: the irregular (STI) layout: each observation has its spatial feature and time stamp stored; in this example, spatial feature 1 is stored twice, and the fact that observations 1 and 4 have the same feature is not registered; (iv) bottom right: simple trajectories (STT), plotted against a common time axis. It should be noted that in both gridded layouts the grid is in space-time, meaning that spatial features can be gridded, but can also be any other non-gridded type (lines, points, polygons).

In this data class (top left in Figure 1), for each spatial feature, the same temporal sequence of data is sampled. Alternatively one could say that for each moment in time, the same set of spatial entities is sampled. Unsampled combinations of (space, time) are stored in this class, but are assigned a missing value NA.
It should be stressed that for this class (and the next) the word grid in spatio-temporal grid refers to the layout in space-time, not in space. Examples of phenomena that could well be represented by this class are regular (e.g., hourly) measurements of air quality at a spatially irregular set of points (measurement stations), or yearly disease counts for a set of administrative regions. An example where space is gridded as well could be a sequence of rainfall images (e.g., monthly sums), interpolated to a spatially regular grid.
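The feature-first ordering of records in a full grid can be checked with a few lines of base R. This is only a sketch of the index arithmetic described in this section, with invented sizes and values; it does not use the package classes.

```r
n <- 3  # number of spatial features
m <- 4  # number of time points
k <- seq_len(n * m)

i <- ((k - 1) %% n) + 1    # feature index: cycles fastest
j <- ((k - 1) %/% n) + 1   # time index: integer division

grid_index <- data.frame(k = k, feature = i, time = j)

# A full grid keeps every combination; an unsampled one is stored as NA.
z <- seq(0.5, by = 0.5, length.out = n * m)
z[grid_index$feature == 2 & grid_index$time == 3] <- NA
```

Because the position of a record fully determines its feature and time, no index needs to be stored with the data values in this layout.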

Spatio-temporal sparse grids
A sparse grid has the same general layout, with measurements laid out on a space-time grid (top right in Figure 1), but instead of storing the full grid, only non-missing valued observations z k are stored. For each k, an index [i, j] is stored that indicates which spatial feature i and time point j the value belongs to.
Storing data this way may be efficient
• If full space-time lattices have many missing or trivial values (e.g., when one wants to store features or grid cells where fires were registered, discarding those that did not),
• If a limited set of spatial features each have different time instances (e.g., to record the times of crime cases for a set of administrative regions), or,
• If for a limited set of times the set of spatial features varies (e.g., locations of crimes registered per year, or spatially misaligned remote sensing images).
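The relation between a full grid and its sparse form can be illustrated in base R. The sketch below holds the full grid as a matrix (an assumption made for illustration, with invented values) and derives the [i, j] index for the non-missing cells; it does not use the package classes.

```r
# Full grid held as a matrix: rows are spatial features, columns are time points.
z_full <- matrix(NA_real_, nrow = 3, ncol = 4)
z_full[1, 2] <- 5.1
z_full[3, 2] <- 4.7
z_full[2, 4] <- 6.0

# Sparse representation: the non-missing values plus an [i, j] index per value.
index <- which(!is.na(z_full), arr.ind = TRUE)
colnames(index) <- c("i", "j")
z <- z_full[index]

# The full grid can be rebuilt from the sparse form.
rebuilt <- matrix(NA_real_, nrow = 3, ncol = 4)
rebuilt[index] <- z
```

Here 3 values and 3 index pairs replace 12 stored cells; the saving grows with the share of missing or trivial values.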

Spatio-temporal irregular data
Space-time irregular data cover the case where time and space points of measured values have no apparent organisation: for each measured value the spatial feature and time point is stored, as in the long format. This is equivalent to the (maximally) sparse grid where the index for observation k is [k, k], and hence can be dropped. For these objects, n = m equals the number of records. Spatial features and time points need not be unique, but are replicated in case they are not.
Any of the gridded types can be represented by this layout, in principle, but this would have the disadvantages that
• Spatial features and time points need to be stored for each data value, and would be redundant,
• The regular layout is lost, and needs to be retrieved indirectly,
• Spatial and temporal selection would be inefficient, as the grid structure forms an index.
Examples of phenomena that are best served by this layout could be spatio-temporal point processes, such as crime or disease cases or forest fires. Other phenomena could be measurements from mobile sensors (in case the trajectory sequence is of no importance).

Interval time, moving objects, trajectories
In their book "Moving Objects Databases", Güting and Schneider (2005) distinguish 10 different data types in space-time. In particular, they define for point features:
(a) Sets of events without temporal duration (time is an instant), e.g., accidents, lightning, birth, death;
(b) Sets of events with a temporal duration but no movement, e.g., a tree, a (point in the) capital of a country, people's home address;
(c) (Sets of) moving points, e.g., the trajectories of one or more persons, or birds.
To accommodate this typology we distinguish three cases, shown in Figure 2: (i) Time is instant and the feature is not moving (it may only exist at a time instant), (ii) Time is interval, objects do not move during this interval, (iii) Time is instant and features move (objects exist between time instants and may move) along a trajectory.
When time reflects intervals, it means that the spatial feature (spatial location or extent of the observation) or its associated data values do not change during this interval, but reflect the value or state during this interval. An example is the yearly mean temperature of a country or of a set of locations, or the existence (duration) of a nation with a particular layout of its boundaries.
Time instants can reflect the moments of change (e.g., the start of the meteorological summer), or events with a zero or negligible duration (e.g., an earthquake or a lightning strike).
Movement reflects the fact that moving objects exist and may change location during a time interval. For moving object data, time instants reflect the location at a particular moment, and movement occurs between registered (time, feature) pairs, and must be continuous.
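Since movement is continuous between registered (time, feature) pairs, a location at an intermediate time has to be derived from the neighbouring fixes. The base R sketch below assumes straight-line movement at constant speed between fixes, which is one possible assumption and not something the text prescribes; the track itself is invented.

```r
# Invented track: three fixes with planar coordinates in metres.
track <- data.frame(
  time = as.POSIXct("2010-08-05 10:00:00", tz = "GMT") + c(0, 600, 1800),
  x    = c(0, 300, 300),
  y    = c(0, 400, 1000)
)
secs <- as.numeric(track$time)

# Location 5 minutes after the first fix, by linear interpolation in time.
t_query <- secs[1] + 300
x_query <- approx(secs, track$x, xout = t_query)$y
y_query <- approx(secs, track$y, xout = t_query)$y

# Average speed (m/s) on each segment between consecutive fixes.
step_length <- sqrt(diff(track$x)^2 + diff(track$y)^2)
speed <- step_length / diff(secs)
```

Keeping the sequence of fixes is what makes such derived quantities possible; in the irregular layout without trajectory information, the order of the records carries no such meaning.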
Trajectories cover the case where sets of (irregular) space-time points form sequences, and depict a trajectory. Their grouping may be simple (e.g., the trajectories of two persons on different days), nested (for several objects, a set of trajectories representing different trips) or more complex (e.g., with objects that split, merge, or disappear).
Examples of trajectories can be human trajectories, mobile sensor measurements (where the sequence is kept, e.g., to derive the speed and direction of the sensor), or trajectories of tornados where the tornado extent at each time stamp can be reflected by a different polygon.
The different layouts, or types, of spatio-temporal data discussed in Section 3 have been implemented in the spacetime R package, along with methods for import, export, coercion, selection, and visualisation.

Classes
The classes for the different layouts are shown in Figure 3. Similar to the classes of package sp (Pebesma and Bivand 2005; Bivand, Pebesma, and Gomez-Rubio 2008), the classes all derive from a base class ST which is not meant to represent actual data. The first order derived classes specify particular spatio-temporal geometries (i.e., only the spatial and temporal information), the second order derived classes augment each of these with actual data, in the form of a data.frame.
An overview of the different time classes in R is found in Ripley and Hornik (2001). Further advice on which classes to use is found in Grothendieck and Petzoldt (2004), or in the CRAN task view on time series analysis.
For spatial information, we used the classes deriving from Spatial in package sp (Pebesma and Bivand 2005; Bivand et al. 2008) because
• They are the dominant set of classes in R for dealing with spatial data,
• They are interfaced to key external libraries, and,
• They provide a single interface to dealing with points, lines, polygons and grids.
We do not use xts or Spatial objects to store spatio-temporal data values, but we use data.frame to store data values. For purely temporal information the xts objects can be used, and for purely spatial information the sp objects can be used. These will be recycled appropriately when coercing to a long format data.frame.
The spatial features supported by package sp are two-dimensional for lines and polygons, but may be higher (three-) dimensional for spatial points, pixels and grids.

Methods
The main methods for spatio-temporal data implemented in package spacetime are listed in Table 1. Their usage is illustrated in the examples that follow.

Creation
Construction of spatio-temporal objects essentially needs specification of the spatial, the temporal, and the data values. The documentation of stConstruct contains examples of how this can be done from long, space-wide, and time-wide tables, or from shapefiles. A simple toy example for a full grid layout with three spatial points and four time instances is given below. First, the spatial object is created:
R> sp = cbind(x = c(0,0,1), y = c(0,1,1))
R> row.names(sp) = paste("point", 1:nrow(sp), sep="")
R> library(sp)
R> sp = SpatialPoints(sp)
Then, the time points are defined as four time stamps, one hour apart, starting Aug 5 2010, 10:00 GMT.
When given a long table, stConstruct creates an STFDF object if all space and time combinations occur only once, or else an object of class STIDF, which might be coerced into other representations.
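The toy example described above can be completed along the following lines. This is a sketch rather than the original listing: the data values are invented (random numbers with a fixed seed), and the object names pts, mydata and stfdf are chosen here for illustration. It assumes packages sp and spacetime are installed.

```r
library(sp)
library(spacetime)

# Three spatial points, as in the text.
xy <- cbind(x = c(0, 0, 1), y = c(0, 1, 1))
rownames(xy) <- paste("point", 1:nrow(xy), sep = "")
pts <- SpatialPoints(xy)

# Four time stamps, one hour apart, starting Aug 5 2010, 10:00 GMT.
time <- as.POSIXct("2010-08-05", tz = "GMT") + 3600 * (10:13)

# Invented data: one record per (feature, time), with features cycling first.
set.seed(1)
m <- c(10, 20, 30)
values <- rnorm(length(pts) * length(time), mean = rep(m, 4))
mydata <- data.frame(values = signif(values, 3),
                     ID = paste("ID", seq_along(values), sep = "_"))

# Full grid: 3 features x 4 times requires exactly 12 data records.
stfdf <- STFDF(pts, time, data = mydata)
```

The data records must follow the full grid ordering, with spatial features cycling fastest; the constructor has no way of checking that values were supplied in that order.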

Overlay and aggregation
Aggregation of data values to a coarser spatial or temporal form (e.g., to a coarser grid, aggregating points over administrative regions, aggregating daily data to monthly data, or aggregation along an irregular set of space-time points) can be done using the method aggregate.
To obtain the required aggregation predicate, i.e., the grouping of observations in space-time, the method over is implemented for objects deriving from ST. Grouping can be done based on spatial, temporal, or spatio-temporal predicates. It takes care of the cases where time reflects time instances or time intervals (see Section 6.1). These methods effectively provide a spatio-temporal equivalent to what is known in geographic information science as the spatial overlay.
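The idea of an aggregation predicate can be illustrated without the package classes. The base R sketch below groups hourly records from a long table by station and calendar day and averages them; it stands in for, and is not, the aggregate and over methods of spacetime, and the stations and values are invented.

```r
# 48 hourly time stamps for two invented stations, in long format.
time <- as.POSIXct("2010-08-05", tz = "GMT") + 3600 * (0:47)
obs <- data.frame(
  time    = rep(time, each = 2),
  station = rep(c("A", "B"), times = 48),
  value   = rep(c(1, 3), times = 48)
)

# Temporal predicate: the calendar day each time stamp falls in.
obs$day <- format(obs$time, "%Y-%m-%d", tz = "GMT")

# Aggregate daily means per station.
daily <- aggregate(value ~ station + day, data = obs, FUN = mean)
```

The grouping variable day plays the role of the predicate here; replacing it by a region identifier would give a spatial aggregation, and using both a spatio-temporal one.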

Space and time selection with [
The idea behind the [ method for classes in sp was that objects would behave as much as possible similar to matrix or data.frame objects. For a data.frame, the expression a[i,j] selects row(s) i and column(s) j. For objects deriving from Spatial, rows were taken as the spatial features (points, lines, polygons, pixels) and columns as the data variables.
For the spatio-temporal data classes described here, a[i,j,k] selects spatial features i, temporal instances j, and data variable(s) k. Unless drop=FALSE is added to such a call, selecting a single time or single feature results in an object that is no longer spatio-temporal, but either a snapshot of a particular moment, or a history at a particular feature (Galton 2004).
Similar to selection on spatial objects in sp and time series objects in xts, space and time indices can be defined by index or boolean vectors, but also by specifying spatial areas and time periods. For instance, the selection
R> air_quality[2:3, 1:10, "PM10"]
yields air quality data for the second and third spatial features, and the first 10 time instances. The expression
R> air_quality[Germany, "2008::2009", "PM10"]
with Germany a Spatial object (e.g., a SpatialPolygons) defining Germany, selects the PM10 measurements for the years 2008-9, lying in Germany.
For trajectory objects of class STT or STTDF, selection is slightly different: it is assumed that trajectories are selected as a whole. An expression obj[1:3] will select the first three full trajectories, and obj[Germany, "2008::2009", "Temp"] selects the temperature attribute for all trajectories that intersect with Germany and fall at least partly in 2008-9.
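The selection rules for the gridded classes can be tried on a small object. The sketch below builds an invented 3 x 4 full grid (the object name toy and its values are assumptions for illustration) and applies index-based selections; it assumes packages sp and spacetime are installed.

```r
library(sp)
library(spacetime)

# Invented full grid: 3 points, 4 hourly time stamps, 2 variables.
pts <- SpatialPoints(cbind(x = c(0, 0, 1), y = c(0, 1, 1)))
time <- as.POSIXct("2010-08-05", tz = "GMT") + 3600 * (10:13)
toy <- STFDF(pts, time,
             data = data.frame(values = 1:12, ID = paste0("ID_", 1:12)))

# Features 1-2, time steps 2-3, one variable: still spatio-temporal.
sub <- toy[1:2, 2:3, "values"]

# A single time step drops to a purely spatial snapshot.
snapshot <- toy[, 1]

# A single feature drops to a purely temporal history.
history <- toy[1, , "values"]

# drop = FALSE keeps the spatio-temporal class.
kept <- toy[1, , "values", drop = FALSE]
```

Calling class() on each result shows which selections left the spatio-temporal classes.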

Coercion to long and wide tables
Spatio-temporal data objects can be coerced to the corresponding purely spatial objects.
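Coercion of a full grid object to long and space-wide tables might look as follows. This is a sketch on an invented object (the name toy and its values are assumptions); as.data.frame and unstack are used as documented for STFDF objects in package spacetime, and the packages sp and spacetime are assumed to be installed.

```r
library(sp)
library(spacetime)

# Invented full grid: 3 points, 4 hourly time stamps, 1 variable.
pts <- SpatialPoints(cbind(x = c(0, 0, 1), y = c(0, 1, 1)))
time <- as.POSIXct("2010-08-05", tz = "GMT") + 3600 * (10:13)
toy <- STFDF(pts, time, data = data.frame(values = 1:12))

# Long table: one row per (feature, time) combination, with
# coordinates and time replicated as needed.
long <- as.data.frame(toy)

# Space-wide table: one row per time, one column per spatial feature.
space_wide <- unstack(toy)
```

The long table makes the redundancy of this format visible: the coordinates of each point are repeated for every time stamp.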

Graphs of spatio-temporal data
stplot: panels, space-time plots, animation
The stplot method can create a few specialized plot types for the classes in the spacetime package. They are:
multi-panel plots: In this form, for each (selected) time step a map is plotted in a separate panel, and the strip above the panel indicates what the panel is about. The panels share x- and y-axis, no space needs to be lost by separating white space, and a common legend is used. Three types are implemented for STFDF data:
• The x and y axis denote space; an example for gridded data is shown in Figure 4, for polygon data in Figure 9. The stplot method is a wrapper around spplot in package sp, and inherits most of its options,
• The x and y axis denote time and value; one panel for each spatial feature, colors may indicate different variables (mode="tp"); see Figure 5 (left),
• The x and y axis denote time and value; one panel for each variable, colors may denote different features (mode="ts"); see Figure 5 (right).
For both cases with time on the x-axis (Figure 5), values over time for different variables or features are connected with lines, as is usual with time series plots. This can be changed to symbols by specifying type='p'.
space-time plots: Space-time plots show data in a space-time cross-section, with e.g., space on the x-axis and time on the y-axis. (See also Figure 1.) Hovmöller diagrams (Hovmöller 1949) are an example of these for full space-time lattices, i.e., objects of class STFDF. To obtain such a plot, the arguments mode and scaleX should be considered; some special care is needed when only the x- or y-axis needs to be plotted instead of the spatial index (1...n); details are found in the stplot documentation. A Hovmöller-style plot can, for example, have station index along the x-axis and time along the y-axis.
animated plots: Animation is another way of displaying change over time; a sequence of spplots, one for each time step, is looped over when the parameter animate is set to a positive value (indicating the time in seconds to pause between subsequent plots).

Time series plots
Time series plots are a fairly common type of plot in R. Package xts has a plot method that allows univariate time series to be plotted. Many (if not most) plot routines in R support time to be along the x- or y-axis. The plot in Figure 7 was generated by using package lattice (Sarkar 2008), and uses a colour palette from package RColorBrewer (Neuwirth 2011).

Time periods or time instances
Most data structures for time series data in R have, explicitly or implicitly, for each record a time stamp, not a time interval. The implicit assumption seems to be (i) the time stamp is a moment, (ii) this indicates either the real moment of measurement / registration, or the start of the interval over which something is aggregated (summed, averaged, maximized). For financial "Open, high, low, close" data, the "Open" and "Close" refer to the values at the moment the stock exchange opens and closes, meaning time instances, whereas "high" and "low" are aggregated values: the maximum and minimum price over the time interval between opening and closing times.
The string syntax for time periods lets one define, unambiguously, yearly, monthly, daily, hourly or minute intervals, but not, e.g., 10- or 30-minute intervals. For such an interval, the full specification is needed.
When matching two sequences of time (Figure 8) in order to overlay or aggregate, it matters whether each of the sequences reflects instances, one of them reflects time intervals and the other instances, or both reflect time intervals. All of these cases are accommodated for.
Objects in spacetime register both (start) time and end time. By default, objects with gridded space-time layout (Figure 1) of class or deriving from STF or STS assume interval time, and STI and STT objects assume instance time.
When no end times are supplied on creation and time intervals are assumed, the assumption is that time intervals are consecutive (Figure 2), and the last interval (for which no end time is present) has a length identical to the second-to-last interval (Figures 2 and 8).
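The default rule for end times can be written out in a few lines of base R. This is a sketch of the rule as stated above, with invented start times, and not the package's own code; spacetime documents a helper delta for deriving end times, which is not called here.

```r
# Invented start times; the last gap (2 hours) differs from the earlier ones.
start <- as.POSIXct("2010-08-05 10:00:00", tz = "GMT") + 3600 * c(0, 1, 2, 4)
n <- length(start)

# Consecutive intervals: each interval ends where the next one starts.
# The last interval gets the length of the second-to-last interval.
last_gap <- start[n] - start[n - 1]
end_time <- c(start[-1], start[n] + last_gap)

intervals <- data.frame(start = start, end = end_time)
```

With irregular start times, as here, the length assumed for the last interval depends entirely on the gap before it, so supplying explicit end times is the safer choice.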

Spatial support
All examples above work with spatial points, i.e., data having a point support. The assumption of data having point support is implicit for SpatialPoints features. For polygons, the assumption will be that values reflect aggregates (e.g., sums, or averages) over the polygon.
For gridded data, it is ambiguous whether the value at the grid cell centre is meant (e.g., for DEM data) or an aggregate over the grid cell (typical for remote sensing imagery). The Spatial* objects of package sp have no explicit information about the spatial support.

Worked examples
This section shows how existing data in various formats can be converted into ST classes, and how they can be analysed and/or visualised.

North Carolina SIDS
As an example, the North Carolina Sudden Infant Death Syndrome (sids) data will be used. These data were first analysed by Symons, Grimson, and Yuan (1983), and first published and analysed in a spatial setting by Cressie and Chan (1989).

Panel data
The panel data discussed in Section 2 are imported as a full spatio-temporal data.frame (STFDF), and linked to the proper state polygons of maps. We can obtain the state polygons from package maps (Brownrigg and Minka 2011). After construction of the object Produc.st, the call
R> stplot(Produc.st[,,"unemp"], yrs, col.regions = brewer.pal(9, "YlOrRd"), cuts = 9)
produces the plot shown in Figure 9.
Time and state were not removed from the data table on construction; printing these data after coercion to data.frame can then be used to verify that time and state were matched correctly.
The routines in package plm can be used on the data, when back transformed to a data.frame, when index is used to specify which variables represent space and time (the first two columns from the data.frame no longer contain state and year). For instance, to fit a panel linear model for gross state products (gsp) to public capital stock (pcap), private capital (pc), labor input (emp) and unemployment rate (unemp), we get
R> if (require(plm, quietly = TRUE))
+     zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
+         data = as.data.frame(Produc.st), index = c("state", "year"))
where the output of summary(zz) is left out for brevity. More details are found in Croissant and Millo (2008) and Millo and Piras (2012).

Interpolating Irish wind
This worked example is a modified version of the analysis presented in demo(wind) of package gstat (Pebesma 2004). This demo is rather lengthy and reproduces much of the original analysis in Haslett and Raftery (1989). Here, we will reduce the material and focus on the use of spatio-temporal classes.

Country shapes in cshapes
The cshapes (Weidmann, Kuse, and Gleditsch 2010) package contains a GIS dataset of country boundaries, and includes functions for data extraction and the computation of distance matrices. The data set consists of a SpatialPolygonsDataFrame. The spatio-temporal object is created using begin- and end-times.

Further material
Searching past email discussion threads on the r-sig-geo (R Special Interest Group on using GEOgraphical data and Mapping) email list may be a good way to look for further material, before one considers posting questions. Search strings, e.g., on the Google search engine, may look like: spacetime site:stat.ethz.ch where the search keywords should be made more precise.
The excellent book Statistics for spatio-temporal data (Cressie and Wikle 2011) provides a large number of methods for the analysis of mainly geostatistical data. A demo script, which can be run by

R> library(spacetime)
R> demo(CressieWikle)

downloads the data from the book web site, and reproduces a number of graphs shown in the book. It should be noted that the book examples only deal with STFDF objects.
Section 7.3 contains an example of a spatial interpolation with a spatio-temporal separable or product-sum covariance model. The functions for this are found in package gstat, and more information is found through

R> if (require(gstat, quietly = TRUE)) {
+     vignette("st")
+ }

An example where (potentially large) data sets are proxied through R objects is given in a vignette in the spacetime package, obtained by

R> library(spacetime)
R> vignette("stpg")
A proxy object is an object that contains no data, but only references to tables in a data base. Selections on this object are translated into SQL statements that return the actually selected data. This way, the complete data set does not have to be loaded in memory (R), but can be processed part by part. Selection in the data base uses indexes on the spatial and temporal references.
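How a selection might be turned into an SQL statement can be sketched in base R. Everything in the block below is hypothetical: the function selection_to_sql, the table name, and the column names obs_time, x and y are invented for illustration and do not describe how the stpg vignette implements proxy objects.

```r
# Hypothetical helper: translates a time period and a bounding box into SQL.
# Table and column names are assumptions; inputs are not sanitised.
selection_to_sql <- function(table, time_from, time_to, bbox) {
  sprintf(
    paste("SELECT * FROM %s",
          "WHERE obs_time >= '%s' AND obs_time < '%s'",
          "AND x BETWEEN %s AND %s AND y BETWEEN %s AND %s;"),
    table, time_from, time_to,
    bbox[["xmin"]], bbox[["xmax"]], bbox[["ymin"]], bbox[["ymax"]]
  )
}

# Query for the years 2008-9 within an invented bounding box.
query <- selection_to_sql(
  "air_quality", "2008-01-01", "2010-01-01",
  c(xmin = 5.9, xmax = 15, ymin = 47.3, ymax = 55.1)
)
```

Only the rows returned by such a query would need to be held in memory, which is the point of the proxy approach; a real implementation would use parameterised queries rather than string pasting.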
Examples of overlay and aggregation methods for spatio-temporal data are further detailed in a separate vignette, obtained by

R> library(spacetime)
R> vignette("sto")
It illustrates the methods with daily air quality data taken from the European air quality data base, for 1998-2009. Aggregations are temporal, spatial, or both.

Discussion
Handling and analyzing spatio-temporal data is often complicated by the size and complexity of these data. Also, data may come in many different forms: they may be time-rich, space-rich, and come as sets of space-time points or as trajectories.
Building on existing infrastructure for spatial and temporal data, we have successfully implemented a coherent set of classes for spatio-temporal data that covers regular space-time layouts, partially regular (sparse) space-time layouts, irregular space-time layouts and trajectory data. The set is flexible in the sense that several representations of space (points, lines, polygons, grid) and time (POSIXt, Date, timeDate, yearmon, yearqtr) can be combined.
We have given examples for constructing objects of these classes from various data sources, coercing them from one to another, exporting them to spatial or temporal representations, as well as visualising them in various forms. We have also shown how one can go from one form into another by way of prediction based on a statistical model, using an example on spatio-temporal geostatistical interpolation. In addition to spatio-temporally varying information, objects of the classes can contain data values that are purely spatial or purely temporal. Selection can be done based on spatial features, time (intervals), or data variables, and follows a logic similar to that for selection on data tables (data.frames).

Challenges that remain include
• The representation of spatio-temporal polygons in a consistent way, i.e., such that each point in space-time refers to one and only one space-time feature,
• Dealing with complex developments, such as merging, splitting, and death and birth of objects (further examples are found in Galton (2004)),
• Explicitly registering the support, or footprint of spatio-temporal data,
• Annotating objects such that incorrect operations (such as the interpolation of a point process, or the weighted density estimates on a geostatistical process) can lead to warning or error messages,
• Making handling of massive data sets easier, and implementing efficient spatio-temporal indexes for them,
• Integrating package spacetime with other packages dealing with specific spatio-temporal classes such as raster and surveillance.
The classes and methods presented in this paper are a first attempt to cover a number of useful cases for spatio-temporal data. In a set of case studies it is demonstrated how they can be used, and can be useful. As software development is often opportunistic, we admittedly picked a lot of low-hanging fruit, and a number of large challenges remain. We hope that these first steps will help in discovering and identifying these more complex use cases.

Figure 2: Time instant (top left), object movement (top right), time interval with consecutive (bottom left) or arbitrary (bottom right) intervals. s 1 refers to the first feature/location, t 1 to the first time instance or interval, thick lines indicate time intervals, arrows indicate movement. Filled circles denote start time, empty circles end times, intervals are right-closed.

Figure 3: Classes for spatio-temporal data in package spacetime. Arrows denote inheritance, lower side of boxes list slot name and type, green lines indicate supported coercions (both ways).

Figure 4: Space-time interpolations of wind (square root transformed, detrended) over Ireland using a separable product covariance model, for 10 time points regularly distributed over the month for which daily data was considered (April, 1961).

Figure 5: Time series for four variables and four features plotted with stplot, with mode="tp" (left) and mode="ts" (right); see also Section 7.2.

Figure 7: Time series plot of daily wind speed at 12 stations, used for interpolation in Figure 4.

Figure 11: EOFs of space-time interpolations of wind over Ireland (for spatial reference, see Figure 4), for the 10 time points at which daily data was chosen above (April, 1961).
Table 1: Methods for spatio-temporal data in package spacetime.

When combining all this information, we do not need to reorder states because states and Produc order states alphabetically. We need to de-select District of Columbia, which is not present in the Produc table (record 8).

The resulting object is of the appropriate subclass of Spatial in the spatial form, or of class xts in the temporal form.

Objects of class ltraj are lists of bursts, sets of sequential, connected space-time points at which an object is registered.