zoo: S3 Infrastructure for Regular and Irregular Time Series

zoo is an R package providing an S3 class with methods for indexed totally ordered observations, such as discrete irregular time series. Its key design goals are independence of a particular index/time/date class and consistency with base R and the "ts" class for regular time series. This paper describes how these are achieved within zoo and provides several illustrations of the available methods for "zoo" objects which include plotting, merging and binding, several mathematical operations, extracting and replacing data and index, coercion and NA handling. A subclass "zooreg" embeds regular time series into the "zoo" framework and thus bridges the gap between regular and irregular time series classes in R.


Introduction
The R system for statistical computing (R Development Core Team 2005, http://www.R-project.org/) ships with a class for regularly spaced time series, "ts" in package stats, but has no native class for irregularly spaced time series. With the increased interest in computational finance with R in recent years, several implementations of classes for irregular time series emerged which are aimed particularly at finance applications. These include the S3 classes "timeSeries" in package fCalendar from the Rmetrics bundle (Wuertz 2005) and "irts" in package tseries (Trapletti 2005) and the S4 class "its" in package its (Heywood 2004).
With these packages available, why would anybody want yet another package providing infrastructure for irregular time series? The above mentioned implementations have in common that they are restricted to a particular class for the time scale: the former implementation comes with its own time class "timeDate" built on top of the "POSIXt" classes available in base R whereas the latter two use "POSIXct" directly. And this was the starting point for the zoo project: the first author of the present paper needed more general support for ordered observations, independent of a particular index class, for the package strucchange (Zeileis, Leisch, Hornik, and Kleiber 2002). Hence, the package was called zoo which stands for Z's ordered observations. Since the first release, a major part of the additions to zoo was provided by the second author of this paper, so that the name of the package does not really reflect the authorship anymore. Nevertheless, independence of a particular index class remained the most important design goal. While the package evolved to its current status, a second key design goal became more and more clear: to provide methods to standard generic functions for the "zoo" class that are similar to those for the "ts" class (and base R in general) such that the usage of zoo is very intuitive because few additional commands have to be learned.
This paper describes how these design goals are implemented in zoo. The resulting package provides the "zoo" class which offers an extensive (and still growing) set of standard and new methods for working with indexed observations and 'talks' to the classes "ts", "its", "irts" and "timeSeries". It also bridges the gap between regular and irregular time series by providing coercion with (virtually) no loss of information between "ts" and "zoo". With these tools zoo provides the basic infrastructure for working with indexed totally ordered observations, and the package can either be employed by users directly or serve as a basic ingredient on top of which other more specialized applications can be built.
The remainder of the paper is organized as follows: Section 2 explains how "zoo" objects are created and illustrates how the corresponding methods for plotting, merging and binding, several mathematical operations, extracting and replacing data and index, coercion and NA handling can be used. Section 3 outlines how other packages can build on this basic infrastructure. Section 4 gives a few summarizing remarks and an outlook on future developments. Finally, an appendix provides a reference card that gives an overview of the functionality contained in zoo.

The class "zoo" and its methods
This section describes how "zoo" series can be created and subsequently manipulated, visualized, combined or coerced to other classes. In Section 2.1, the general class "zoo" for totally ordered series is described. Subsequently, in Section 2.2, the subclass "zooreg" for regular "zoo" series, i.e., series which have an index with a specified frequency, is discussed. The methods illustrated in the remainder of the section are mostly the same for both "zoo" and "zooreg" objects and hence do not have to be discussed separately. The few differences in merging and binding are briefly highlighted in Section 2.4.

Creation of "zoo" objects
The simple idea for the creation of "zoo" objects is to have some vector or matrix of observations x which are totally ordered by some index vector. In time series applications, this index is a measure of time, but any other numeric, character or even more abstract vector that provides a total ordering of the observations is also suitable. Objects of class "zoo" are created by the function
zoo(x, order.by)
where x is the vector or matrix of observations and order.by is the index by which the observations should be ordered. It has to be of the same length as NROW(x), i.e., either the same length as x for vectors or the same number of rows for matrices [2]. The "zoo" object created is essentially the vector/matrix as before but has an additional "index" attribute in which the index is stored [3]. Both the observations in the vector/matrix x and the index order.by can, in principle, be of arbitrary classes. However, most of the following methods (plotting, aggregating, mathematical operations) for "zoo" objects are typically only useful for numeric observations x. Special effort in the design was put into independence from a particular class for the index vector. In zoo, it is assumed that combination c(), querying the length(), value matching MATCH(), subsetting [, and, of course, ordering ORDER() work when applied to the index. In addition, an as.character() method might improve printed output [4] and as.numeric() could be used for computing distances between indexes, e.g., in interpolation. Neither method is necessary for working with "zoo" objects, but both are used if available. All these methods are available, e.g., for standard numeric and character vectors and for vectors of classes "Date", "POSIXct" or "times" from package chron, but not for the class "timeDate" in fCalendar. In the last case, the solution is to provide methods for the above mentioned functions so that indexing "zoo" objects with "timeDate" vectors works (see Section 3.3 for an example).
To achieve this independence of the index class, new generic functions for ordering (ORDER()) and value matching (MATCH()) are introduced as the corresponding base functions order() and match() are non-generic. The default methods simply call the corresponding base functions, i.e., no new method needs to be introduced for a particular index class if the non-generic functions order() and match() work for this class.
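The requirements on the index class can be checked interactively. The following is a minimal sketch, assuming a current zoo installation in which the generics ORDER() and MATCH() are exported; the "Date" vector idx is purely illustrative and not taken from the examples of this paper.

```r
library(zoo)

# Illustrative index vector of class "Date", deliberately unsorted
idx <- as.Date(c("2004-02-12", "2004-01-05", "2004-01-19"))

# The default methods of the zoo generics fall back to base order()/match()
ord <- ORDER(idx)
pos <- MATCH(as.Date("2004-01-19"), idx)

# The remaining operations that zoo assumes to work on an index
combined <- c(idx, as.Date("2004-03-01"))  # combination
n        <- length(idx)                    # querying the length
second   <- idx[2]                         # subsetting
```

For an index class on which order() and match() already work, as here, no additional methods have to be written.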
To illustrate the usage of zoo(), we first load the package and set the random seed to make the examples in this paper exactly reproducible.
[2] The only case where this restriction is not imposed is for zero-length vectors, i.e., vectors that only have an index but no data.
[3] There is some limited support for indexed factors available, in which case the "zoo" object also has an attribute "oclass" with the original class of x. This feature is still under development and might change in future versions.
[4] If an as.character() method is already defined but does not give the desired output for printing, then an index2char() method can be defined. This is a generic convenience function used for creating character representations of the index vector and it defaults to using as.character().
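A session of the kind just described might look as follows. This is a minimal sketch: the seed value, the object names z and Zm, and all data are illustrative assumptions and are not the values behind the outputs shown elsewhere in this paper.

```r
library(zoo)
set.seed(1)  # any fixed seed makes the random draws reproducible

# Ten observations on irregularly spaced dates (hypothetical data)
idx <- as.Date("2004-01-01") + c(4, 13, 18, 24, 26, 37, 42, 46, 50, 59)
z   <- zoo(rnorm(10), order.by = idx)

# A three-column matrix whose integer index is supplied out of order
Zm <- zoo(matrix(1:12, nrow = 4, ncol = 3,
                 dimnames = list(NULL, c("Aa", "Bb", "Cc"))),
          order.by = c(3L, 1L, 4L, 2L))

index(Zm)     # the index is stored in sorted order
coredata(Zm)  # the rows are reordered along with the index
```

The second call illustrates that zoo() orders the observations by the index, so the data do not have to be sorted beforehand.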

Creation of "zooreg" objects
Strictly regular series are series of observations where the distances between the indexes of every two adjacent observations are the same (i.e., in time series applications, the time differences are identical). Such series can also be described by their frequency, i.e., the reciprocal value of the distance between two observations. As "zoo" can be used to store series with an arbitrary type of index, it can, of course, also be used to store series with regular indexes. So why should this case be given special attention, in particular as there is already the "ts" class devoted entirely to regular series? There are two reasons: First, to be able to convert back and forth between "ts" and "zoo", the frequency of a certain series needs to be stored on the "zoo" side. Second, "ts" is limited to strictly regular series and the regularity is lost if some internal observations are omitted. Series that can be created by omitting some internal observations from strictly regular series will in the following be referred to as being (weakly) regular. Therefore, a class that bridges the gap between irregular and strictly regular series is needed and "zooreg" fills this gap. Objects of class "zooreg" inherit from class "zoo" but have an additional attribute "frequency" in which the frequency of the series is stored. Therefore, they can be employed to represent both strictly and weakly regular series.
To create a "zooreg" object, either the command zoo() can be used or the command zooreg().
zoo(x, order.by, frequency)
zooreg(data, start, end, frequency, deltat, ts.eps, order.by)
If zoo() is called as in the previous section but with an additional frequency argument, it is checked whether frequency complies with the index order.by: if it does, an object of class "zooreg" inheriting from "zoo" is returned. The command zooreg() takes mostly the same arguments as ts(). In both cases, the index class is more restricted than in the plain "zoo" case. The index must be of a class which can be coerced to "numeric" (for checking its regularity), and when converted to numeric the index must be expressible as multiples of 1/frequency. Furthermore, adding/subtracting a numeric to/from an observation of the index class should return the correct value of the index class again, i.e., the group generic functions Ops should be defined. Calls to zoo() and zooreg() that specify the same index and frequency yield equivalent series.
zooreg() can also deal with non-numeric indexes provided that adding "numeric" observations to the index class preserves the class and does not coerce to "numeric".
R> zooreg(1:5, start = as.Date("2005-01-01"))
To check whether a certain series is (strictly) regular, the new generic function is.regular(x, strict = FALSE) can be used. This function (and also frequency, deltat and cycle) also works for "zoo" objects if the regularity can still be inferred from the data. Of course, inferring the underlying regularity is not always reliable and it is safer to store a regular series as a "zooreg" object if it is intended to be a regular series.
If a weakly regular series is coerced to "ts" the missing observations are filled with NAs (see also Section 2.8). For strictly regular series with numeric index, the class can be switched between "zoo" and "ts" without loss of information.
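The distinction between strictly and weakly regular series can be made concrete with a short sketch. The quarterly data and the object names zr1, zr2 and zr_gap are illustrative assumptions; only zoo(), zooreg(), is.regular(), frequency() and the "ts" coercions described above are used.

```r
library(zoo)

# Two ways of creating the same quarterly series (hypothetical data)
zr1 <- zooreg(sin(1:9), start = 2000, frequency = 4)
zr2 <- zoo(sin(1:9), order.by = seq(2000, 2002, by = 1/4), frequency = 4)

# Omitting internal observations keeps the class and the frequency,
# but the series is then only weakly regular
zr_gap <- zr1[-c(3, 5)]
class(zr_gap)
frequency(zr_gap)
is.regular(zr_gap)                 # weak regularity
is.regular(zr_gap, strict = TRUE)  # strict regularity

# Coercion to "ts" restores the full grid and fills the gaps with NA
ts_gap <- as.ts(zr_gap)

# A strictly regular series survives the round trip via "ts"
back <- as.zoo(as.ts(zr1))
```

Storing the frequency explicitly is what allows the gaps in zr_gap to be located again when the series is converted to "ts".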

Plotting
The plot method for "zoo" objects, in particular for multivariate "zoo" series, is based on the corresponding method for (multivariate) regular time series. It relies on plot and lines methods being available for the index class which can plot the index against the observations. By default the plot method creates a panel for each series

R> plot(Z)
but can also display all series in a single panel

R> plot(Z, plot.type = "single", col = 2:4)

In both cases additional graphical parameters like color col, plotting character pch and line type lty can be expanded to the number of series. But the plot method for "zoo" objects offers some more flexibility in the specification of graphical parameters, as in

R> plot(Z, type = "b", lty = 1:3, pch = list(Aa = 1:5, Bb = 2, Cc = 4),
+    col = list(Bb = 2, 4))

The argument lty behaves as before and draws each series in a different line type. The pch argument is a named list that assigns to each series a different vector of plotting characters, each of which is expanded to the number of observations. Such a list does not necessarily have to include the names of all series, but can also specify a subset. For the remaining series the default parameter is then used, which can again be changed: e.g., in the above example the col argument is set to display the series "Bb" in red and all remaining series in blue.
The results of the multiple panel plots are depicted in Figure 2 and the single panel plot in Figure 1.

Merging and binding
As for many rectangular data formats in R, there are methods for combining both the rows and the columns of "zoo" objects. For the rbind method the number of columns of the combined objects has to be identical and the indexes must not overlap. The c method simply calls rbind and hence behaves in the same way.
The cbind method by default combines the columns by the union of the indexes and fills the created gaps by NAs.
In fact, the cbind method is synonymous with the merge method except that the latter provides additional arguments which allow for combining the columns by the intersection of the indexes using the argument all = FALSE

R> merge(z1, z2, all = FALSE)
                    z1          z2
2004-01-05  0.74675994 -0.04149429
2004-01-19 -0.29823529 -0.52575918
2004-02-12  0.22170438 -0.62733473

Additionally, the filling pattern can be changed in merge, the naming of the columns can be modified and the return class of the result can be specified. In the case of merging objects with different index classes, R gives a warning and tries to coerce the indexes. Merging objects with different index classes is generally discouraged; if it is used nevertheless, it is the responsibility of the user to ensure that the result is as intended. If at least one of the merged/bound objects was a "zooreg" object, then merge tries to return a "zooreg" object. This is done by assessing whether there is a common maximal frequency and by checking whether the resulting index is still (weakly) regular.
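The different ways of combining indexes can be compared side by side. The sketch below uses two small hypothetical series a and b (not the z1 and z2 of this paper) with partly overlapping "Date" indexes, and relies only on the all and fill arguments of merge and on rbind.

```r
library(zoo)

# Hypothetical series with partly overlapping "Date" indexes
a <- zoo(1:4,   as.Date("2004-01-01") + c(0, 3, 7, 9))
b <- zoo(11:13, as.Date("2004-01-01") + c(3, 9, 12))

m_union <- merge(a, b)                        # union, gaps filled with NA
m_inter <- merge(a, b, all = FALSE)           # intersection only
m_left  <- merge(a, b, all = c(TRUE, FALSE))  # left join: all indexes of a
m_fill  <- merge(a, b, fill = 0)              # union, gaps filled with 0

# Row binding requires that the indexes do not overlap
later  <- zoo(5:6, as.Date("2004-02-01") + 0:1)
a_long <- rbind(a, later)
```

With all given as a logical vector, one element per merged object, left and right joins are obtained in addition to the union and the intersection.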
If non-"zoo" objects are included in merging, then merge gives plain vectors/factors/matrices the index of the first argument (if it is of the same length). Scalars are always added for the full index without missing values.

R> merge(z1, pi, 1:10)

Another function which performs operations along a subset of indexes is aggregate, which is discussed in this section although it does not combine several objects. Using the aggregate method, "zoo" objects are split into subsets along a coarser index grid, summary statistics are computed for each subset and then the reduced object is returned. In the following example, first a function is set up which returns for a given "Date" value the corresponding first of the month. This function is then used to compute the coarser grid for the aggregate call. In the first example, the grouping is computed explicitly by firstofmonth(index(Z)) and the mean of the observations in the month is returned. In the second example, only the function that computes the grouping (when applied to index(Z)) is supplied and the first observation is used for aggregation.
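The two aggregate calls just described can be sketched as follows. The series d and the exact body of firstofmonth() are assumptions made for illustration; any function mapping a "Date" to the first day of its month would serve the same purpose.

```r
library(zoo)

# Hypothetical daily observations spread over two months
d <- zoo(1:6, as.Date(c("2004-01-05", "2004-01-19", "2004-01-27",
                        "2004-02-07", "2004-02-12", "2004-02-20")))

# Map each date to the first day of its month
# (replaces the last two characters of "YYYY-MM-DD" by "01")
firstofmonth <- function(x) as.Date(sub("..$", "01", format(x)))

# Grouping supplied explicitly as a vector: monthly means
m_mean <- aggregate(d, firstofmonth(index(d)), mean)

# Grouping supplied as a function of the index: first observation per month
m_first <- aggregate(d, firstofmonth, head, 1)
```

Both results are indexed by the coarser grid, i.e., by the first day of each month.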

Mathematical operations
To allow for standard mathematical operations among "zoo" objects, zoo extends the group generic functions Ops. These perform the operations only for the intersection of the indexes of the objects. As an example, z1 + z2 and z1 < z2 compute the sum and the logical comparison of z1 and z2 for their common index values. Additionally, there are methods for transposing "zoo" objects with t (which coerces to a matrix first) and for computing cumulative quantities such as cumsum, cumprod, cummin and cummax, which are all applied column-wise.

R> cumsum(Z)
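The behaviour of these operations can be traced on small examples. The sketch below uses hypothetical integer-indexed objects a, b and Zc rather than the z1, z2 and Z of this paper.

```r
library(zoo)

# Two series that share only the index values 2 and 4
a <- zoo(c(1, 2, 3, 4), 1:4)
b <- zoo(c(10, 20, 30), c(2L, 4L, 6L))

s  <- a + b   # computed on the common indexes 2 and 4 only
lt <- a < b   # logical comparison on the same indexes

# Cumulative sums are computed separately for each column
Zc <- zoo(matrix(1:6, ncol = 2, dimnames = list(NULL, c("Aa", "Bb"))), 1:3)
cs <- cumsum(Zc)

# Transposing coerces to a plain matrix first
tZ <- t(Zc)
```

Observations whose index occurs in only one of the operands are dropped from the result of a + b and a < b.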

Extracting and replacing data and index
The data stored in "zoo" objects can be extracted by coredata, which strips off all "zoo"-specific attributes, and it can be replaced using coredata<-. Both are new generic functions with methods for "zoo" objects.
As the interpretation of the index as "time" in time series applications is natural, there are also synonymous methods time and time<-.Hence, the commands index(z2) and time(z2) return equivalent results.
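Extraction and replacement of data and index might be used as in the following sketch. The object z and its values are illustrative assumptions; the functions coredata, coredata<-, index, index<- and time are those named above.

```r
library(zoo)

# Hypothetical series with a "Date" index
z <- zoo(c(5, 3, 8), as.Date("2004-01-01") + 0:2)

# Extract the plain data, without any "zoo" attributes
x <- coredata(z)

# Replace the data, keeping the index
coredata(z) <- c(1, 2, 3)

# Replace the index, keeping the data
index(z) <- as.Date("2005-01-01") + 0:2

# index() and time() are synonymous for "zoo" objects
same <- identical(index(z), time(z))
```

Replacing the data and the index separately makes it easy to re-use an existing index for transformed data, or to shift a series in time without touching its values.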

NA handling
Four methods for dealing with NAs (missing observations) in the observations are applicable to "zoo" objects: na.omit, na.contiguous, na.approx and na.locf. na.omit (or its default method, to be more precise) returns a "zoo" object with incomplete observations removed. na.contiguous extracts the longest consecutive stretch of non-missing values. Furthermore, new generic functions na.approx and na.locf and corresponding default methods are introduced in zoo. The former replaces NAs by linear interpolation (using the function approx) and the name of the latter stands for last observation carried forward. It replaces missing observations by the most recent non-NA prior to it. Leading NAs, which cannot be replaced by previous observations, are removed in both functions by default.

R> na.approx(z1)
2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12
  9.000000   7.714286   7.000000   6.000000   5.000000   6.000000   7.111111
2004-02-16 2004-02-20
  8.000000   9.000000

As the above example illustrates, na.approx uses by default the underlying time scale for interpolation. This can be changed, e.g., to an equidistant spacing, by setting the second argument of na.approx.
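The effect of the different methods, and of the second argument of na.approx, can be followed on a small hypothetical series (the object z below is an assumption for illustration and differs from z1 above).

```r
library(zoo)

# Hypothetical series with a leading NA and two internal NAs
z <- zoo(c(NA, 2, NA, 5, NA, 10), c(1, 2, 4, 5, 8, 10))

z_omit <- na.omit(z)    # drop incomplete observations
z_locf <- na.locf(z)    # carry the last observation forward
z_time <- na.approx(z)  # interpolate along the index values

# Interpolate along equidistant positions 1, ..., 6 instead of the index
z_pos <- na.approx(z, 1:6)
```

Here the gap at index 4 lies between the observations (2, 2) and (5, 5), so interpolation along the index gives 4, whereas interpolation along the positions gives the midpoint 3.5 of the neighbouring values.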

Rolling functions
A typical task to be performed on ordered observations is to evaluate some function, e.g., computing the mean, in a window of observations that is moved over the full sample period. The resulting statistics are referred to interchangeably as rolling, running or moving statistics.
In zoo, the generic function rapply is provided along with a "zoo" and a "ts" method. The most important arguments are rapply(data, width, FUN) where the function FUN is applied to a rolling window of size width of the observations data.
The function rapply currently only evaluates the function for windows of full size width, hence the result has width - 1 fewer observations than the original series. But it can be determined whether the 'lost' observations should be padded with NAs and whether the result should be left- or right-aligned or centered (default) with respect to the original index.
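A rolling mean can be sketched as follows. Note the assumption made here: later zoo releases provide this functionality under the name rollapply (the name rapply is also used by a different function in base R), so the sketch uses rollapply with its align and fill arguments; the series z is hypothetical.

```r
library(zoo)

# Hypothetical series 1, ..., 7 indexed by 1, ..., 7
z <- zoo(c(1, 2, 3, 4, 5, 6, 7), 1:7)

# Rolling mean over windows of width 3, centered (default)
r_center <- rollapply(z, width = 3, FUN = mean)

# Same statistic, but aligned with the right end of each window
r_right <- rollapply(z, width = 3, FUN = mean, align = "right")

# Pad the 'lost' observations with NA to keep the original index
r_pad <- rollapply(z, width = 3, FUN = mean, fill = NA)
```

With width 3 the unpadded results have 7 - (3 - 1) = 5 observations; only the index to which each window is assigned differs between the alignments.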

Combining zoo with other packages
The main purpose of the package zoo is to provide basic infrastructure for working with indexed totally ordered observations that can be either employed by users directly or can be a basic ingredient on top of which other packages can build. The latter is illustrated with a few brief examples involving the packages strucchange, tseries and fCalendar in this section. Finally, the classes "yearmon" and "yearqtr" (provided in zoo) are used for illustrating how zoo can be extended by creating a new index class.

strucchange: Empirical fluctuation processes
The package strucchange provides a collection of methods for testing, monitoring and dating structural changes, in particular in linear regression models. Tests for structural change assess whether the parameters of a model remain constant over an ordering with respect to a specified variable, usually time. To adequately store and visualize empirical fluctuation processes which capture instabilities over this ordering, a data type for indexed ordered observations is required. This was the motivation for starting the zoo project.
A simple example for the need of "zoo" objects in strucchange which cannot be (easily) implemented by other irregular time series classes available in R is described in the following. We assess the constancy of the electrical resistance over the apparent juice content of kiwi fruits. The data set fruitohms is contained in the DAAG package (Maindonald and Braun 2004). The fitted ocus object contains the OLS-based CUSUM process for the mean of the electrical resistance (variable ohms) indexed by the juice content (variable juice). This OLS-based CUSUM process can be visualized using the plot method for "gefp" objects which builds on the "zoo" method. In this case it yields the plot in Figure 3, showing a process which crosses its 5% critical value and thus signals a significant decrease in the mean electrical resistance over the juice content. For more information on the package strucchange and the function gefp see Zeileis et al. (2002) and Zeileis (2004).

tseries: Historical financial data
A typical application for irregular time series which has become increasingly important in recent years in computational statistics and finance is daily (or higher frequency) financial data. The package tseries provides the function get.hist.quote for obtaining historical financial data by querying Yahoo! Finance at http://finance.yahoo.com/, an online portal quoting data provided by Reuters. The following code queries the quotes of Lucent Technologies starting from 2001-01-01 until 2004-09-30:

R> library(tseries)
R> LU <- get.hist.quote(instrument = "LU", start = "2001-01-01",
+    end = "2004-09-30", origin = "1970-01-01")
time series starts 2001-01-02

In the returned LU object the irregular data is stored by extending it in a regular grid and filling the gaps with NAs. The time is stored in days starting from an origin, in this case specified to be 1970-01-01, the origin used by the Date class. This series can be transformed easily into an irregular "zoo" series using a "Date" index. The log-difference returns for Lucent Technologies are depicted in Figure 4.

R> LU <- as.zoo(LU)
R> index(LU) <- as.Date(index(LU))
R> LU <- na.omit(LU)

fCalendar: Indexes of class "timeDate"
Although the methods in zoo work out of the box for many index classes, it might be necessary for some index classes to provide c, length, ORDER and MATCH methods such that the methods in zoo work properly. An example for such an index class which requires a bit more attention is "timeDate" from the fCalendar package.
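
R> library(fCalendar)
R> z2td <- zoo(coredata(z2), timeDate(index(z2), FinCenter = "GMT"))
R> z2td

Figure 4: Log-difference returns for Lucent Technologies.

Classes "yearmon" and "yearqtr": Roll your own index
One of the strengths of the zoo package is its independence of the index class, such that the index can be easily customized. The previous section already explained how an existing class ("timeDate") can be used as the index if the necessary methods are created. This section has a similar but slightly different focus: it describes how new index classes can be created addressing a certain type of indexes. These classes are "yearmon" and "yearqtr" (already contained in zoo) which provide indexes for monthly and quarterly data, respectively. As the code is virtually identical for both classes (except that one has the frequency 12 and the other 4), we will mainly discuss "yearmon" explicitly.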
Of course, monthly data can simply be stored using a numeric index just as the class "ts" does. The problem is that such an index does not carry the meta-information that it really specifies monthly data; in "yearmon" this information is simply added by a class attribute. Hence, the class creator is simply defined as

yearmon <- function(x) structure(floor(12*x + .0001)/12, class = "yearmon")

which is very similar to the as.yearmon coercion functions provided.
As "yearmon" data is now explicitly declared to describe monthly data, this can be exploited for coercion to other time classes: either to coarser time scales such as "yearqtr" or to finer time scales such as "Date", "POSIXct" or "POSIXlt" which by default associate the first day within a month with a "yearmon" observation. Adding a format and as.character method produces human readable character representations of "yearmon" data, and Ops and MATCH methods complete the methods needed for conveniently working with monthly data in zoo.
Note that all of these methods are very simple and rather obvious (as can be seen in the zoo sources), but prove very helpful in the following examples.
First, we create a regular series zr3 with "yearmon" index which leads to improved printing compared to the regular series zr1 and zr2 from Section 2.2. The index can easily be transformed to "Date", the default being the first day of the month, but this can also be changed to the last day of the month.

R> as.Date(index(zr3))
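A monthly series with "yearmon" index might be set up and converted as in the following sketch. The data 1:6 are an illustrative assumption, and the frac argument of the "yearmon" method of as.Date is taken from current zoo releases rather than from this text.

```r
library(zoo)

# Hypothetical monthly series January to June 2000 with "yearmon" index
zr3 <- zooreg(1:6, start = as.yearmon(2000), frequency = 12)

# Coercion of the index to "Date": first day of the month by default
first_day <- as.Date(index(zr3))

# frac = 1 selects the last day of the month instead
last_day <- as.Date(index(zr3), frac = 1)

# Coercion to the coarser "yearqtr" scale can serve as a grouping
# function in aggregate: quarterly means of the monthly data
zq <- aggregate(zr3, as.yearqtr, mean)
```

Because the index class declares the data to be monthly, the conversion to quarters or to calendar dates needs no further information from the user.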

Summary and outlook
The package zoo provides an S3 class and methods for indexed totally ordered observations, such as both regular and irregular time series. Its key design goals are independence of a particular index class and compatibility with standard generics similar to the behaviour of the corresponding "ts" methods. This paper describes how these are implemented in zoo and illustrates the usage of the methods for plotting, merging and binding, several mathematical operations, extracting and replacing data and index, coercion and NA handling.
An indexed object of class "zoo" can be thought of as data plus index where the data are essentially vectors or matrices and the index can be a vector of (in principle) arbitrary class. For (weakly) regular "zooreg" series, a "frequency" attribute is stored in addition. Therefore, objects of classes "ts", "its", "irts" and "timeSeries" can easily be transformed into "zoo" objects; the reverse transformation is also possible provided that the index fulfills the restrictions of the respective class. Hence, the "zoo" class can also be used as the basis for other classes of indexed observations and more specific functionality can be built on top of it. Furthermore, it bridges the gap between irregular and regular series, facilitating operations such as NA handling compared to "ts".
Whereas a lot of effort was put into achieving independence of a particular index class, the types of data that can be indexed with "zoo" are currently limited to vectors and matrices, typically containing numeric values. Although there is some limited support available for indexed factors, one important direction for future development of zoo is to add better support for other objects that can also naturally be indexed, including specifically factors, data frames and lists.

A. Reference card
Creation
zoo(x, order.by): creation of a "zoo" object from the observations x (a vector or a matrix) and an index order.by by which the observations are ordered. For computations on arbitrary index classes, methods to the following generic functions are assumed to work: combining c(), querying length length(), subsetting [, ordering ORDER() and value matching MATCH(). For pretty printing an as.character and/or index2char method might be helpful.

Creation of regular series
zoo(x, order.by, freq): works as above but creates a "zooreg" object which inherits from "zoo" if the frequency freq complies with the index order.by. An as.numeric method has to be available for the index class.
zooreg(x, start, end, freq): creates a "zooreg" series with a numeric index as above and has (almost) the same interface as ts().

Standard methods
plot: plotting
lines: adding a "zoo" series to a plot
print: printing
summary: summarizing (column-wise)
str: displaying structure of "zoo" objects
head, tail: head and tail of "zoo" objects

Coercion
as.zoo: coercion to "zoo" is available for objects of class "ts", "its", "irts" (plus a default method).
as.class.zoo: coercion from "zoo" to other classes. Currently available for class in "matrix", "vector", "data.frame", "list", "irts", "its" and "ts".
is.zoo: querying whether an object is of class "zoo"

Merging and binding
merge: union, intersection, left join, right join along indexes
cbind: column binding along the union of the indexes
c, rbind: combining/row binding (indexes must not overlap)
aggregate: compute summary statistics along a coarser grid of indexes

Figure 3: Empirical M-fluctuation process for fruitohms data.
The log-difference returns shown in Figure 4 are plotted with
R> plot(diff(log(LU)))