Jackknife Estimator of Species Richness with S-PLUS

An estimate of the number of species in an area, $S$, usually called species richness by ecologists, is one of the basic statistics used to ascertain biological diversity. Traditionally, ecologists have used the number of species observed in a sample, $S_0$, to estimate $S$, while recognizing that $S_0$ is a lower bound for $S$. One alternative to $S_0$ is a nonparametric procedure such as jackknife resampling. For species richness, a closed form of the jackknife estimator is available, but statistical software typically contains only the traditional iterative form. The purpose of this article is to propose an S-PLUS function for calculating the noniterative first order jackknife estimator of species richness, along with some associated plots and statistics.


Introduction
The true number of species in an area, $S$, usually called species richness by ecologists, is one of the basic quantities used to ascertain biological diversity. To estimate species richness one would naturally consider the observed count of species, $S_0$, from a given sample. However, $S_0$ is clearly a lower bound for the true number of species. For $S_0$ to estimate $S$ accurately, the researcher must actually observe every species. If the researcher can sample only a few plots from the area, then $S_0$ is likely to be smaller than $S$. Even if a census of the area is done, some species are likely to be missed because of human error, environmental fluctuations that affect observations, or very small species detection probabilities.

Jackknife estimation
In the late 1970s statisticians and ecologists began actively seeking alternative procedures for estimating $S$. The estimators considered drew on frequentist, Bayesian, and nonparametric approaches, and covered sampling from both finite and infinite populations (Mingoti and Meeden 1992; Bunge and Fitzpatrick 1993).
One alternative to using $S_0$ as an estimator of species richness, presented by Smith and van Belle (1984), is a nonparametric procedure such as jackknife resampling. The jackknife is useful because it is known to reduce bias and, for estimates of species richness, it has a closed form. Another useful characteristic of the jackknife estimator of species richness is that it is based on the presence or absence of a species in a given plot rather than on the abundance of the species. To use the jackknife estimator for species richness, data must be collected at $n$ locations (e.g., plots) in the designated area for which $S$ is to be estimated.
The basic idea behind the first order jackknife estimator of $S$ is to base it on the amount of unique species information that is contained in each observation. Following Smith and van Belle (1984):
1. Remove one of the observations, say $i$, where $i \in \{1, 2, \ldots, n\}$ denotes the labels of the sample units.
2. Compute an estimate of $S$, $\hat{S}_{-i}$, on all observations excluding $i$.
3. Compute the pseudovalue $n S_0 - (n-1)\hat{S}_{-i}$.
4. Repeat steps 1 through 3 for each $i = 1, 2, \ldots, n$.
5. The first-order jackknife estimator of $S$ is the average of the pseudovalues,
$$J_n(S) = \frac{1}{n}\sum_{i=1}^{n}\left[n S_0 - (n-1)\hat{S}_{-i}\right].$$
Note that in step 1 two observations could be removed, and in fact, as many as $n - 1$ observations could be removed to obtain higher order jackknife estimators (Smith and van Belle 1984).
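The iterative procedure can be sketched in a few lines. The following is a minimal illustration written in R-compatible S syntax (not tested in S-PLUS). The presence/absence matrix `pres`, the helper `count.species`, and the function `jack.iterative` are hypothetical names introduced here, and the sketch assumes that the leave-one-out estimate is simply the observed species count on the remaining plots and that the pseudovalue has the standard jackknife form.

```r
# Hypothetical toy data: rows are plots, columns are species,
# TRUE means the species was observed on that plot.
pres <- matrix(c(TRUE,  TRUE,  FALSE, FALSE,
                 TRUE,  FALSE, TRUE,  FALSE,
                 TRUE,  TRUE,  FALSE, FALSE,
                 FALSE, TRUE,  FALSE, TRUE),
               nrow = 4, byrow = TRUE)

# Number of species seen on at least one of the given plots.
count.species <- function(x) sum(colSums(x) > 0)

# Iterative first order jackknife (assumed standard pseudovalue form).
jack.iterative <- function(x) {
  n  <- nrow(x)
  s0 <- count.species(x)
  pseudo <- numeric(n)
  for (i in 1:n) {
    # Remove plot i and recount the species on the remaining plots.
    s.minus.i <- count.species(x[-i, , drop = FALSE])
    # Pseudovalue for plot i.
    pseudo[i] <- n * s0 - (n - 1) * s.minus.i
  }
  # The estimate is the average of the pseudovalues.
  mean(pseudo)
}

jack.est <- jack.iterative(pres)
```

In this toy matrix two of the four species occur on a single plot each, so the estimate exceeds the observed count of four species.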
A closed form solution to the jackknife algorithm is available. Here the jackknife estimator depends on the number of unique species in the removed observations (e.g., plots). The closed form of the first order jackknife estimator of species richness, as given by Smith and van Belle (1984), is
$$J_n(S) = S_0 + \frac{n-1}{n}\sum_{i=1}^{n} r_i,$$
where $S_0$ is the observed species count over all plots, $r_i$ is the number of species that are found only in plot $i$, and $n$ is the number of plots. Note that when all species are observed on at least two plots, $J_n(S) = S_0$ because $r_i = 0$ for all $i = 1, 2, \ldots, n$. When plots differ more in their species composition, the $r_i$'s and $J_n(S)$ become larger.
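The closed form can be recovered from the leave-one-out definition. The short derivation below assumes that $\hat{S}_{-i}$ is simply the observed species count on the remaining plots and that the pseudovalue takes the standard jackknife form; both are assumptions made for illustration, not statements quoted from Smith and van Belle (1984).

```latex
\begin{align*}
\hat{S}_{-i} &= S_0 - r_i
  && \text{(removing plot $i$ loses exactly its $r_i$ unique species)} \\
n S_0 - (n-1)\hat{S}_{-i} &= n S_0 - (n-1)(S_0 - r_i) = S_0 + (n-1)\, r_i
  && \text{(pseudovalue for plot $i$)} \\
J_n(S) &= \frac{1}{n}\sum_{i=1}^{n}\bigl[S_0 + (n-1)\, r_i\bigr]
        = S_0 + \frac{n-1}{n}\sum_{i=1}^{n} r_i
  && \text{(average of the pseudovalues)}
\end{align*}
```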
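An estimator of the variance of $J_n(S)$ is given by
$$\mathrm{VAR}[J_n(S)] = \frac{n-1}{n}\sum_{i=1}^{n}\left(r_i - \bar{r}\right)^2, \qquad \bar{r} = \frac{1}{n}\sum_{i=1}^{n} r_i.$$
This is a measure of the average deviation of the $r_i$'s from the observed mean of the $r_i$'s. Our S-PLUS function reports the standard error of $J_n(S)$, $\sqrt{\mathrm{VAR}[J_n(S)]}$.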
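As a small worked example, the closed form estimate, its variance, and its standard error can be computed directly from the observed count and the per-plot unique-species counts. This is a sketch in R-compatible S syntax, not the authors' function; `jack.closed` and the toy inputs are hypothetical, and the variance is assumed to take the usual jackknife form, $(n-1)/n$ times the sum of squared deviations of the $r_i$ from their mean.

```r
# s0: observed number of species over all plots (hypothetical input)
# r:  vector of unique-species counts, one entry per plot (hypothetical input)
jack.closed <- function(s0, r) {
  n <- length(r)
  # Closed form first order jackknife estimate.
  estimate <- s0 + (n - 1) / n * sum(r)
  # Assumed variance form: scaled sum of squared deviations of r from its mean.
  variance <- (n - 1) / n * sum((r - mean(r))^2)
  list(estimate = estimate, variance = variance, std.error = sqrt(variance))
}

# Toy case: 4 plots, 4 observed species, two plots with one unique species each.
toy <- jack.closed(s0 = 4, r = c(0, 1, 0, 1))
```

With these toy inputs the estimate is 4 + (3/4)(2) = 5.5 and the variance is (3/4)(1) = 0.75.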

Performance of the jackknife estimator
A few researchers have evaluated the performance of $J_n(S)$, including Smith and van Belle (1984), Palmer (1990), and Hellmann and Fowler (1999). Smith and van Belle (1984) evaluated $J_n(S)$ under the assumption that the abundance of a given species has a Poisson distribution. They showed that $J_n(S)$ is less biased than $S_0$ and that the expected bias approaches zero as the species density (number of species per plot) increases. Palmer (1990) evaluated $J_n(S)$ based on samples taken from hardwood stands in North Carolina. A census was taken at 30 locations to obtain the "true" species richness, then samples were taken from plots on the 30 locations. Palmer used the mean deviation to show that $J_n(S)$ has less bias than $S_0$, and used the mean squared deviation to show that $J_n(S)$ has less variability (more precision) than $S_0$. Hellmann and Fowler (1999) considered the bias, precision, and accuracy of $J_n(S)$ based on samples from five different forested locations in Michigan. Each location contained 160 plots. The five locations had different types of tree growth and ranged from 5 total species to 25 total species. The 160 plots at each location were considered to be the population of plots, and samples of different sizes were taken from each population.
Hellmann and Fowler's results indicated that $J_n(S)$ is less biased than $S_0$ when less than 60% of the "population" is sampled, and that $J_n(S)$ is typically less precise than $S_0$ but usually more accurate. Note that Hellmann and Fowler measured precision by $\mathrm{VAR}[J_n(S)]$ and measured accuracy by $\mathrm{MSE}[J_n(S)]$. They also pointed out that the characteristics of $J_n(S)$, as well as those of $S_0$, depend on the sample size.
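An estimator can be less precise yet more accurate because of the standard decomposition of mean squared error into variance and squared bias. The identity below is a general statistical fact, not a result taken from Hellmann and Fowler (1999), and it assumes accuracy is measured by mean squared error about the true richness $S$.

```latex
\[
\mathrm{MSE}[J_n(S)]
  = \mathrm{E}\bigl[(J_n(S) - S)^2\bigr]
  = \mathrm{VAR}[J_n(S)] + \bigl(\mathrm{E}[J_n(S)] - S\bigr)^2 .
\]
```

Under this decomposition, a larger variance for $J_n(S)$ can be outweighed by a smaller squared bias, giving a smaller mean squared error than that of $S_0$.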
Note that Palmer (1990) had different results regarding precision than Hellmann and Fowler (1999). This may be because they were looking at different data sets. However, they are not exactly clear about their definitions so it is possible that they observed different results because they were looking at different characteristics or using different estimates of precision.

Algorithm for calculating the jackknife estimate
S-PLUS does contain an iterative jackknife procedure, but as mentioned previously, a closed form jackknife estimator exists for estimating S. The following outlines an algorithm for an S-PLUS function which calculates a noniterative first order jackknife estimate of species richness for each of several sampling periods (e.g., years). The difficult task is in identifying the number of unique species in each plot.
1. The data set should have the following headings: "Period" for identifying the sampling period (e.g. years), "Plot" for each unique sampling location, and "Species" for the actual species observed in each period on each plot.
2. Identify the number of periods and the number of plots in the data set for future use.
3. Create a matrix that contains the number of unique species on each plot for each year.
(a) Create a storage matrix with the number of rows equal to the total number of plots, and with three columns for sampling period, plot, and count.
(b) Identify the plots listed for each period, and compute the total number of species for each period.
(c) For each plot i, count the species observed on the remaining plots (all plots except plot i) and subtract that count from the total number of species. The result is the number of unique species on plot i, $r_i$.
(d) Fill the storage matrix with the information obtained in steps (b) and (c) and assign header names: "Period", "Plot", "Count".
4. Create a matrix of jackknife estimates.
(a) Create a storage matrix with the number of rows equal to the number of sampling periods in the data set, and with columns for sampling period, the observed count, the jackknife estimate, the standard error of the jackknife estimate, and the number of plots for the given period.
(b) Identify the plots listed for each year and compute the total number of species for each sampling period.
(c) Compute $\sum_i r_i$ and $\sum_i (r_i - \bar{r})^2$, where $\bar{r}$ is the mean of the $r_i$, for each period from the data set created in step 3.
(d) Multiply by the appropriate constants to get the estimate of the first order jackknife and the estimate of its variance.
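The steps above can be sketched as follows. This is an illustrative outline in R-compatible S syntax (not tested in S-PLUS) and is not the authors' jack.fun. The toy data frame and the function names `unique.counts` and `jack.table` are hypothetical, and the sketch assumes that every sampled plot appears in the data at least once per period, so plots with no recorded species would not be counted in $n$.

```r
# Hypothetical toy data in the layout described in step 1.
toy <- data.frame(
  Period  = c(1991, 1991, 1991, 1991, 1991, 1992, 1992, 1992, 1992),
  Plot    = c("P1", "P1", "P2", "P2", "P3", "P1", "P2", "P2", "P3"),
  Species = c("A",  "B",  "A",  "C",  "B",  "A",  "A",  "B",  "A"),
  stringsAsFactors = FALSE
)

# Step 3: number of unique species on each plot for each period.
unique.counts <- function(d) {
  out <- NULL
  for (p in sort(unique(d$Period))) {
    dp <- unique(d[d$Period == p, c("Plot", "Species")])
    total <- length(unique(dp$Species))
    for (pl in sort(unique(dp$Plot))) {
      # Species seen on the other plots; the shortfall is r_i.
      others <- length(unique(dp$Species[dp$Plot != pl]))
      out <- rbind(out, data.frame(Period = p, Plot = pl,
                                   Count = total - others,
                                   stringsAsFactors = FALSE))
    }
  }
  out
}

# Step 4: one row of estimates per sampling period.
jack.table <- function(d) {
  u <- unique.counts(d)
  periods <- sort(unique(u$Period))
  res <- matrix(NA, nrow = length(periods), ncol = 5,
                dimnames = list(NULL, c("Period", "Observed", "Jackknife",
                                        "SE", "Plots")))
  for (k in seq_along(periods)) {
    p  <- periods[k]
    r  <- u$Count[u$Period == p]
    n  <- length(r)
    s0 <- length(unique(d$Species[d$Period == p]))
    res[k, ] <- c(p, s0,
                  s0 + (n - 1) / n * sum(r),
                  sqrt((n - 1) / n * sum((r - mean(r))^2)),
                  n)
  }
  res
}

result <- jack.table(toy)
```

Building the result with `rbind` inside a loop keeps the sketch close to the storage-matrix description in the text; for large data sets a preallocated matrix, as in step 3(a), would be the more efficient choice.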

S-PLUS functions
The basic structure of our S-PLUS function follows the previously stated algorithm with some additional plots and statistics. The function also calculates the standard error of the first order jackknife estimate and a 95% standard normal confidence interval. Note that the use of a normal confidence interval is appropriate if the number of plots is large, say, $n \geq 30$. Our function produces a table identifying the number of plots in which unique species occur for each sampling period, a set of notched box plots (McGill, Tukey, and Larsen 1978) of the number of species on each plot for each sampling period, and a dot plot (Cleveland 1984) of the observed counts and first order jackknife estimates.
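The normal-approximation interval can be illustrated with a short sketch in R-compatible S syntax. The function name `jack.ci` and the input values are hypothetical; only `qnorm` is a standard function, and the interval is assumed to be the estimate plus or minus the normal quantile times the standard error.

```r
# Standard normal confidence interval for a jackknife estimate.
# estimate, std.error: hypothetical inputs; alpha = 0.05 gives a 95% interval.
jack.ci <- function(estimate, std.error, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  c(lower = estimate - z * std.error,
    upper = estimate + z * std.error)
}

# Toy values for illustration only.
ci <- jack.ci(estimate = 5.5, std.error = sqrt(0.75))
```

Because the interval relies on a normal approximation, it should be read with caution when the number of plots is small.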
The following is our S-PLUS code. Note that the function, jack.fun, calls the functions species.boxplot and jackone.plot, which are also displayed here. The functions were coded in S-PLUS 5.1 (Insightful Corporation 1999) for Unix operating systems (see, for example, Krause and Olson (2000)). The function has also run successfully in S-PLUS 6.2 on Windows XP. Note that the function will calculate J n (S) and S 0 in R if the plotting sections are removed.
The function jack.fun has five arguments. The first argument, mydata, should be replaced with the name of the data frame containing the data set you wish to use. An example of ...

    ... -qnorm(1-alpha/2)*sqrt(jackone.variance)
        index.row <- index.row + 1
      }
      headers <- c("Period", "Number of Plots", "Observed",
                   "Jackknife Estimate", "Standard Error",
                   "Lower Limit", "Upper Limit")
      dimnames(jackone.est) <- list(NULL, headers)
      ##create box plot of Species per Plot##
      if (box.plot) species.boxplot(data)
      if (est.plot) jackone.plot(jackone.est)
      return(species.unique, jackone.est)
    }

The following is S-PLUS code for the function species.boxplot, which is called by the function jack.fun to create variable-width notched box plots of the number of species per plot for each year. For an example of species.boxplot output see Figure 1.

Example
We provide an example of species richness estimates based on data collected for the Land Condition Trend Analysis (LCTA) project at Fort Riley, KS. The LCTA project monitors the environment at the fort and collects information on soil, vegetation, birds, and mammals. The bird data were collected from 1991 through 2002 on approximately 60 plots per year (sampling period). The data include the year, the plot, and the species found on each plot for each year. Table 2 lists the number of plots in which unique species occur. Notice that for our data most plots contain zero unique species. In 1991, only 8 plots contained one unique species and no plots contained 2 or 3 unique species. Table 3 displays the year, the number of plots sampled each year, the observed count, $S_0$, the jackknife estimate, $J_n(S)$, the standard error of $J_n(S)$, and the lower and upper limits of the 95% standard normal confidence interval for $S$ based on $J_n(S)$. At first glance the standard errors for $J_n(S)$ may seem unreasonably small. However, as noted, the numbers of unique species are very small, which drives down the variances of $J_n(S)$. Figure 1 contains variable-width notched box plots of the number of species per plot for each year. Figure 2 is a dot plot of $S_0$ and $J_n(S)$ for each year.

Summary
We have demonstrated the need for a function which calculates first order jackknife estimates for species richness and how to implement such a function. Note that a function could be written for any order jackknife procedure. However, the calculations for jackknife variance quickly become difficult. Also, based on our experience, the second order jackknife procedure does not give estimates that are much different from the first order jackknife estimates.