Introducing untb , an R Package For Simulating Ecological Drift Under the Uniﬁed Neutral Theory of Biodiversity

The distribution of abundance amongst species with similar ways of life is a classical problem in ecology. The uniﬁed neutral theory of biodiversity , due to Hubbell, states that observed population dynamics may be explained on the assumption of per capita equivalence amongst individuals. One can thus dispense with diﬀerences between species, and diﬀerences between abundant and rare species: all individuals behave alike in respect of their probabilities of reproducing and death. It is a striking fact that such a parsimonious theory results in a non-trivial dominance-diversity curve (that is, the simultaneous existence of both abundant and rare species) and even more striking that the theory predicts abundance curves that match observations across a wide range of ecologies. Here I introduce the untb package of R routines, for numerical simulation of ecological drift under the uniﬁed neutral theory. A range of visualization, analytical, and simulation tools are provided in the package and these are presented with examples in the paper. This vignette is based on Hankin (2007a). For reasons of performance, some of the more computationally expensive results are pre-loaded. To calculate them from scratch, change “ calc_from_scratch <- TRUE ” to “ calc_from_scratch <- FALSE ” in chunk time_saver .


Island biogeography and biodiversity
In the field of ecological dynamics, one simple system of interest is that of similar species, at the same trophic level, that compete for the same or similar resources.One might choose forest ecology as the canonical example: trees are long-lived, stationary, and easily identified.Determining the statistical properties of such systems is a classical problem in ecology (Bell 2000); the observed range of abundances between apparently similar species does not have a ready explanation.
Many studies involve the tabulation of individuals, collected according to some formal sampling method, and it is a universal finding that the species abundances exhibit a large range: the majority of individuals are accounted for by a relatively small number of species, while the rarest species typically are sufficiently rare to have been sampled only once ('singletons').

The unified neutral theory of biodiversity
One natural null hypothesis which arises in this field is that differences between species may be ascribed to random, species-independent, processes.The origin of such ideas may be traced back to at least the 1960s, in which MacArthur and Wilson (1963) discussed the observation that island faunas tend to become progressively impoverished with distance from the nearest landmass.Their insight was to treat the number of species on an island as the independent variable.Hubbell (2001) considers that MacArthur and Wilson's theory raised the possibility that "chance alone could play a . . .role in structuring ecological communities" (my italics), thus paving the way for later, more explicitly stochastic, theories.
In the context of theoretical studies, and in particular the untb package, it is standard practice to consider a metacommunity of fixed size J M = S i=1 n i individuals where the abundance of the i th species, 1 ⩽ i ⩽ S, is n i .The evolution of the community is then modelled as a Markov chain with one state change corresponding to the death of a single individual, and the simultaneous birth of another individual of the same or a different species.Because the community size is fixed, it is common to refer to an individual being associated with a "site", thus notionally carving the ecosystem resource into discrete units.
The transition matrix may thus be described, following MacArthur andWilson (1963, 1967), in terms of the behaviour of the i th species: with probability p 0 n i (t) + 1 with probability p +1 . (1) The three probabilities (p −1 , p 0 , p +1 ) may be functions of the species i, and the abundances n i of each species present.A theory is species neutral if it assumes that the probabilities are independent of i.Thus, in a species neutral theory, all species are equivalent.Note that this model does not preclude abundant species being at a competitive disadvantage1 or advantage2 .
A theory is neutral (sometimes unified neutral) if it assumes that all individuals in a community are "strictly equivalent regarding their prospects of reproduction and death" (Chave 2005); a neutral theory is thus species neutral.One consequence of this assumption is that both p B (i) and p D (i), the probabilities of a birth and death respectively being an individual of species i, are both proportional to n i , the number of individuals of that species present in the ecosystem.Then we have p D (i) = p B (i) for all i, and hence p −1 = p +1 .Thus the number of individuals of species i-and hence that of all species-is a Martingale.Hubbell (2001Hubbell ( , 1979) ) discusses, motivates, and assesses the unified neutral theory of biodiversity (henceforth UNTB) and shows that it makes a wide variety of quantitative predictions, all of which appear to be statistically consistent with predictions-or to be incorrect by only a small margin, as discussed by Volkov, Vanavar, Hubbell, and Maritan (2003), for example.
The neutral theory also provides a useful null model against which to compare temporal data: Leigh, Jr., Wright, Herre, and Putz (1993), for example, use neutral dynamics in this way and found evidence that tree diversity on newly isolated tropical islands declines faster than expected by neutral models, suggesting that the discrepancy may be explained by non-neutral animal-plant interactions.
More recent work adduces subtle analyses that cast doubt on the verisimilitude of the UNTB.For example, Nee (2005) shows that the UNTB implies that the expected age of common lineages (of trees, in his example) is "impossibly old"; McGill, Maurer, and Weiser (2006) present a hierarchy of increasingly demanding statistical tests of the UNTB and finds it lacking at the highest levels.
Nevertheless, the UNTB furnishes a broadly realistic mechanism for reproduction and speciation (Maurer and McGill 2004); it makes predictions that are closer to field observations than one might expect for so parsimonious a theory.McGill et al. (2006), after rejecting the UNTB in favour of a non-mechanistic alternative 3 , state that the UNTB remains "an extraordinary and perhaps uniquely elegant ecological theory.Such elegance is indicative that the neutral theory will have some important role in ecology".Similarly, Nee (2005) states that the UNTB can still provide "useful null models for the interpretation of data".It is in this spirit that I present untb, an R (R Development Core Team 2007) package that uses the UNTB to simulate and analyze such datasets.The theory is then applied to a number of oceanographic datasets.
In the context of computer simulations, it is possible to simulate neutral ecological drift by the following algorithm.Consider an system of J M individuals, and a timestep sufficiently short for only a single individual to die.Then: 1. Choose a site at random; the individual at this site dies.
2. With probability 1 − ν, an individual of species chosen at random from the remaining population is born 3.With probability ν, an individual of a new species is born.
Subsequent timesteps may be enacted using the new community as a pool in step 2. The meaning of "new" species is subject to differing interpretations: the new species might be the result of a point mutation, or in the case of island biogeography, the successful immigration of an individual from a metacommunity (typically the mainland).
This system is not the only way to simulate neutral ecological drift: it ignores, for example, changes in competitive advantage accruing to older individuals.Also, alternative models of speciation may be adopted, such as the random fission model (Ricklefs 2003;Hubbell 2003).

Computational population ecology and the untb package
The R package untb associated with this paper may be used to analyze ecosystem data in the 3 The lognormal distribution (Volkov et al. 2003) context of Hubbell's neutral theory.The package contains routines that summarize census data, estimate biodiversity parameters, and generate synthetic datasets under conditions of exact neutrality.In an R session, typing library(help = "untb") at the command line shows a list of topics that are documented in the package.
It is envisaged that the untb package will be useful to ecologists in both research and teaching.Many of the routines are intended to be useful to teachers covering the neutral theory: a number of visualization tools are included that display census datasets in a consistent and standardized format; function display.untb()displays a "movie" of a system undergoing neutral drift in a visually striking, and hopefully memorable, manner.
Researchers will find useful functionality in the package, which can perform a range of relevant analyses on real data.It is hoped that the package will avoid other workers having to 'reinvent the wheel' by virtue of having all of these features together in a unified, clearly described framework implemented in an open-source software environment.

Analysis of species abundance data
Consider the saunders dataset, which lists species abundances of various types of marine animals living in kelp holdfasts (Saunders 2007;Anderson, Diebel, Blom, and Landers 2005).The dataset may be examined using the functionality of untb: > data("saunders") > summary(saunders.tot)Thus there are 42 species with abundance 1 (that is, singletons); 14 species with abundance 2; 21 species with abundance 3-4; 22 species with abundance 5-8, and so on.
A more graphical representation of the same dataset is given in figure 1, which shows the ranked abundance of each species in red on the logarithmic vertical axis.The highest red dot thus corresponds to Ventojassa with 1046 individuals.On this figure, grey lines show synthetic datasets generated using the maximum likelihood estimate for θ (section 2.2).

Estimation of parameters
One distinguishing feature of Hubbell's unified neutral theory is that it provides a mechanism for speciation: the algorithm shown in section 1.1 specifies that a birth of an individual will be a new species with probability ν.Although ν is small and J M large, Hubbell (2001, page 116) observes that their product is a 'moderately sized number'.The Fundamental Biodiversity parameter θ, defined as θ = 2J M ν, arises naturally when investigating neutral ecological drift and in the present context may be regarded as a parameter susceptible to statistical inference.
If ϕ a is the number of species with abundance a, the "Ewens sampling formula" (Ewens 1972) gives the probability of observing a particular species abundance distribution (SAD) as [theta.prob() in the package] from which it is clear that the number of species S occurring in a fixed sample size of J M individuals is a sufficient statistic for θ, and optimizing the likelihood L given as [theta.likelihood() in the package] over positive values thus furnishes a maximum likelihood estimate for θ.Function optimal.theta()carries out this procedure numerically: > optimal.theta(saunders.tot) [1] 31.35894 The accuracy of this estimate may be assessed by examining the likelihood curve (Alonso and McKane 2004); figure 2 shows the support (log L) for a range of θ.Using an exchange rate of two units of support per degree of freedom (Edwards 1992) suggests that the true value of θ lies in the range 26-37.

Estimating the probability of immigration: The Etienne sampling formula
Neutral theories may allow for the possibility of restricted immigration to, for example, islands.Consider a partially isolated ecosystem of J individuals, and a neutral metacommunity of J M individuals and biodiversity parameter θ.In the algorithm presented in section 1.1, reinterpret the birth of a new species (an event with probability ν) as the arrival of an organism from the metacommunity.This probability is conventionally labelled m; the probability of mutation on the island is negligible.
It is often of great interest to estimate m; Etienne (2005) shows that the probability of a given set of abundances D is just equation 2, modified by a factor that is a complicated function of m, the species abundances, and θ.Etienne's sampling probability is given by function etienne() in the package, and is maximized by function optimal.params().(Hankin 2007b); or the PARI/GP system (Batut, Belabas, Bernardi, Cohen, and Olivier 2006).These options are controlled by setting a flag in function logkda(); full documentation is given in the online help page.

The Barro Colorado dataset
The BCI dataset (Condit, Hubbell, and Foster 2005)-a standard resource for ecologists (Hubbell 2001;Etienne 2005)-contains location and species identity for all trees of ⩾ 10 cm diameter at breast height on Barro Colorado Island for a number of years.This dataset was used in Hankin (2007a) but is not included in the untb package (and therefore not used in this vignette), because its licence appears to be inconsistent with the GPL.Further details may be found in the package's online documentation file bci.Rd.

Estimation of parameters from field data
Some of the difficulties of estimating θ and m from field data are shown in Figure 3, which shows three ensembles of maximum likelihood estimates for local communities of size J = 100, 1000, 10000, partially isolated (m = 0.01) from a metacommunity of size J M = 5 × 10 6 with θ = 50.The local communities were allowed to evolve for sufficiently many timesteps for the results to be unaffected by further simulation (Chisholm and Burgman 2004;Hubbell and Borda-de-Água 2004).Note the ensemble behaviour of the maximum likelihood estimator: the estimates are severely biased, especially for the smaller sample sizes.In general, θ is underestimated, while m is overestimated.Even with the largest local community, E( θ) = 40.9 and E( m) = 0.047, showing under-and over-estimation respectively.
In particular, note that many of the ensembles give rise to estimates for m very close to 1.This represents an incorrect assessment of (almost) no dispersal limitation, as in fact m = 0.01.The probability of m being greater than, say, 0.95 decreases with increasing local community size J but is definitely appreciable even when J = 10000.In cases where m ≃ 1, inspection of the likelihood surface in the (θ, m) plane shows that there is no extremum (that is, ∂L/∂θ = ∂L/∂m = 0) anywhere in the half-plane m ⩽ 1.Such cases are challenging for the method of maximum likelihood because the numerically determined maximum will fall on the line m = 1.
In practice, this means that m being close to 1 is not inconsistent with a much smaller value of m unless J ≫ 10000, at least for θ ∼ 50.Such considerations may be relevant to any work in which m ≃ 1; rather than using  θ, m as a basis for making inferences, one might adopt the Bayesian approach and use the posterior joint distribution of (θ, m).

Exact combinatorial analysis for small J M
The neutral theory may be used to produce various analytical results, but many are tractable only for small J M ; here I determine exact expected abundances for J M = 20.Hubbell (2001, S, r 1 , . . ., r S , 0, . . ., 0 where r i (k) is the abundance of the i th species under configuration k, and summation extends over partitions of J M in the sense of Hankin (2006) (Hubbell's "configurations" are our "partitions").Function expected.abundance() in the package calculates this, as shown in Figure 4 which plots expected abundance as a function of species rank R. The very small expected abundances for, say, R > 10, are consistent with the overwhelming majority of J M = 20, θ = 2 ecosystems comprising fewer than 11 species (thus giving any species with R > 10 a zero abundance).As an extreme case, the value of about 10 −11 for R = 20 is the probability of the ecosystem comprising only singletons.The abundance of each ranked species will exhibit variance; it is possible to build up a histogram of the abundance of, say, the third ranked species by repeatedly generating a neutral ecosystem and recording the abundance of the third most abundant species amongst the J M individuals present.Figure 5 shows such a histogram generated, again using J M = 20 and θ = 2.Note the absence of zero abundance: in none of the 1000 replicates did the ecosystem comprise only two species (thus rendering the third ranked species nonexistent).Also note the impossibility of it being 7 or greater: if the third ranked species had abundance 7, then the first and second ranked species would have abundances ⩾ 7, so the total number of individuals would exceed 20.

Creation and analysis of synthetic datasets
Package untb includes a number of functions that generate synthetic datasets in the context of visualization or verification of predictions of the neutral theory.
Function untb() generates random neutral ecosystems using the system specified in equations 1.A sample output is shown graphically in Figure 6, which exhibits sudden overturning events during which the dominant species suffers a drastic reduction in abundance, to be replaced by a different species.This is a direct numerical verification of the neutral theory's prediction of punctuated equilibrium (Hubbell 2001, page 233).

Conclusions
This paper presents Hubbell's Unified Neutral Theory of Biodiversity from a computational ecology perspective, and introduces the untb package of R routines for numerical simulation of neutral ecosystem dynamics, analysis of field data, and visualization of real and synthetic datasets.
Examples taken from oceanographic datasets are presented and discussed using the package.
A number of synthetic datasets are generated and analyzed using the package's simulation functions and analyzed using the visualization suite.

Figure 1 :
Figure1: The ranked abundance curve of the Saunders dataset showing real data (red) and 10 simulated ranked abundance curves, generated randomly using rand.neutral()using the maximum likelihood estimate for θ (grey)

Figure 2 :
Figure 2: The support curve for the Saunders dataset as a function of the Fundamental Biodiversity Parameter θ

Figure 3 :
Figure 3: Ensemble of 100 maximum likelihood estimates of m and θ for three different sizes of local community.Here, the metacommunity is of size J M = 5 × 10 6 , the biodiversity parameter θ is 50, and the immigration probability m is 0.01.The large diagonal cross corresponds to the true values (50, 0.01).Note the systematic bias present for each J, being most severe for the smallest (J = 100; black circles)

Figure 4 :
Figure 4: The expected abundances for the i th ranked species with J M = 20 and θ = 3 for 1 ⩽ i ⩽ 20.Red dots show exact expected abundances and the vertical black lines show the range if 2 ⩽ θ ⩽ 4.

Figure 5 :
Figure 5: Abundance of third ranked species of an ecosystem of J M = 20 individuals and a biodiversity parameter θ = 2: 1000 replicates

Figure 6 :
Figure6: Synthetic dataset generated using neutral dynamics.Lines show the abundance of each species in time; different colours correspond to different (equivalent) species.A sudden displacement of the initially monodominant species (black line) with another species (red line) occurs at t ≃ 17000; a similar overturning event occurs at around time 35000

Figure 7 :
Figure 7: Simulated ecosystems of size J M = 100 for varying parameters, with each organism represented by a dot, the different colours representing different species; spatial locations are functionally identical.Initial conditions are a system of maximal diversity (i.e. 100 singletons) Preston (1948)ers.totdatasetcomprises8419 individuals of 176 species, of which 42 are "singletons"-that is, species sufficiently rare that they have only one representative in the entire sample.Further details of the data are given by function preston(), which carries out the analysis ofPreston (1948):