Abstract
We discuss a method for analyzing data that are positively skewed and contain a substantial proportion of zeros. Such data commonly arise in ecological applications when the focus is on the abundance of a species; the shape of the distribution then reflects the patchy nature of the environment and/or the inherent heterogeneity of the species. The method can be used whenever we wish to model these data as a response variable in terms of one or more explanatory variables. The analysis consists of three stages. The first involves creating two data sets from the original: one records whether or not the species is present, and the other contains the logarithm of the abundance for those observations in which it is present. We refer to these as the ‘presence data’ and the ‘log-abundance data’, respectively. The second stage involves modelling the presence data using logistic regression and, separately, modelling the log-abundance data using ordinary regression. Finally, the third stage combines the two models to estimate the expected abundance for a specific set of values of the explanatory variables. A common approach to analyzing this sort of data is to use a ln(y + c) transformation, where c is some constant (usually one). The method we use here avoids the need for an arbitrary choice of c, and allows the modelling to be carried out in a natural and straightforward manner using well-known regression techniques. The approach we put forward is not original, having been used in both conservation biology and fisheries. Our objectives in this paper are to (a) promote the application of this approach in a wide range of settings and (b) suggest that parametric bootstrapping be used to provide confidence limits for the estimate of expected abundance.
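The three stages, together with the parametric bootstrap suggested in (b), can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single explanatory variable x, fits the logistic regression by Newton-Raphson and the log-abundance regression by ordinary least squares, and assumes normal errors on the log scale, so that the expected abundance is estimated as p̂(x)·exp(μ̂(x) + σ̂²/2). That is a plain plug-in estimate, not a bias-corrected one. The function names (`split`, `fit`, `expected_abundance`, `bootstrap_ci`, `simulate`) and the simulated data are hypothetical.

```python
import math
import random
import statistics

# Sketch only. Assumptions: one explanatory variable; normal errors on the log scale.


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def split(x, y):
    """Stage 1: presence data (all observations) and log-abundance data (y > 0 only)."""
    presence = [1 if yi > 0 else 0 for yi in y]
    x_pos = [xi for xi, yi in zip(x, y) if yi > 0]
    log_y = [math.log(yi) for yi in y if yi > 0]
    return presence, x_pos, log_y


def fit_logistic(x, z, iters=50):
    """Stage 2a: logit P(z = 1) = a + b*x, fitted by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for xi, zi in zip(x, z):
            p = sigmoid(a + b * xi)
            w = p * (1.0 - p)
            ga += zi - p             # score for the intercept
            gb += (zi - p) * xi      # score for the slope
            haa += w                 # entries of the information matrix X'WX
            hab += w * xi
            hbb += w * xi * xi
        det = haa * hbb - hab * hab
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


def fit_ols(x, ly):
    """Stage 2b: least squares ly = c + d*x + error; returns (c, d, residual variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(ly) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (li - my) for xi, li in zip(x, ly))
    d = sxy / sxx
    c = my - d * mx
    rss = sum((li - c - d * xi) ** 2 for xi, li in zip(x, ly))
    return c, d, rss / (n - 2)


def fit(x, y):
    z, x_pos, log_y = split(x, y)
    return fit_logistic(x, z), fit_ols(x_pos, log_y)


def expected_abundance(x0, logit, ols):
    """Stage 3: P(present | x0) times the lognormal mean exp(mu(x0) + sigma^2 / 2)."""
    a, b = logit
    c, d, s2 = ols
    return sigmoid(a + b * x0) * math.exp(c + d * x0 + s2 / 2.0)


def bootstrap_ci(x, y, x0, n_boot=500, seed=1):
    """Parametric bootstrap: simulate from both fitted models, refit, take 2.5%/97.5% points."""
    rng = random.Random(seed)
    (a, b), (c, d, s2) = fit(x, y)
    s = math.sqrt(s2)
    estimates = []
    for _ in range(n_boot):
        y_star = [math.exp(rng.gauss(c + d * xi, s)) if rng.random() < sigmoid(a + b * xi)
                  else 0.0 for xi in x]
        estimates.append(expected_abundance(x0, *fit(x, y_star)))
    q = statistics.quantiles(estimates, n=40)  # 39 cut points; first/last are 2.5%/97.5%
    return q[0], q[-1]


def simulate(n=300, seed=0):
    """Hypothetical demo data: skewed abundances with many zeros."""
    rng = random.Random(seed)
    x = [i / n for i in range(n)]
    y = [math.exp(rng.gauss(1.0 + 0.8 * xi, 0.5)) if rng.random() < sigmoid(-0.5 + 2.0 * xi)
         else 0.0 for xi in x]
    return x, y
```

For example, `x, y = simulate()` followed by `expected_abundance(0.5, *fit(x, y))` and `bootstrap_ci(x, y, 0.5)` gives a point estimate and an approximate 95% percentile interval at x = 0.5. With several explanatory variables, a standard GLM routine would replace the hand-written fits; they are written out here only to keep the sketch dependency-free.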
Cite this article
Fletcher, D., MacKenzie, D. & Villouta, E. Modelling skewed data with many zeros: A simple approach combining ordinary and logistic regression. Environ Ecol Stat 12, 45–54 (2005). https://doi.org/10.1007/s10651-005-6817-1
DOI: https://doi.org/10.1007/s10651-005-6817-1