Modeling Clustered Count Data with Excess Zeros in Health Care Outcomes Research

Health Services and Outcomes Research Methodology

Abstract

Count outcomes are common in health research, and they often contain a large number of zeros. To account for these excess zeros, various modifications of the Poisson regression model have been proposed. Lambert (Lambert, D., Technometrics 34, 1–14, 1992) described a zero-inflated Poisson (ZIP) model, a mixture of a distribution degenerate at zero (with probability π_i) and a Poisson distribution (with mean λ_i). Depending on the relationship between π_i and λ_i, she described two variants: a ZIP model and a ZIP(τ) model. In this paper, we extend these models to clustered data (e.g., patients observed within hospitals) and describe random-effects ZIP and ZIP(τ) models, which are appropriate for analyzing clustered Poisson count data with excess zeros. The random effects are assumed to be normally distributed, and the model parameters are estimated by maximum marginal likelihood. We applied these models to data on patients who underwent colon operations at 123 Veterans Affairs Medical Centers in the National VA Surgical Quality Improvement Program.
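
The mixture structure described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function and parameter names (`beta0`, `beta1`, `gamma0`, `sigma`), the single covariate, the placement of one normal random intercept in the Poisson part only, and the fixed 5-point Gauss-Hermite rule used to approximate the marginal likelihood are all assumptions made for the example. The ZIP(τ) link follows Lambert's formulation, logit(π) = −τ log(λ).

```python
import math

def zip_pmf(y, lam, pi):
    """P(Y = y) under a ZIP model: a structural zero with probability pi,
    otherwise a Poisson(lam) draw."""
    poisson = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    return pi * (1.0 if y == 0 else 0.0) + (1.0 - pi) * poisson

def zip_tau_pi(lam, tau):
    """ZIP(tau) link: logit(pi) = -tau * log(lam), i.e. pi = 1 / (1 + lam**tau)."""
    return 1.0 / (1.0 + lam ** tau)

# 5-point Gauss-Hermite rule for a standard normal variable (closed form).
_S10 = math.sqrt(10.0)
_A, _B = math.sqrt(5.0 - _S10), math.sqrt(5.0 + _S10)
GH_NODES = [0.0, _A, -_A, _B, -_B]
GH_WEIGHTS = [8.0 / 15.0,
              (7.0 + 2.0 * _S10) / 60.0, (7.0 + 2.0 * _S10) / 60.0,
              (7.0 - 2.0 * _S10) / 60.0, (7.0 - 2.0 * _S10) / 60.0]

def cluster_marginal_loglik(ys, xs, beta0, beta1, gamma0, sigma):
    """Marginal log-likelihood of one cluster (hypothetical parameterization).

    Assumed model: log(lam_ij) = beta0 + beta1 * x_ij + u_i with
    u_i ~ N(0, sigma^2), and logit(pi_ij) = gamma0. The random intercept
    is integrated out numerically with the quadrature rule above.
    """
    pi = 1.0 / (1.0 + math.exp(-gamma0))
    total = 0.0
    for node, weight in zip(GH_NODES, GH_WEIGHTS):
        u = sigma * node  # rescale the standard normal node
        lik = 1.0
        for y, x in zip(ys, xs):
            lam = math.exp(beta0 + beta1 * x + u)
            lik *= zip_pmf(y, lam, pi)
        total += weight * lik
    return math.log(total)

# Made-up counts for one cluster of four patients.
example = cluster_marginal_loglik([0, 0, 2, 1], [0.0, 1.0, 0.0, 1.0],
                                  beta0=0.2, beta1=0.3, gamma0=-0.5, sigma=0.8)
```

Summing `cluster_marginal_loglik` over clusters gives an objective that a general-purpose optimizer could maximize. A practical fit would likely use more quadrature points and may place random effects in the zero-inflation part as well.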

References

  1. Albert, J., “A Bayesian analysis of a Poisson random effects model for home run hitters,” The American Statistician 46, 246–253, 1992.

  2. Berndt, E., Hall, B., Hall, R., and Hausman, J., “Estimation and inference in nonlinear structural models,” Annals of Economic and Social Measurement 3, 653–666, 1974.

  3. Bock, R.D., Multilevel analysis of educational data, Academic Press, New York, 1989.

  4. Breslow, N.E., “Extra Poisson variation in log-linear models,” Applied Statistics 33, 38–44, 1984.

  5. Bryk, A.S. and Raudenbush, S.W., Hierarchical linear models: Applications and data analysis methods, Sage, London, 1992.

  6. Cameron, A. and Trivedi, P., “Econometric models based on count data: Comparisons and applications of some estimators and tests,” Journal of Applied Econometrics 1, 29–53, 1986.

  7. Cohen, A., “Estimation of the Poisson parameter from truncated samples and from censored samples,” Journal of the American Statistical Association 49, 158–168, 1954.

  8. Daley, J., Khuri, S.F., Henderson, W.G., Hur, K. et al., “Risk adjustment of the postoperative morbidity rate for the comparative assessment of the quality of surgical care,” Journal of the American College of Surgeons 185(4), 328–340, 1997.

  9. Dunlop, D., “Regression for longitudinal data: A bridge from least squares regression,” The American Statistician 48(4), 299–303, 1994.

  10. Gibbons, R. and Hedeker, D., “Application of random effects probit regression models,” Journal of Consulting and Clinical Psychology 62, 285–296, 1994.

  11. Gibbons, R. and Hedeker, D., “Random effects probit and logistic regression models for three-level data,” Biometrics 53, 1527–1537, 1997.

  12. Greene, W.H., “Accounting for excess zeros and sample selection in Poisson and negative binomial regression models,” Working paper, Department of Economics, Stern School of Business, New York University, New York, 1994.

  13. Greene, W.H., LIMDEP Version 7.0 user's manual, rev. edn., Econometric Software, Inc., Plainview, NY, 1998.

  14. Gupta, P., Gupta, R., and Tripathi, R., “Analysis of zero-adjusted count data,” Computational Statistics and Data Analysis 23, 207–218, 1996.

  15. Hall, D.B., “Zero-inflated Poisson and binomial regression with random effects: A case study,” Biometrics 56, 1030–1039, 2000.

  16. Hedeker, D., MIXPREG: A computer program for mixed-effects Poisson regression, Technical Report, School of Public Health, University of Illinois at Chicago, Chicago, 1998.

  17. Hedeker, D. and Gibbons, R., “A random effects ordinal regression model for multilevel analysis,” Biometrics 50, 933–944, 1994.

  18. Hedeker, D., Siddiqui, O., and Hu, F., “Random effects regression analysis of correlated grouped-time survival data,” Statistical Methods in Medical Research 9, 161–179, 2000.

  19. Heilbron, D., “Generalized linear models for altered zero probabilities and overdispersion in count data,” Technical Report, Department of Epidemiology and Biostatistics, University of California, San Francisco, 1989.

  20. Heilbron, D., “Zero-altered and other regression models for count data with added zeros,” Biometrical Journal 36, 531–547, 1994.

  21. Johnson, N. and Kotz, S., Distributions in statistics: Discrete distributions, John Wiley and Sons, New York, 1969.

  22. Khuri, S.F., Daley, J., Henderson, W.G., Hur, K. et al., “Risk adjustment of the postoperative mortality rate for the comparative assessment of the quality of surgical care: Results of the National Veterans Affairs Surgical Risk Study,” Journal of the American College of Surgeons 185(4), 315–327, 1997.

  23. Khuri, S.F., Daley, J., Henderson, W.G., Hur, K. et al., “The Department of Veterans Affairs NSQIP: The first national validated, outcome-based, risk-adjusted, and peer-controlled program for the measurement and enhancement of the quality of care,” Annals of Surgery 228(4), 491–507, 1998.

  24. King, G., “Event count models for international relations: Generalizations and applications,” International Studies Quarterly 33, 123–147, 1989.

  25. Lambert, D., “Zero-inflated Poisson regression with an application to defects in manufacturing,” Technometrics 34, 1–14, 1992.

  26. Lawless, J., “Negative binomial and mixed Poisson regression,” Canadian Journal of Statistics 15, 209–225, 1987.

  27. Lee, Y. and Nelder, J.A., “Hierarchical generalized linear models (with discussion),” Journal of the Royal Statistical Society B 58, 619–678, 1996.

  28. Long, S., Regression models for categorical and limited dependent variables, Sage, London, 1997.

  29. Mullahy, J., “Specification and testing of some modified count data models,” Journal of Econometrics 33, 341–365, 1986.

  30. Neuhaus, J.M. and Jewell, N., “Some comments on Rosner's multiple logistic model for clustered data,” Biometrics 46, 523–534, 1990.

  31. Neuhaus, J.M., Kalbfleisch, J.D., and Hauck, W.W., “A comparison of cluster-specific and population-averaged approaches for analyzing correlated binary data,” International Statistical Review 59, 25–35, 1991.

  32. Normand, S.L.T., Glickman, M.E. et al., “Using admission characteristics to predict short-term mortality from myocardial infarction in elderly patients,” JAMA 275, 1322–1328, 1996.

  33. Preisler, H.K., “Analysis of a toxicological experiment using a generalized linear model with nested random effects,” International Statistical Review 57, 145–159, 1989.

  34. Prentice, R., “Correlated binary regression with covariates specific to each binary observation,” Biometrics 44, 1033–1048, 1988.

  35. Rosner, B., “Multivariate methods for clustered binary data with more than one level of nesting,” Journal of the American Statistical Association 84, 373–380, 1989.

  36. Siddiqui, O., “Modeling clustered count and survival data with an application to a school-based smoking prevention study,” PhD Dissertation, University of Illinois at Chicago, 1996.

  37. Snijders, T. and Bosker, R., Multilevel analysis: An introduction to basic and advanced multilevel modeling, Sage, Thousand Oaks, CA, 1999.

  38. Stroud, A.H. and Sechrest, D., Gaussian quadrature formulas, Prentice Hall, Englewood Cliffs, NJ, 1966.

  39. Ten Have, T., Landis, R., and Hartzel, J., “Population-averaged and cluster-specific models for clustered ordinal response data,” Statistics in Medicine 15, 2573–2588, 1996.

  40. Thall, P.F., “Mixed Poisson likelihood regression models for longitudinal interval count data,” Biometrics 44, 197–209, 1988.

  41. Vach, W. and Blettner, M., “Missing data in epidemiologic studies,” Encyclopedia of Biostatistics 4, 2641–2654, 1998.

  42. Vuong, Q., “Likelihood ratio tests for model selection and non-nested hypotheses,” Econometrica 57(2), 307–333, 1989.

  43. Yau, K. and Lee, A., “Zero-inflated Poisson regression with random effects to evaluate an occupational injury prevention programme,” Statistics in Medicine 20, 2907–2920, 2001.

Author information

Corresponding author

Correspondence to Kwan Hur.

About this article

Cite this article

Hur, K., Hedeker, D., Henderson, W. et al. Modeling Clustered Count Data with Excess Zeros in Health Care Outcomes Research. Health Services & Outcomes Research Methodology 3, 5–20 (2002). https://doi.org/10.1023/A:1021594923546
