A Shared Latent Process Model to Correct for Preferential Sampling in Disease Surveillance Systems

Journal of Agricultural, Biological and Environmental Statistics

Abstract

Disease surveillance systems are crucial to monitor and predict outbreaks, epidemics and pandemics, as well as to understand the dynamics and trends of diseases over space and time. For zoonotic diseases, i.e., diseases that spread from animals to humans, surveillance systems often rely on complex data collection mechanisms which present particular challenges to the statistician, including sampling processes that commonly violate key assumptions of standard statistical methods. One such mechanism is preferential sampling, referring to a stochastic dependency between a spatial process of interest and the locations at which it is observed, commonly arising out of practical considerations related to a limited sampling budget and a rare outcome. While this sampling strategy can lead to considerably biased spatial predictions, few solutions to address preferential sampling have been proposed in the context of disease surveillance. We propose a novel approach to correct for preferential sampling in disease surveillance applications and show by simulation the practical benefits of reduced bias in parameter estimates and greater accuracy of the estimated risk surface. We conclude with an application of the model to a disease surveillance dataset targeting plague (Yersinia pestis) in the sylvatic rodent populations in California.
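To make the bias from preferential sampling concrete, the following is a minimal simulation sketch. It is not the model proposed in the article. It assumes a one-dimensional transect of cells, a hypothetical sinusoidal latent surface `w`, and logistic links for both disease risk and sampling probability; the names `alpha`, `alpha0`, `beta0`, and `simulate` are illustrative assumptions. The only idea taken from the abstract is that the disease process and the choice of sampling locations depend on a common latent process.

```python
import math
import random


def logistic(x):
    """Inverse-logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate(alpha, n_cells=20000, beta0=-2.0, alpha0=-1.0, seed=1):
    """Return (true mean risk, naive sampled prevalence) for one simulated survey.

    alpha controls how strongly the sampling locations depend on the latent
    process w; alpha = 0 means sampling ignores w (non-preferential).
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    positives = 0
    n_sampled = 0
    total_risk = 0.0
    for i in range(n_cells):
        # Hypothetical latent surface: one smooth sinusoid along a transect.
        w = 1.5 * math.sin(2.0 * math.pi * i / n_cells)
        # Disease risk depends on the latent process.
        risk = logistic(beta0 + w)
        total_risk += risk
        # Sampling depends on the same latent process when alpha != 0,
        # so high-risk cells are visited more often when alpha > 0.
        if rng.random() < logistic(alpha0 + alpha * w):
            n_sampled += 1
            # Binary test result at a sampled cell.
            positives += rng.random() < risk
    return total_risk / n_cells, positives / n_sampled


if __name__ == "__main__":
    for alpha in (0.0, 1.5):
        true_risk, naive = simulate(alpha)
        print(f"alpha={alpha}: true mean risk={true_risk:.3f}, "
              f"naive prevalence={naive:.3f}")
```

Under these assumptions, the naive prevalence among sampled cells tracks the true mean risk when `alpha = 0` and overstates it when `alpha > 0`, because sampling effort concentrates where the latent process (and hence risk) is high. A shared latent process model aims to remove this distortion by modelling the sampling locations and the outcomes jointly rather than treating the locations as fixed.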

Acknowledgements

We gratefully thank the local, state, and federal agencies that contributed rodent sampling data to the California Department of Public Health-Vector-Borne Disease Section plague surveillance program. We also sincerely thank two anonymous reviewers for their insightful comments that have greatly improved our analysis. The findings and conclusions in this article are those of the author(s) and do not necessarily represent the views or opinions of the California Department of Public Health or the California Health and Human Services Agency.

Author information

Corresponding author

Correspondence to Brian Conroy.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 36 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Conroy, B., Waller, L.A., Buller, I.D. et al. A Shared Latent Process Model to Correct for Preferential Sampling in Disease Surveillance Systems. JABES 28, 483–501 (2023). https://doi.org/10.1007/s13253-023-00535-4

  • DOI: https://doi.org/10.1007/s13253-023-00535-4
