Abstract
Disease surveillance systems are crucial for monitoring and predicting outbreaks, epidemics, and pandemics, and for understanding the dynamics and trends of diseases over space and time. For zoonotic diseases, i.e., diseases that spread from animals to humans, surveillance systems often rely on complex data collection mechanisms that present particular challenges to the statistician, including sampling processes that commonly violate key assumptions of standard statistical methods. One such mechanism is preferential sampling, a stochastic dependence between a spatial process of interest and the locations at which it is observed, which commonly arises from practical considerations such as a limited sampling budget and a rare outcome. Although preferential sampling can lead to considerably biased spatial predictions, few solutions have been proposed in the context of disease surveillance. We propose a novel approach to correct for preferential sampling in disease surveillance applications and show by simulation its practical benefits: reduced bias in parameter estimates and greater accuracy of the estimated risk surface. We conclude with an application of the model to a disease surveillance dataset targeting plague (Yersinia pestis) in sylvatic rodent populations in California.
Acknowledgements
We gratefully thank the local, state, and federal agencies that contributed rodent sampling data to the California Department of Public Health-Vector-Borne Disease Section plague surveillance program. We also sincerely thank two anonymous reviewers for their insightful comments that have greatly improved our analysis. The findings and conclusions in this article are those of the author(s) and do not necessarily represent the views or opinions of the California Department of Public Health or the California Health and Human Services Agency.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
About this article
Cite this article
Conroy, B., Waller, L.A., Buller, I.D. et al. A Shared Latent Process Model to Correct for Preferential Sampling in Disease Surveillance Systems. JABES 28, 483–501 (2023). https://doi.org/10.1007/s13253-023-00535-4
DOI: https://doi.org/10.1007/s13253-023-00535-4