A General Framework for Spatio-Temporal Modeling of Epidemics With Multiple Epicenters: Application to an Aerially Dispersed Plant Pathogen

The spread dynamics of long-distance-dispersed pathogens are influenced by the dispersal characteristics of a pathogen, anisotropy due to multiple factors, and the presence of multiple sources of inoculum. In this research, we developed a flexible class of phenomenological spatio-temporal models that extend a modeling framework used in plant pathology applications to account for the presence of multiple sources and anisotropy of biological species that can govern disease gradients and spatial spread in time. We use the cucurbit downy mildew pathosystem (caused by Pseudoperonospora cubensis) to formulate a data-driven procedure based on the 2008 to 2010 historical occurrence of the disease in the U.S. available from standardized sentinel plots deployed as part of the Cucurbit Downy Mildew ipmPIPE program. This pathosystem is characterized by annual recolonization and extinction cycles, generating annual disease invasions at the continental scale. This data-driven procedure is amenable to fitting models of disease spread from one or multiple sources of primary inoculum and can be specified to provide estimates of the parameters by regression methods conditional on a function that can accommodate anisotropy in disease occurrence data. Applying this modeling framework to the cucurbit downy mildew data sets, we found a small but consistent reduction in temporal prediction errors by incorporating anisotropy in disease spread. Further, we did not find evidence of an annually occurring, alternative source of P. cubensis in northern latitudes. However, we found a signal indicating an alternative inoculum source on the western edge of the Gulf of Mexico. This modeling framework is tractable for estimating the generalized location and velocity of a disease front from sparsely sampled data with minimal data acquisition costs. These attributes make this framework applicable and useful for a broad range of ecological data sets where multiple sources of disease may exist and whose subsequent spread is directional.

respectively. An instantaneous measure of the epidemic velocity v can be 134 expressed in terms of the distance r from the epicenter as Ojiambo et al. [1] used this power-logistic model to estimate the spread 136 parameter b of epidemic waves resulting from an assumed isotropic spread 137 of cucurbit downy mildew in the eastern U.S. They assumed that all 138 epidemics are first observed at the same initial distance of spread r 0 [1] 139 given that P. cubensis overwinters in south Florida and the inoculum is 140 aerially dispersed northward when the environment is conducive [12,16].

141
The spatial-temporal models depicted by Equations 5 and 6 assume 142 isotropic spread, which is implicit in applications of these models to 143 describe disease spread [7,8]   an arbitrary spatial location x ∈ R 2 relative to x 0 . Let the parametric 219 model for disease incidence at location (r, φ) and time t be denoted by 220 y = y(r, φ, t|Θ) = f (r, φ, t|Θ) where Θ are the parameters of the model.

221
The model is characterized by the partial derivatives with respect to r 222 and t, which describe the spatial and temporal components of the model.

223
Let 224 ∂y ∂t = ay(1 − y) (9) where g(φ) : [0, 2π] → R is a function of the angle between the location x 225 and the source point x 0 , and a and b are parameters of the model. The

226
function g induces spatial anisotropy by allowing the rate of change of 227 disease incidence with distance from the source to depend on direction.

228
Note that in this model, the rate of change with time is independent of 229 location.

230
The explicit form for f (·) is obtained by integration. First integrating

231
Equation 10 gives where c(t, φ) is a normalizing function. Then, integrating Equation 9 233 gives 234 log where d(r, φ) is a normalizing function. Now, together Equations 11 and 235 12 imply Note that if the partial derivative ∂y ∂φ is specified, one can obtain a 237 functional form for c(φ), but here the spatial component of the model is only specified in terms of ∂y ∂r . Algebraic rearrangement of Equation 13 239 yields that the explicit form for f (·) up to normalizing constants is 240 y(r, φ, t|Θ) = 1 where the parameter set Θ = {a, b, g(·)} and A(φ) = 1 c(φ) . This is a 241 disease incidence spatio-temporal process with spatial kernel . We note that this result is a generalization of the information about the spatiotemporal distribution of disease incidence.

251
From Equations 9 and 10, velocity is given by where h(φ) is a normalizing function. We note that Equation 16 is linear 254 in time and can be fit to obtain estimates of M , h, and g using regression 255 Notation Description x (k) kth source location (Cartesian) r (k) distance to kth source (km) φ (k) angle to kth source (radians) t time (day of year) K number of sources g k kth directional anisotropy function p (k) probability that disease is caused by kth source c k kth regression intercept −M k kth regression parameter for time h k (·) normalizing function for kth regression model   Let (x (1) , . . . , x (K) ) denote the source locations; for each k = 1, . . . , K, 260 x (k) ∈ R 2 . Now an arbitrary location x ∈ R 2 is associated with K sets of 261 polar coordinates (r (1) , φ (1) ), . . . , (r (K) , φ (K) ) where the kth polar 262 coordinate pair indicates the distance r (k) and angle φ (k) to the kth source 263 point x (k) . A depiction of this data representation is given in Figure 1.
x (1) Estimation. We propose an estimation procedure wherein velocity 278 models are fit using regression methods conditional on known g k . The  disease occurrence data (presence or absence) of the form Estimates of c k , M k and h k are easily computed using semiparametric rewriting Equation 20 we obtain

Updatep
where ϕ(x, σ 2 ) is the probability density function of a Gaussian 313 random variable with mean zero and variance σ 2 evaluated at value 314 x. 4. Repeat steps 2-3 until convergence.

316
A simple heuristic for the initialization step is to use asp kth source is closest on the variables r (1) /ĝ 1 (φ (1) ), . . . , r (K) /g k (φ (K) ). We 319 note that an isotropic model with one or many sources can be recovered 320 within this framework as a special case by fixing g k (x) = 1/2π for 321 x ∈ [0, 2π], with the consequence that h k ≡ 0.
Since in addition, the model includes estimated probabilities that the ith 331 data point is associated with each source (the estimatesp simple heuristic for selecting a single temporal estimate fromt and a single spatial estimate fromr associated with the most probable source. That is, let Then, a fitted model can be evaluated according to the spatial and 336 temporal root mean square error (RMSE) metrics

Cucurbit downy mildew data
where h is a positive number and C(h) −1 =   Table 2: Parameter estimates and 95% confidence intervals for one-and two-source models fit to 2008 data (n = 25). In the two-source models, two sets of estimates are reported corresponding to a northern placement and a southwestern placement for the alternate (non-FL) source. No basis parameters are reported for the isotropic models, since these terms are only included in the anisotropic models.   Table 3: Parameter estimates and 95% confidence intervals for one-and two-source models fit to 2009 data (n = 65). In the two-source models, two sets of estimates are reported corresponding to a northern placement and a southwestern placement for the alternate (non-FL) source. No basis parameters are reported for the isotropic models, since these terms are only included in the anisotropic models. (FL -Florida, TX -Texas, OH -Ohio) The basis parameter estimatesβ  Table 4: Parameter estimates and 95% confidence intervals for one-and two-source models fit to 2010 data (n = 28). In the two-source models, two sets of estimates are reported corresponding to a northern placement and a southwestern placement for the alternate (non-FL) source. No basis parameters are reported for the isotropic models, since these terms are only included in the anisotropic models. In this year, no alternate source model is estimated for the southwest location, as nearly all data points are attributed to the FL source during model fitting. (FL -Florida, TX -Texas, OH -Ohio) Finally, the estimates for the two-source models suggest that the  (Tables 2 and 3). 556 Accounting for anisotropy in one source models reduced RMSE 557 measured in time slightly (0.76 to 1.35 days) but consistently (Table 5); 558 spatial errors were not consistently reduced in these data sets. For  Table 6). Among the multiple source models, some anisotropic 562 models reduced temporal and spatial errors for some years as compared 563 to the corresponding isotropic model. However, no single more complex 564 model consistently reduced prediction errors across all years when 565 multiple sources were included. 566 Figure 3: Time-of-occurrence prediction errors for predictions from isotropic and anisotropic one-and two-source models fit to data from sentinel plots in 2008, with contours representing estimated disease front over time according to the models. The two-source models were each fit with two alternate (non-FL) source locations: a northern and a southwestern location. Each panel shows results according to a different model: isotropic one-source (IOS); isotropic two-source (ITS); anisotropic onesource (AOS); and anisotropic two-source (ATS). Figure 4: Time-of-occurrence prediction errors for predictions from isotropic and anisotropic one-and two-source models fit to data from sentinel plots in 2009, with contours representing estimated disease front over time according to the models. The two-source models were each fit with two alternate (non-FL) source locations: a northern and a southwestern location. Each panel shows results according to a different model: isotropic one-source (IOS); isotropic two-source (ITS); anisotropic onesource (AOS); and anisotropic two-source (ATS). Figure 5: Time-of-occurrence prediction errors for predictions from isotropic and anisotropic one-and two-source models fit to data from sentinel plots in 2010, with contours representing estimated disease front over time according to the models. The two-source models were each fit with two alternate (non-FL) source locations: a northern and a southwestern location. Each panel shows results according to a different model: isotropic one-source (IOS); isotropic two-source (ITS); anisotropic onesource (AOS); and anisotropic two-source (ATS).      northern edge of the Gulf Coast were mostly attributed to this source, 5) as only two sentinel plots in Texas and Michigan were attributed to 597 that source. 598 We again emphasize here that prediction errors associated with any of 599 these models varied depending on the year and specific model, and were 600 not necessarily improved uniformly as compared to the corresponding 601 isotropic one source model (Tables 5 and 6).  pathogenic fungi, but did not consider anisotropy in epidemics over time.

629
The models derived in this study accommodate both spatial and   We did find support for an alternate inoculum source on the western  The multiple source modeling framework we present is most useful for 722 post-hoc analysis of epidemics rather than prediction. This is because 723 parameter estimation requires an iterative procedure based on known 724 distribution of disease, which precludes prediction during an active 725 epidemic. This has no bearing on one-source models, with or without anisotropy, which our analyses suggest may be adequate in some 727 situations. 728 We introduced anisotropy in disease spread through the functions g k 729 that are estimated from prevailing wind directions at the epicenters. properties of the initial disease epicenter. 743 We also point out that many other functions could be used to