A Bayesian Nonparametric Spiked Process Prior for Dynamic Model Selection

Abstract. In many applications, investigators monitor processes that vary in space and time, with the goal of identifying temporally persistent and spatially localized departures from a baseline or “normal” behavior. In this manuscript, we consider the monitoring of pneumonia and influenza (P&I) mortality, to detect influenza outbreaks in the continental United States, and propose a Bayesian nonparametric model selection approach that takes into account the spatio-temporal dependence of outbreaks. More specifically, we introduce a zero-inflated conditionally identically distributed species sampling prior, which allows borrowing information across time and assigning data to clusters associated with either a null or an alternate process. Spatial dependence is accounted for by means of a Markov random field prior, which allows informing the selection based on inferences conducted at nearby locations. We show how the proposed modeling framework performs in an application to the P&I mortality data and in a simulation study, and compare with common threshold methods for detecting outbreaks over time, with more recent Markov switching models, and with spike-and-slab Bayesian nonparametric priors that do not take into account spatio-temporal dependence.


Introduction
In disease surveillance, investigators are often concerned with monitoring a health condition, in order to understand patterns of disease progression across space and time. An outbreak may persist only briefly, since the monitored condition is eventually expected to revert to a long-term trend, or recur and endure over time. Anomalies may be confined locally to specific regions, or instead spread to large areas, possibly after some delay.
The methods routinely employed by the U.S. Centers for Disease Control and Prevention (CDC) for outbreak detection are based on thresholding a specific time series based on historical data (Muscatello et al., 2008). More sophisticated methods proposed in the literature have employed hidden Markov models (HMMs) (Madigan, 2005) or more flexible Markov switching models (Martínez-Beneito et al., 2008) to capture the transition between the epidemic and the non-epidemic phase of a disease. A few spatio-temporal models have assumed a first-order autoregressive model on the temporal dynamics of the epidemics (see, e.g., Heaton et al., 2012; Zou et al., 2012).
In this manuscript, we investigate the modeling of possibly temporary and localized changes in disease patterns as a problem of model selection over space and time. In order to identify departures of a monitored process from a baseline or "normal" behavior, we start by assuming a two-groups modeling framework (Efron, 2004, 2008): a data point, say y, is assigned either to a null model, H_0 : y ∼ F(y; θ_0), or to an alternative model, H_1 : y ∼ F(y), according to the realization of a latent binary indicator variable γ ∼ Bern(ω), i.e.

y | ω ∼ ω F(y; θ_0) + (1 − ω) F(y).   (1)

In Bayesian nonparametrics, it is common to assume a Dirichlet process mixture (DPM) model on F(y), in order to describe the heterogeneity of realizations under the alternative model (Ferguson, 1983), i.e.

F(y) = ∫ F(y; θ) dG(θ),   (2)
where G ∼ DP(α, G*) indicates a Dirichlet process, with concentration (mass) parameter α > 0 and base distribution G*. The null distribution F(y; θ_0) may be either parametric or nonparametric. For example, Do et al. (2005) and Guindani et al. (2014) employ DPM priors to infer differential gene expression (respectively, sequence abundances) under two different conditions. As a further example, Scarpa and Dunson (2009) consider a mixture of a null parametric model and an alternative nonparametric contamination to model functional data. A different approach is presented in Kim et al. (2009), who directly assume y ∼ F(y) as in (2), with the base measure G* modeled as a "spiked" mixture of a point mass and a diffuse measure. Canale et al. (2017) refer to (1) as an "outer" mixture, as opposed to the "inner" spike-and-slab mixture prior of Kim et al. (2009), and show how the two types of mixtures yield structurally different priors. In particular, model (1)-(2) does not coincide with the typical DPM model, due to the constraint that y ∼ F(y; θ_0) with probability ω.
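As a concrete illustration of the structure of the "outer" mixture (1)-(2), the following sketch (our illustrative Python code, not the authors' implementation; the choice F(y; θ) ≡ N(θ, 1) for the kernel and all parameter values are assumptions) draws data whose slab component is simulated through the Pólya urn representation of a DP(α, G*) mixture of normals:

```python
import numpy as np

def sample_outer_mixture(n, omega, theta0=0.0, alpha=1.0, rng=None):
    """Draw n values from the 'outer' mixture (1):
    with prob. omega, y ~ F(y; theta0) (null, here illustratively N(theta0, 1));
    otherwise y ~ F(y), a DP mixture of normals simulated via the
    Polya urn of DP(alpha, G*), with G* = N(0, 25) as an example."""
    rng = np.random.default_rng(rng)
    gamma = rng.random(n) < omega           # latent null indicators
    atoms = []                              # urn of previously drawn slab locations
    y = np.empty(n)
    for i in range(n):
        if gamma[i]:
            y[i] = rng.normal(theta0, 1.0)  # null component
        else:
            # Polya urn: reuse an old atom w.p. m/(alpha+m), else draw from G*
            m = len(atoms)
            if m > 0 and rng.random() < m / (alpha + m):
                loc = atoms[rng.integers(m)]
            else:
                loc = rng.normal(0.0, 5.0)  # new atom from G* = N(0, 25)
            atoms.append(loc)
            y[i] = rng.normal(loc, 1.0)
    return y, gamma
```

Note that, as the text emphasizes, the constraint that y arises from F(y; θ_0) with probability ω is enforced directly here, which is what distinguishes this construction from a single DPM with a spiked base measure.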
In this paper, we assume the outer mixture formulation (1), and we further allow model selection to vary across space and time, by modeling the slab component F(y) in (1) with a flexible non-exchangeable species sampling prior process. More specifically, we assume a state-space model, where the state process is the realization of a conditionally identically distributed (CID) species sampling sequence (Berti et al., 2004; Bassetti et al., 2010). In this formulation, the state of a process at time t depends on past observations through a generalized Pólya urn scheme. This allows describing the evolution of a non-stationary process over time, with limited assumptions on the temporal dependence among observations. With respect to Dirichlet processes, CID sequences only require that, conditionally on the past, sequences of future observations are identically distributed, i.e. a weaker form of dependence than exchangeability and a requirement akin to predictive stationarity. For the analysis of regime-switching processes, CID sequences provide a robust and flexible alternative to general hidden Markov and semi-Markov models (Airoldi et al., 2014).
We further account for spatial dependencies in the selection of the states, by coupling the species sampling model that guides the temporal evolution of the process with a spatial Markov random field (MRF). Thus, observations are assumed to have a higher prior probability of originating from the baseline process if nearby locations also do. Similarly, deviations may propagate in space, although their levels could vary. Since the spatial MRF model is used at each time to infer whether the process is in its baseline state, our approach differs from that of Jo et al. (2017), where a conditionally autoregressive (CAR) specification is used to specify the weights of a random probability measure, with no baseline process and with independent observations at each location.
In an application to the detection of outbreaks of pneumonia and influenza (P&I) mortality in the continental United States, we show how our approach provides a more flexible description of spatio-temporal dependencies than other methods used in disease surveillance, namely the thresholding-based method of Muscatello et al. (2008), which is similar to the one currently employed by the CDC, and Bayesian Markov switching models, which assume two levels of the process (Martínez-Beneito et al., 2008). We also compare the performance of our approach with "inner" spike-and-slab Dirichlet process (SS-DP) and Pitman-Yor (SS-PY) mixture priors, which enable clustering of observations without any assumption on their temporal dependence (Canale et al., 2017). Our approach allows clustering observations at different states over time, avoiding a strict homogeneous Markov assumption, while still retaining spatial dependence in the modeling of outbreaks in neighboring regions.
The paper is organized as follows. In Section 2, we formulate our dynamic model selection approach, with a spiked CID process prior to capture the evolution of the process over time and a Markov random field prior to inform spatial selection. In Section 2.3, we describe posterior inference under a compound decision theoretic framework. In Section 3, we apply our model to a CDC pneumonia and influenza mortality dataset, and compare it with alternative methods. In Section 4, we further investigate the performance of our modeling approach by means of a simulation study. We conclude with some final remarks in Section 5.

Bayesian dynamic model selection
Let Y(s, t) denote a spatio-temporal stochastic process defined on a spatial domain D, at times t ∈ T. We have measurements available at a finite set of locations, s ∈ {s_1, . . . , s_n} ⊂ D, at discrete times t = 1, . . . , T. In the following, we assume D is a spatial lattice, although our framework may be extended to include spatial dependences on a general domain D ⊂ R^d, d > 1. We assume a random effect formulation for the stochastic process Y(s, t), i.e.

Y(s, t) = μ(s, t) + θ(s, t) + ε(s, t),   (3)
where μ(s, t) denotes a general regression term, capturing the effect of baseline characteristics, which may include temporal trends and spatially varying covariates, θ(s, t) is a pure random effect, capturing deviations from the baseline trend localized in space and time, and ε(s, t) is a residual uncorrelated measurement error, e.g.

ε(s, t) ∼ N(0, τ²_s),   (4)
for s ∈ D and t = 1, . . . , T . The regression term μ(s, t) is modeled on the basis of available covariate information, according to the formulation which is typical in Bayesian spatio-temporal process modeling, see e.g. Banerjee et al. (2014). Here, we are mainly interested in inference about the random effect process, θ(s, t).

Spiked CID process prior
We formulate the dynamic model selection problem by assuming that at each location s and at each time point t, the random effect θ(s, t) is either a realization from a latent spatio-temporal baseline ("null") process, e.g. θ(s, t) = θ 0 (s, t), or a realization from an alternate spatio-temporal process, i.e. θ(s, t) ∼ G(s, t), with G(s, t) indicating the marginal distribution of a general process G(·, ·) defined on D × T , which we further specify below. Without loss of generality, we assume θ 0 (s, t) = 0, i.e. under baseline conditions the observations are fully described by the regression μ(s, t), possibly including an intercept, and by the measurement error variances.
We further characterize the evolution of the process θ(·, ·) over time as the realization of a sequence of conditional distributions, where the realization at time t, θ(s, t), depends on past realizations θ(s, j), j < t. More specifically, let θ^(t−1)(s) = [θ(s, 1), θ(s, 2), . . . , θ(s, t − 1)]^T denote the (t − 1)-dimensional vector containing the past realizations of the latent process at location s, s ∈ {s_1, . . . , s_n}. Then, for s ∈ {s_1, . . . , s_n} and any t > 1, we define the conditional distribution of θ(s, t) as the spiked mixture

θ(s, t) | θ^(t−1)(s) ∼ ω(s, t) δ_0(·) + (1 − ω(s, t)) G_{t−1}(·),   (5)

where δ_0(·) denotes a point mass at the baseline value θ(s, t) = 0, and ω(s, t) is a weight, which in general is assumed to be both spatially and temporally varying. The conditional distribution G_{t−1}(·) encodes the temporal dependence of the realizations of the process θ(s, t), given θ^(t−1)(s). Here, we assume that the sequence of conditional distributions G_{t−1}(·), t > 1, is defined by the predictive distributions of a generalized species sampling sequence. For notational simplicity, we indicate with G_{t−1}(·) also the sequence of conditional cumulative distribution functions, and write

G_{t−1}(x) = Σ_{j=1}^{t−1} p^(j)_{s,t−1} I(θ(s, j) ≤ x) + r_{s,t−1} G*(x),   (6)

where the weights p^(j)_{s,t−1} specify the probability that the value of the process at time t coincides with a past value, θ(s, j), j < t, and r_{s,t−1} is the remaining probability that θ(s, t) does not cluster with any previous value. G*(x) is a non-atomic base distribution, characterizing the values of the latent random effect under the alternate process. Thus, at each location we have

θ(s, t) | θ^(t−1)(s) ∼ ω(s, t) δ_0(·) + (1 − ω(s, t)) [ Σ_{j=1}^{t−1} p^(j)_{s,t−1} δ_{θ(s,j)}(·) + r_{s,t−1} G*(·) ].

Due to the positive probability of ties θ(s, t) = θ(s, j), j < t, equation (6) can be used to describe the temporal persistence of outbreaks (i.e., departures from the baseline process), and to identify sparse clusters of observations over time. To this purpose, we further assume that G_{t−1}(·) is the predictive distribution of a conditionally identically distributed (CID) species sampling sequence (Bassetti et al., 2010).
The species sampling weights are defined as a function of independent (not necessarily identically distributed) latent random variables W(s, 1), W(s, 2), . . . as follows:

p^(j)_{s,t−1} = (1 − W(s, j)) ∏_{i=j+1}^{t−1} W(s, i),    r_{s,t−1} = ∏_{i=1}^{t−1} W(s, i),   (7)

with the realizations W(s, j) ∈ (0, 1), j = 1, . . . , t. Thus, the prior predictive spatio-temporal effects θ(s, t + k), k ≥ 0, are identically distributed conditionally on the past, θ^(t−1)(s). The specification of the weights p^(j)_{s,t−1} and r_{s,t−1} in (6)-(7) leads to different characterizations of CID sequences. A computationally convenient specification assumes

W(s, j) ∼ Beta(a_j, b_j),   (8)

for j = 1, . . . , t, which defines a Beta-GOS sequence at each location (Airoldi et al., 2014). Since the predictive weights depend on the latent sequence W(·, ·), the evolution of the process does not depend on the estimation of a single transition matrix across all time points. Therefore, this choice provides an alternative to more complicated non-homogeneous hidden Markov models for modeling a non-stationary temporal process.
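The generalized Pólya urn implied by these weights can be made concrete in a few lines. In the Beta-GOS construction of Airoldi et al. (2014), the predictive weights are p^(j) = (1 − W_j) ∏_{i>j} W_i and r = ∏_i W_i, which sum to one by a telescoping argument. The sketch below (our illustrative code, with the Beta(a, b) hyperparameters as placeholders, not the paper's settings) computes these weights and simulates one sequence at a single location:

```python
import numpy as np

def beta_gos_weights(W):
    """Given latent W_1..W_{t-1} in (0,1), return the predictive weights
    p_j = (1 - W_j) * prod_{i=j+1}^{t-1} W_i  and the new-draw mass
    r = prod_{i=1}^{t-1} W_i; by telescoping, sum(p) + r = 1."""
    W = np.asarray(W, dtype=float)
    t1 = len(W)
    suffix = np.ones(t1 + 1)             # suffix[j] = prod_{i >= j} W_i
    for j in range(t1 - 1, -1, -1):
        suffix[j] = W[j] * suffix[j + 1]
    p = (1.0 - W) * suffix[1:]
    r = suffix[0]
    return p, r

def sample_beta_gos(T, a=1.0, b=1.0, base=lambda rng: rng.normal(0, 5), rng=None):
    """Simulate one Beta-GOS sequence theta_1..theta_T: theta_t ties with a
    past value theta_j w.p. p_j, or is a new draw from G* w.p. r."""
    rng = np.random.default_rng(rng)
    theta = [base(rng)]
    W = []
    for t in range(1, T):
        W.append(rng.beta(a, b))         # latent reinforcement for time t
        p, r = beta_gos_weights(W)
        j = rng.choice(t + 1, p=np.append(p, r))
        theta.append(theta[j] if j < t else base(rng))
    return np.array(theta)
```

The positive probability of ties produces runs of repeated values, which is precisely the mechanism the model uses to capture temporally persistent states.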
In particular, although our approach assumes two states of the process, the outbreaks can be clustered at multiple levels. The number of clusters is estimated directly from the data, using the Bayesian nonparametric specification. Moreover, by specifying the latent random variables as in (8), we assume that the process is characterized by long memory properties, with high and persistent autocorrelation, on average. This might be appropriate to capture the recurrent seasonality of the clusters over time. We refer to Airoldi et al. (2014) for details and further discussion of the asymptotic properties.

Modeling spatial selection
The model described in the previous section can be regarded as a zero-inflated CID sequence at each location s, which allows for unsupervised clustering of the null and alternative processes over time. We further allow for outbreaks to be clustered in space, by introducing a latent binary indicator γ(s, t) at each s ∈ {s_1, . . . , s_n} and t = 1, . . . , T, with γ(s, t) = 0 and γ(s, t) = 1 specifying whether the observations are a realization from the null or the alternate process, respectively. Then, the probability of observing an outbreak at time t is a function of the weights of the spiked mixture of CID sequences defined in (5)-(6). More specifically, we can compute the conditional probability of an outbreak at time t given the past realizations of the process, θ^(t−1)(s), as

P(γ(s, t) = 1 | θ^(t−1)(s)) = (1 − ω(s, t)) π(s, t),   (9)

where π(s, t) = Σ_{j=1}^{t−1} p^(j)_{s,t−1} I(θ(s, j) ≠ 0) + r_{s,t−1} denotes the total probability mass associated to pairing θ(s, t) with any of the nonzero past realizations of the random effect, j < t, or to a new draw from G*. Let Γ be the n × T matrix of the latent binary indicators γ(s, t). Since (6) is a CID sequence, then, conditionally on Γ, the marginal model of θ(s, t) is a spike-and-slab mixture,

θ(s, t) | Γ ∼ ω(s, t) δ_0(·) + (1 − ω(s, t)) G̃(·),   (10)

with G̃(·) denoting the marginal slab distribution implied by the CID sequence. In order to account for spatial dependence, in our application to outbreak detection, we leverage the lattice structure of the data, and model the weights ω(s, t) in (10) using a Markov random field (MRF) prior. MRF priors have been widely used to model spatial dependence in non-dynamic variable selection (see, for example, Zhang et al., 2014, for an application in neuroimaging). More specifically, we define a neighbor set for each location s, N_{s,t}, which in general could depend on the specific time t, and allow the probability of an outbreak to increase if some outbreaks are detected at the same time in neighboring locations. Let γ(N_{s,t}) = {γ(k, t), k ∈ N_{s,t}} denote the vector of selection indicators in N_{s,t}.
Then, we can define the probability of an outbreak conditionally on the status of the neighboring locations as P(γ(s, t) = 1 | γ(N_{s,t})) = ω*(s, t), where the weight ω*(s, t) captures the spatial dependence in (10) conditionally on the neighborhood N_{s,t}, and is defined as

ω*(s, t) = exp( d + e Σ_{k ∈ N_{s,t}} γ(k, t) ) / [ 1 + exp( d + e Σ_{k ∈ N_{s,t}} γ(k, t) ) ],   (11)

with d ∈ R being a sparsity parameter, and e > 0 indicating a smoothing parameter that controls the degree of spatial dependence in a neighborhood of s, s ∈ {s_1, . . . , s_n}. In particular, the conditional probability of an outbreak will be exp(d)/[1 + exp(d)] if there are no outbreaks in a neighborhood of s at time t. Since ω*(s, t) increases as a function of the number of locations k ∈ N_{s,t} where γ(k, t) = 1, our specification (11) favors the detection of spatial clusters of outbreaks. Other spatial priors could be considered, e.g. a spatial probit model where the probability of an outbreak depends on a latent Gaussian Markov random field process. In Section 4, we provide a sensitivity analysis to guide the selection of the parameters d and e. In order to complete our model, we select a conjugate inverse-gamma prior for the region-specific variance, i.e. τ²_s ∼ IG(a_0, b_0), and we further choose a normal distribution as the base measure for the atoms of the alternate process, that is G*(·) ≡ N(μ_0, σ²_0).
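The MRF weight is a logistic function of the number of flagged neighbors, exp(d + e Σ_k γ(k, t)) / [1 + exp(d + e Σ_k γ(k, t))]. A minimal sketch (our illustrative code; the defaults d = −1 and e = 0.5 are the values the paper later adopts, but the function name is ours):

```python
import numpy as np

def mrf_outbreak_prob(gamma_neighbors, d=-1.0, e=0.5):
    """Conditional prior probability of an outbreak at (s, t) given the
    indicators gamma(k, t) of the neighboring locations k in N_{s,t}:
    omega* = exp(d + e * sum_k gamma(k, t)) / (1 + exp(...))."""
    s = d + e * np.sum(gamma_neighbors)
    return np.exp(s) / (1.0 + np.exp(s))
```

With no flagged neighbors this reduces to exp(d)/(1 + exp(d)), the baseline (independent Bernoulli) selection probability, and it increases monotonically with each additional neighboring outbreak, which is what favors spatial clusters of detections.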

Posterior inference
In order to obtain inferences on the latent process θ(s, t) and the other parameters of the proposed dynamic selection process, we need to employ Markov chain Monte Carlo (MCMC) posterior sampling techniques. The details of the MCMC are provided in the supplementary material (Cassese et al., 2018).
The detection of outbreaks relies on the evaluation of the posterior probability of selection (PPS) of the alternate process at each location s ∈ {s_1, . . . , s_n} and at each time t = 1, . . . , T, which is computed from the MCMC output for the elements of the selection matrix Γ as

PPS(s, t) = (1/B) Σ_{b=1}^{B} γ_b(s, t),

where γ_b(s, t) denotes the value of the selection indicator γ(s, t) sampled at iteration b, out of B MCMC iterations after burn-in.
Let a = {a(s, t), s ∈ {s_1, . . . , s_n}, t = 1, . . . , T} be the vector of decisions (actions) in the multiple hypothesis testing problem, with a(s, t) = 0 indicating that the observations are assumed from the baseline process, and a(s, t) = 1 otherwise. In a compound decision theoretic framework, the spatio-temporal multiple testing problem for pointwise detection can be addressed by minimizing the posterior expectation of the loss function

L(Γ, a) = Σ_{s,t} [ (1 − γ(s, t)) a(s, t) + λ γ(s, t) (1 − a(s, t)) ],

which weights false positive and false negative results through a penalty parameter, λ. The optimal decision rule minimizes E(L(Γ, a) | data) and corresponds to a threshold on the posterior probability of selection, i.e. a*(s, t) = 1 if PPS(s, t) > 1/(1 + λ), and a*(s, t) = 0 otherwise (Sun et al., 2015). The value of λ can be chosen so as to guarantee a small expected Bayesian false discovery rate,

BFDR(h) = Σ_{s,t} (1 − PPS(s, t)) I(PPS(s, t) > h) / Σ_{s,t} I(PPS(s, t) > h),

where h = 1/(1 + λ) (Newton et al., 2004). For example, h may be chosen so that BFDR(h) < 0.05.
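In practice, the threshold can be chosen by scanning candidate values of h and retaining one whose estimated Bayesian FDR is below the desired level, in the spirit of Newton et al. (2004). A minimal sketch (our illustrative function names, not the authors' code):

```python
import numpy as np

def bfdr_threshold(pps, target=0.05, grid=None):
    """Scan candidate thresholds h in increasing order and return the first
    (smallest) h whose estimated Bayesian FDR,
    BFDR(h) = sum (1 - PPS) 1{PPS > h} / sum 1{PPS > h},
    is below `target`, together with the decision matrix a* = 1{PPS > h}."""
    pps = np.asarray(pps, dtype=float)
    grid = np.unique(pps) if grid is None else np.asarray(grid)
    for h in np.sort(grid):
        sel = pps > h
        if sel.sum() == 0:
            break                          # nothing selected beyond this point
        if (1.0 - pps[sel]).mean() <= target:
            return h, sel
    return 1.0, pps > 1.0                  # no threshold meets the target
```

Choosing the smallest qualifying h maximizes the number of flagged (s, t) pairs subject to the BFDR constraint.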

Detection of P&I outbreaks from CDC data
In this Section, we illustrate the performance of the proposed spiked CID process prior for the analysis of weekly data on pneumonia and influenza (P&I) mortality. The data are made publicly available by the U.S. National Center for Health Statistics (NCHS), a section of the Centers for Disease Control and Prevention (CDC) (http://www.cdc.gov/nchs/), and they are released weekly, in order to provide a system capable of supporting near real-time surveillance. Here, we focus on the weekly percentage of P&I deaths in the continental U.S. states, over the years 2011-2014. This leads to a total of n = 48 regions and T = 208 time points. P&I data are often used as a proxy measure for influenza activity and have been extensively employed for the detection of influenza epidemics. They are considered a reliable and specific endpoint for studying the timing and amplitude of influenza-related mortality, both at the local and national scales.
We also comment on the inference obtained by the thresholding approach of Muscatello et al. (2008), which is similar to the method currently employed by the CDC for their analyses. The method relies on a threshold based on an estimate of a non-epidemic seasonal baseline. However, since the threshold is based on a nationally defined baseline, local patterns of influenza outbreaks may be difficult to detect. The model also assumes spatial and temporal independence of the outbreaks (Amorós et al., 2015), which may result in decreased power of the testing procedures. To overcome those shortcomings, Martínez-Beneito et al. (2008) have proposed a method based on hidden Markov switching models to identify and cluster epidemic periods over time, although with no spatial dependence. They also assume only two levels of the process, whereas our method allows clustering outbreaks at multiple levels. We also consider two purely Bayesian nonparametric methods, namely the spike-and-slab Dirichlet process prior (SS-DP) of Kim et al. (2009) and the spike-and-slab Pitman-Yor prior (SS-PY) of Canale et al. (2017), which assume exchangeability of the data. A more comprehensive comparison on simulated data is presented in Section 4.
To provide a limited seasonal adjustment, we first center the P&I weekly percentage of death data with respect to their yearly mean. We then fit the spiked CID model (3)-(7) by setting G*(·) ≡ N(0, 25), τ²_s ∼ IG(a_0 = 2.001, b_0 = 0.04), corresponding to a vague specification centered around a mean of 0.04, and the Beta distribution for the weights as in (7). Lastly, we set the parameters of the MRF in (11) to d = −1 and e = 0.5, based on the sensitivity analysis presented in Section 4. In the implementation of the SS-DP and SS-PY models, we set the base measure as a weighted mixture of a point mass at zero and a N(0, 25), with a Beta(1, 1) prior on the mixture weight. Additionally, we set the concentration parameter to 1, and set the discount parameter of the Pitman-Yor process to 0.5. The results reported below are obtained by running MCMC chains with 10,000 iterations and a burn-in of 2,000. Using a 4-core 2.5 GHz Intel Core i5 processor with 8 GB memory, our code takes approximately 90 minutes to run. We assessed convergence by inspecting the MCMC traces, and more formally by using the Geweke (1992) and Heidelberger and Welch (1981) tests. As an example, the z-score from the Geweke (1992) test was −0.01957 for Γ, whereas the Heidelberger and Welch (1981) test returned a p-value of 0.835.
We first illustrate our inference on the estimation of the matrix Γ, obtained as outlined at the end of Section 2.3, and the values of the process θ(s, t). Figure 1(a) shows a choropleth map of the P&I mortality rates for the time window between the 47th week of 2012 and the 3rd week of 2013. The map provides an illustration of the spatio-temporal evolution of P&I mortality rates. For example, the last row shows higher intensities of P&I mortality across all states, with nearby states characterized, on average, by similar values. Figure 1(b) shows the results of our inference, namely the outbreaks detected by our method, after thresholding the PPS using a BFDR control at the α = 0.05 level. The states identified as being in an epidemic state are colored in black, whereas the remaining states are left in white. As a further illustration of the temporal dependence, Figure 2 shows the centered time series of the weekly P&I mortality rates for the state of Oklahoma over 2011-2014. The black dots indicate the outbreaks flagged by our model, whereas the results of the CDC thresholding approach are indicated by empty circles. For the sake of graphical clarity, we report the plots for the SS-DP and SS-PY in the supplementary material, and only briefly discuss the comparison here. All methods identify outbreaks in the winter, and more specifically close to the end of one year and the beginning of the next. Indeed, it is known that P&I epidemics usually start in this period, with patterns that are quite irregular and with unpredictable persistence over time, as highlighted also by our results. The method employed by the CDC also identifies some isolated time points, which are not detected by any of the other methods. Those isolated peaks occur at times generally characterized by low P&I mortality, which may be due to the use of a national baseline for threshold determination, possibly affecting the sensitivity of the method.
On the other hand, SS-DP and SS-PY share similar performances and tend to select only higher values of P&I mortality rates. Based on the simulation study in Section 4, we speculate this might be due to the fact that those methods do not account for the spatio-temporal dependence between observations. Instead, our method clearly identifies temporally persistent clusters of P&I outbreaks, which may provide relevant information to delineate the duration of the outbreaks.
We further compared our results with those obtained by fitting a Markov switching model on each series of differenced incidence rates, following the guidelines in Martínez-Beneito et al. (2008). More specifically, we set 0.001 and 25 as, respectively, the lower and upper bounds of the uniform distributions for the variance parameters of the two hidden dynamics of the model. The first dynamic is of lower variance and corresponds to low incidence rate levels, while the other one is of relatively higher variance and corresponds to high incidence rate levels. Although this assumption may be appropriate to describe the rates of influenza-like illness data, it does not seem realistic enough to capture the behavior of the P&I mortality data.
Lastly, Figure 3 shows the effect of modeling spatial dependence on within-series clustering, for the States of New York, Pennsylvania and Ohio. Each cluster partition is obtained by post-hoc analysis of the MCMC iterations. For the sake of illustration, here we show the partitions obtained from the last MCMC iteration. Alternatively, one could find a modal partition (Kim et al., 2009). Each cluster is represented by a different shape in Figure 3. Observations assigned to the same cluster show similar mortality rate levels, with clear temporal persistence. Also, the clusters of those time points characterized by high mortality rates (i.e., significantly above the yearly mean) have similar patterns across the three neighboring regions.

Simulation study
In this Section, we assess the performance of our model on a set of simulated scenarios, and compare the results with the threshold based model of Muscatello et al. (2008), which is similar to the method currently employed by the CDC, the Markov switching model of Martínez-Beneito et al. (2008), the SS-DP of Kim et al. (2009) and the SS-PY of Canale et al. (2017). We assume D as the spatial lattice comprising the 48 US continental States and consider two data generating mechanisms, mimicking the type of spatio-temporal dependence of epidemic incidence data.
Data Generating Processes: First, we consider a deterministic model of spatio-temporal propagation of epidemic outbreaks across the US, and the corresponding recovery of a non-epidemic situation (baseline). More specifically, we assume that a process may spring from the North West of the United States, say the state of Washington, at time t, based on a first order HMM with only two latent states, characterizing the epidemic and non-epidemic processes. At time t + 1, the process originated in the state of Washington spreads to nearby states, say Oregon and Idaho, following the pattern defined by state borders. The process then propagates similarly in neighboring states at times t + 2, t + 3, . . .. This deterministic process generates a strong spatio-temporal dependence in the selection of the baseline and alternate processes.
The second scenario is designed to simulate more localized spatial dependence. Specifically, we subdivide all the continental US states into nine divisions (New England, Middle Atlantic, South Atlantic, East South Central, West South Central, East North Central, West North Central, Mountain, and Pacific), using the R function state.division. For each of those divisions, we simulate a time series of binary indicators γ(·, t), according to a two-state HMM. We assume the same realizations γ(s, t), t = 1, . . . , T, for all states s in a division. Yet, we allow for a range of different distributions to generate each observation y(s, t) under either the baseline model f_0 (γ(s, t) = 0) or the alternate model f_1 (γ(s, t) = 1), as we detail further below. We set the transition probability matrix of the HMM modeling the change points between epidemic and non-epidemic processes (only in the state of Washington in the first scenario; in each division in the second scenario) to

                    γ(s, t) = 0    γ(s, t) = 1
γ(s, t − 1) = 0        0.9            0.1
γ(s, t − 1) = 1        0.2            0.8

with initial condition γ(s, 1) = 1. The diagonal elements contain the probabilities of persistence, which are assumed relatively higher for the epidemic than for the non-epidemic process. In particular, we assume a low probability of transitioning to an epidemic state, Pr(γ(s, t) = 1 | γ(s, t − 1) = 0) = 0.1, reflecting the irregular nature of epidemic periods.
In both scenarios, the observations are generated independently for each state according to the following procedure. If γ(s, t) = 0, we assume y(s, t) | (γ(s, t) = 0) ∼ f_0 ≡ N(0, τ²_sim). On the other hand, if γ(s, t) = 1, we assume y(s, t) | (γ(s, t) = 1) ∼ f_1 ≡ Σ_{l=1}^{L} c_l N(m_l, τ²_sim), i.e. a mixture of L components with weights c_l, l = 1, . . . , L. More precisely, here we show the results for a four-component mixture of normal distributions, f_1 ≡ 0.25 N(−1, τ²_sim) + 0.25 N(−0.5, τ²_sim) + 0.25 N(0.5, τ²_sim) + 0.25 N(1, τ²_sim). We further set τ²_sim = 0.04 and the number of time points T = 200. We note that in both scenarios the data generating processes are different from the proposed model, and allow for both higher and lower process means with respect to the baseline. Furthermore, the mixture framework allows clustering the realizations of the process, without imposing any temporal persistence on such clusters, contrary to the assumptions of our model (6).
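The generating mechanism for one division's series can be sketched as follows (our illustrative code; the baseline f_0 ≡ N(0, τ²_sim) reflects our reading of the truncated specification, while the transition probabilities, initial state, mixture means, and τ²_sim = 0.04 are as stated above):

```python
import numpy as np

def simulate_series(T=200, tau2=0.04, rng=None):
    """Simulate one series: gamma(t) follows a two-state Markov chain with
    P(0->1) = 0.1, P(1->0) = 0.2 and gamma(1) = 1; y(t) ~ N(0, tau2) under
    the baseline, and under the alternate model y(t) is drawn from
    0.25 N(-1, tau2) + 0.25 N(-0.5, tau2) + 0.25 N(0.5, tau2) + 0.25 N(1, tau2)."""
    rng = np.random.default_rng(rng)
    P = np.array([[0.9, 0.1],
                  [0.2, 0.8]])            # rows: gamma(t-1); cols: gamma(t)
    means = np.array([-1.0, -0.5, 0.5, 1.0])
    gamma = np.empty(T, dtype=int)
    y = np.empty(T)
    gamma[0] = 1                          # initial condition gamma(s, 1) = 1
    for t in range(T):
        if t > 0:
            gamma[t] = rng.random() < P[gamma[t - 1], 1]
        mu = 0.0 if gamma[t] == 0 else rng.choice(means)
        y[t] = rng.normal(mu, np.sqrt(tau2))
    return y, gamma
```

Note that the mixture component is redrawn independently at each epidemic time point, so the simulated clusters carry no temporal persistence, matching the stress test described in the text.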
Model Fitting: We employ our Spiked Process prior model (6)-(8). We use a vague specification for the base measure, namely G*(·) ≡ N(0, 5²). We choose a vague inverse-gamma distribution for the variance of the measurement error (3), i.e. τ²_s ∼ IG(a_0 = 2.001, b_0 = 0.04), which leads to a prior mean of 0.04 with relatively large variance. Unless otherwise noted, we present here the results obtained by informing spatial dependence via the MRF prior with d = −1 and e = 0.5. We ran 10,000 iterations of the MCMC scheme, discarding the first 1,000 iterations as burn-in. We assessed convergence by inspecting the MCMC sample traces for all parameters. We further tested the convergence and stationarity of the chains by the Geweke (1992) diagnostic test and the Heidelberger and Welch (1981) approach, respectively. Figure 4(a) shows the map of the posterior probabilities of selection PPS(s, t) for one of the simulated datasets in the first scenario, across the 48 US continental states and for 6 representative time points (100-105). The maps illustrate the spread of an epidemic process both spatially and temporally. Figure 4(b) shows the corresponding map of the binary indicators γ(s, t) of the true spatio-temporal processes over the same period, with black-colored states indicating γ(s, t) = 1. Our method appears to recover the true spatio-temporal pattern of the processes quite closely. Some results for the second scenario are reported in the supplementary material.

Posterior inference and sensitivity analysis:
Next, we investigate the effect of the choice of the MRF parameters d and e on the selection performance. It is well known that allowing e to vary widely can lead to a phase transition problem, i.e. the expected number of selections increases greatly for small increments of e. This is due to (11) being an increasing function of the number of indicators γ(s, t) equal to 1. Therefore, a careful selection of the parameter e is necessary to avoid inaccurate inferences due to too many false selections. Li et al. (2015) show that e should be small when the average number of neighbors is large; vice versa, a small d is required to induce small prior odds of θ(s, t) ∼ f_1 for a large number of sites. Indeed, for e = 0, the MRF prior reduces to an independent Bernoulli, with parameter exp(d)/[1 + exp(d)], and thus can be used to provide a lower bound on the prior probability of selection. As for e, any value of e below the phase transition point can be considered a good choice, with values closer to the phase transition point leading to higher prior probabilities of selection in a neighborhood of sites already assigned to the alternate process.
Here, we report the effect of different choices of the parameters d and e in terms of the resulting specificity, sensitivity and accuracy rates. For each s and t, sensitivity is the proportion of observations correctly assigned to the alternate process (TP) among all observations generated under the alternate process, specificity is the proportion of observations correctly assigned to the baseline process (TN) among all observations generated under the baseline process, and accuracy is the ratio of observations correctly assigned to either the baseline or alternate processes (TP+TN) over the total number of observations. Table 1 shows the average and standard error of the previous measures, obtained for each of the two scenarios over 20 independently generated datasets, for varying values of e with d = −1 fixed. As expected, the table suggests that greater values of e generally lead to increased sensitivity at the expense of specificity. In the first scenario, setting e = 0.6 leads to the best performance in terms of accuracy, with e = 0.5 having similar performance, whereas in the second scenario the best performance is obtained for e = 0.5. Those results suggest an optimal setting of 0.5 ≤ e ≤ 0.6, and we discuss the case e = 0.5 for the results presented in this manuscript. For e = 0, the model assumes no spatial dependence, and deviations from the baseline process are detected only based on each single time series, since the MRF reduces to a Bernoulli prior. We observe markedly increased sensitivity and accuracy when considering the spatial information in the selection prior, with only a small reduction in specificity. We also explored the effect of different choices of the sparsity parameter d, for which accuracy appears to be maximized at d = −1 (see Table 2).
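The three performance measures can be computed directly from the true and estimated selection matrices; a minimal sketch (function and argument names are ours):

```python
import numpy as np

def selection_metrics(gamma_true, gamma_hat):
    """Sensitivity TP/(TP+FN), specificity TN/(TN+FP), and accuracy
    (TP+TN)/total, computed over all (s, t) entries of the selection
    matrices (any array-like of 0/1 indicators)."""
    t = np.asarray(gamma_true, dtype=bool).ravel()
    h = np.asarray(gamma_hat, dtype=bool).ravel()
    tp = np.sum(t & h)      # alternate correctly flagged
    tn = np.sum(~t & ~h)    # baseline correctly retained
    fp = np.sum(~t & h)     # false outbreak calls
    fn = np.sum(t & ~h)     # missed outbreaks
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / t.size
```

For instance, comparing a true matrix against the decisions a*(s, t) obtained from the BFDR-thresholded PPS would reproduce the type of entries reported in Table 1.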

Performance comparisons:
We further compare the performance of our method with the thresholding method used by the CDC (Muscatello et al., 2008), the Markov switching model of Martínez-Beneito et al. (2008) (using the public open-source code of Conesa et al., 2009), and the Bayesian nonparametric methods SS-DP and SS-PY. The comparison results are averaged over the 20 independently generated datasets. The thresholding procedure of Muscatello et al. (2008) achieves high specificity, but performs much worse in terms of sensitivity and accuracy: in the first scenario, the CDC procedure achieves 0.4245 (0.0678) sensitivity, 0.9998 (0.0002) specificity, and 0.7993 (0.0635) accuracy; in the second scenario, 0.4273 (0.0369) sensitivity, 0.9997 (0.0004) specificity, and 0.8102 (0.0275) accuracy. The Markov switching model performs quite well, comparably with our model. This is in line with our expectations, since the data are simulated from an HMM with two states. The performance of the Markov switching model slightly decreases in the presence of stronger spatial dependence: in the first scenario, the model achieves 0.7747 (0.0931) sensitivity, 0.9832 (0.0018) specificity, and 0.9046 (0.0423) accuracy; in the second scenario, 0.7905 (0.0869) sensitivity, 0.9820 (0.0041) specificity, and 0.9164 (0.0246) accuracy. However, the Markov switching model assumes that the differenced time series are realizations from a mixture with two hidden dynamics, characterized by low and high variance. This may limit the ability of the model to detect outbreaks if more than two states are reasonably supported by the data, as in our data analysis. The results from the comparison with the two exchangeable Bayesian nonparametric models confirm the conclusions from the data analysis: failing to account for spatio-temporal dependence leads to a loss in sensitivity, despite maintaining high specificity.
Specifically, in the first scenario, the SS-DP achieves 0.6526 (0.06163) sensitivity, 0.9935 (0.0019) specificity and 0.8768 (0.0359) accuracy, whereas the SS-PY achieves 0.66051 (0.0625) sensitivity, 0.9932 (0.001) specificity, and 0.8810 (0.0222) accuracy. In the second scenario, the SS-DP achieves 0.6725 (0.0451) sensitivity, 0.9930 (0.0021) specificity and 0.8869 (0.0209) accuracy, whereas the SS-PY achieves 0.6835 (0.0322) sensitivity, 0.9922 (0.0010) specificity and 0.8879 (0.012) accuracy. These performances are slightly worse than those achieved by our Spiked-CID process with no spatial dependence (e = 0, see Table 1). Therefore, we are led to conclude that borrowing information across time and sites leads to more stable inference and higher accuracy for the Spiked-CID process with respect to exchangeable Bayesian nonparametric priors.
Prediction: Outbreak detection typically relies on the nearly real-time monitoring of carefully chosen proxy measures, and not on the prediction of future values of the observed process. This is due to the difficulty of identifying the early determinants/predictors of an outbreak in syndromic surveillance. For example, P&I data are usually considered as proxy measures for influenza activity. The analysis of past data is important for monitoring disease trends. At the same time, detection of past outbreaks is meaningful in evaluating the effectiveness of disease control programs and policies. Thus, most models typically employed for outbreak detection do not focus on statistical prediction. Nevertheless, here we investigate the prediction performance of the Spiked-CID process in a leave-one-out simulation, where we assume the value of the process at the last available time-point, y(s, T), is not observed. In more detail, let γ* = {γ*(s)} indicate the vector comprising the true epidemic status at each location at time T. Then, we can compute the posterior predictive probability of assigning the correct epidemic status, p(γ(s, T) = γ*(s) | y(s, 1), . . . , y(s, T − 1)) = ∫ p(γ(s, T) = γ*(s) | θ^(T−1)(s)) p(θ^(T−1)(s) | y(s, 1), . . . , y(s, T − 1)) dθ^(T−1)(s), using Monte Carlo composition sampling from the MCMC posterior samples. Results are obtained for the 48 states, averaged across 20 simulated datasets, according to the data generating mechanisms previously outlined. The epidemic status is assessed by thresholding the posterior predictive probability at 0.5, i.e. we set γ̂(s, T) = 1 if p(γ(s, T) = 1 | y(s, 1), . . . , y(s, T − 1)) > 0.5.
In the first simulation scenario, our method correctly detects the true status 69.8% of the times (median across all datasets, IQR 40.6%–98.4%); in the second simulated scenario, which describes more localized patterns of spatial dependence, the method correctly detects the true status 62.5% of the times (median across all datasets, IQR 55.2%–83.3%).
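The composition-sampling step described above can be sketched as follows; here p_gamma_given_theta is a hypothetical placeholder for the model-specific conditional p(γ(s, T) = 1 | θ^(T−1)(s)), and theta_draws stands in for the MCMC posterior draws of θ^(T−1)(s), so this is an illustrative outline rather than the paper's implementation.

```python
def predict_epidemic_status(theta_draws, p_gamma_given_theta, threshold=0.5):
    """Monte Carlo composition sampling: approximate
    p(gamma(s, T) = 1 | y(s, 1), ..., y(s, T-1)) by averaging the
    conditional selection probability over posterior draws of
    theta^(T-1)(s), then threshold the result at 0.5."""
    probs = [p_gamma_given_theta(theta) for theta in theta_draws]
    p_hat = sum(probs) / len(probs)          # posterior predictive probability
    gamma_hat = 1 if p_hat > threshold else 0  # predicted epidemic status
    return gamma_hat, p_hat
```

Averaging the conditional probability over posterior draws is exactly the Monte Carlo approximation of the integral over θ^(T−1)(s) in the composition formula; the 0.5 threshold then converts the predictive probability into the status assessment γ̂(s, T).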

Discussion
We have proposed a Bayesian nonparametric approach for modeling temporary and localized changes in the distribution of a spatio-temporal process. Our proposal defines a spiked process prior, where clustering-inducing CID species sampling priors and a Markov random field prior are used to identify and propagate changes over time and space. We show good performance of our method in the analysis of CDC data on P&I mortality rates and in a simulation study. Several generalizations of the proposed modeling framework are possible. The Markov random field prior formulation is particularly suitable for the analysis of spatio-temporal areal data. For point-referenced data, spatial selection could be obtained via a latent logistic regression with a Gaussian spatial random effect. Alternatively, it could be possible to incorporate spatial dependence in a more direct way, by defining interacting partially CID sequences (Fortini et al., 2017). Scalability of the MCMC posterior sampling is a limitation of the proposed method. However, CID sequences provide a promising flexible framework for scalable and fast implementation through sequential Monte Carlo approaches and approximate MCMC algorithms. These extensions are the objective of ongoing work.