Bayes factors for partially observed stochastic epidemic models

We consider the problem of model choice for stochastic epidemic models given partial observation of a disease outbreak through time. Our main focus is on the use of Bayes factors. Although Bayes factors have appeared in the epidemic modelling literature before, they can be hard to compute and little attention has been given to fundamental questions concerning their utility. In this paper we derive analytic expressions for Bayes factors given complete observation through time, which suggest practical guidelines for model choice problems. We extend the power posterior method for computing Bayes factors so as to account for missing data and apply this approach to partially observed epidemics. For comparison, we also explore the use of a deviance information criterion for missing data scenarios. The methods are illustrated via examples involving both simulated and real data.


Introduction
This paper is concerned with the problem of choosing between a small number of competing infectious disease transmission models, given partial observation of an epidemic outbreak through time. A key reason to consider such a problem is that the models represent different hypotheses about disease spread, such as the infection mechanism, the nature of the contact structure between individuals in the population, or other disease characteristics. A secondary reason to compare models is that they are often used to simulate potential future outbreaks, perhaps with a view to designing control strategies, in which case it is computationally more efficient to employ the simplest possible model which can represent observed data reasonably well. Note that it is typically the case that we only have a few models under consideration, in contrast to variable-selection problems that occur in regression modelling. Also, our focus here is not on the assessment of model fit, itself concerned with whether or not a single specific epidemic model adequately describes the data to hand.
Within the epidemic modelling literature, there is to date no definitively preferred method for model choice. In the Bayesian setting, approaches include the use of Bayes factors, (e.g. Neal and Roberts, 2004;O'Neill and Marks, 2005), criteria such as the Deviance Information Criterion (DIC) (e.g. Worby, 2013;Lau et al., 2014;Deeth et al., 2015), and methods based on the predictive distribution of future outbreaks (Zhang, 2014). Here we focus on the use of Bayes factors. The most common approach to calculating Bayes factors for epidemics has been to use reversible jump Markov chain Monte Carlo methods (Neal and Roberts, 2004;O'Neill and Marks, 2005), although these are often problematic in practice due to the challenge of designing efficient algorithms. Touloupou et al. (2015) used a combination of MCMC methods and importance sampling to estimate marginal likelihoods, from which Bayes factors can be calculated. Knock and O'Neill (2014) used a path-sampling method (Gelman and Meng, 1998) to compare epidemic models given data on the final outcome of the epidemic. This is somewhat related to the methods we develop, albeit for different kinds of data.
In this paper we extend the power posterior method for calculating Bayes factors (Friel and Pettitt, 2008;Friel et al., 2014) to a missing-data situation that commonly occurs in epidemic modelling. In addition to this computational method, we also derive analytic expressions for Bayes factors in the setting where an epidemic outbreak is completely observed. Although such detailed observation is not that common in practice, our results are of theoretical interest and also provide practical insight into the choice and influence of prior distributions for parameters of the competing models. For comparison, we also consider a form of DIC suitable for missing data.
Throughout the paper we focus on the Susceptible-Infective-Removed (SIR) epidemic model, the most widely-studied stochastic epidemic model. However, the computational methods we develop could be applied to more complex models. For illustration we consider two specific kinds of model choice question: one in which models with different infectious period distributions are compared, and one in which models with different infection mechanisms are compared. Again, other comparisons are possible using our methods.
The paper is arranged as follows. Section 2 recalls preliminary information on epidemic models, Bayes factors, DIC methods and power posterior methods, extending the latter to a missing-data situation. In Section 3 we derive analytical expressions for Bayes factors given completely observed outbreaks, and in Section 4 we describe computational methods for partially observed outbreaks. Section 5 contains illustrative examples of the methods, and concluding comments are found in Section 6.

The stochastic SIR epidemic model
The stochastic SIR epidemic model is defined as follows (see e.g. Andersson and Britton, 2000). Consider a closed population of N individuals. At any time, each individual is either susceptible, infective or removed. Susceptible individuals have not contracted the disease but are able to do so. Infective individuals have the disease and can pass it on to others. Removed individuals are no longer infective and play no further part in disease spread. In applications, the removed state is context-specific and could correspond to immunity, isolation, death, or similar outcomes. The population initially consists of n susceptible individuals and a infectives who have just become infective. Each infective individual remains so for a period of time, called the infectious period, which is drawn from a specified non-negative probability distribution T I . At the end of its infectious period an individual is immediately removed. During its infectious period, a given infective has contacts with a given susceptible in the population at times given by the points of a Poisson process of rate βn −1 . The first such contact, if it occurs, results in the susceptible immediately becoming infective, and later contacts have no effect. All infectious periods and all Poisson processes are mutually independent. The epidemic ends as soon as there are no more infectives remaining in the population.
For any time t, denote by X(t) and Y (t) respectively the numbers of susceptible and infective individuals currently in the population. It follows that infections occur in the population according to a Poisson process of rate βn −1 X(t)Y (t). We will also consider a variant of the SIR model in which the overall infection rate is βn −1 X(t)Y p (t) for p ∈ (0, 1). Such models were first introduced by Severo (1969) and relax the assumption that the overall infection rate increases linearly with Y (t) (see also O'Neill and Wen, 2012, and references therein). We will focus on two particular choices of infectious period distribution, namely exponential and gamma, both of which frequently appear in the epidemic modelling literature. Finally, we will assume throughout that there is one initial infective, although this constraint can easily be relaxed.

Bayes factors
Suppose we have two competing epidemic models, m 1 and m 2 , with parameters θ 1 and θ 2 , respectively, and that we have observed data y. The Bayes factor for m 1 relative to m 2 is defined by where here, and throughout the paper, π denotes a probability mass or density function, as appropriate. For partially observed epidemic models, the likelihood term π(y|θ) is typically intractable, and a common approach is to introduce auxiliary variables, x say, such that the augmented likelihood π(y, x|θ) is available in closed form and can be computed efficiently. Estimation of the posterior distribution of θ can then be achieved via Markov chain Monte Carlo methods (Gibson and Renshaw, 1998;O'Neill and Roberts, 1999). In most situations, x will describe the infection process, since this is usually unobserved. The computation of Bayes factors is, in general, a challenging problem. Many approaches exist, but here we focus specifically on the power posterior method. One attractive aspect of this approach is that it is relatively prescriptive, meaning that the user does not have that many implementation choices which could seriously affect the performance of the resulting algorithm. Also, as we now explain, the method can be extended to cover the kind of missing data problem that usually arises in the context of epidemic modelling.

The power posterior method for models incorporating missing data
We now extend the power posterior approach (Lartillot et al., 2006;Friel and Pettitt, 2008;Friel et al., 2014) to models incorporating missing data. The method itself provides a way of calculating the marginal density π(y|m i ), i = 1, 2, from which the Bayes factor can be obtained. Let y and x denote the observed and the missing data respectively with θ representing the model parameters. We refer to (y, x) as the complete data. Note that in the epidemic settings that we will consider, neither π(y|θ) nor π(y|x, θ) are typically tractable, but we can compute the augmented likelihood function π(y, x|θ).
We shall evaluate the final integral numerically, by evaluating it at a finite number of t values, namely 0 = t 0 < t 1 < . . . < t r = 1. To reduce the resulting approximation error, we follow Friel et al. (2014) and make use of the fact that the gradient of the expected log-likelihood curve equals its variance. Specifically, Hence, Using the corrected trapezoidal rule form (Atkinson and Han, 2004), we have the following extended power posterior estimate of log(π(y)) : Algorithm 1 can be used to implement the extended power posterior method for missing data models. We follow the recommendation of Friel and Pettitt (2008) for the choice of t j values in (2). This choice ensures that many of the t j values are close to zero, where the expected log-likelihood curve often changes rapidly in practice. The t j values are often called temperatures, and the collection of values called a temperature ladder, this terminology arising because the power posterior method is a form of so-called thermodynamic integration.

DIC for models with missing data
Although Bayes factors are our primary focus, for comparison we will also compute a form of DIC suitable for missing data situations. Celeux et al. (2006) propose various options; the one best-suited to our setting, in the sense that it is suitable for situations where the missing data are not our main focus, and we can compute it, is DIC 6 = −4E θ,x|y [log(π(y, x|θ))] + 2E x|y,θ [log(π(y, x|θ))]. ( Calculation of this quantity requires two runs of an MCMC algorithm. The first is to derive E θ,x|y [log(π(y, x|θ))] and E θ|y (θ|y) =θ, and the second run Algorithm 1 MCMC algorithm for estimating the marginal likelihood via the missing-data power posterior approach 1. Initialise algorithm with x 0 and θ 0 .
is to obtain E x|y,θ [log(π(y, x|θ))] setting θ =θ and allowing x to vary. In our epidemic setting,θ contains posterior point estimates of the model parameters, the identity of the initial infective and the initial infection time. The preferred model from those under consideration is the one with the lowest DIC 6 value.

Model selection given complete outbreak data
In this section we show that Bayes factors for the epidemic models of interest can be computed explicitly if complete data are available. This situation is rare in practice, although it can arise when outbreaks are being closely monitored (e.g. in the early stages of a suspected major epidemic, or in experimental settings for animal diseases). Nevertheless, we gain some insight into the value of Bayes factors as a tool for model choice, particularly with respect to the choice of within-model prior distribution.

The SIR model with different infectious periods
Suppose we observe an epidemic among a population of N individuals of whom initially n are susceptible and one is infective. Denote by n R the total number of individuals ever infected, including the initial infective, and label these n R individuals 1, . . . , n R . The remaining individuals are labelled n R + 1, . . . , N . For j = 1, . . . , N let I j and R j denote, respectively, the infection and removal time of individual j, with I j = R j = ∞ for j > n R . Let z denote the label of the initial infective, so that I z < I j , for all j = z. Finally let I = (I 1 , . . . , I z−1 , I z+1 , . . . , I n R ) denote the vector of infection times of infected individuals other than the initial infective, and let R = (R 1 , ..., R n R ) denote the vector of removal times of all infected individuals.
We consider two competing SIR models with identical infection mechanisms but different choices of infectious period distribution T I . Specifically, model m 1 has T I ∼ Exp(γ) and model m 2 has T I ∼ Gamma(α, δ) with shape parameter α assumed known. The likelihoods of (I, R) under the two models are Kypraios (2007). By assigning an independent gamma prior distribution for each of the model parameters, namely Gamma(λ ζ , ν ζ ), where ζ = β, γ, δ, the Bayes factor can be derived explicitly in this case as follows. . (4) The resulting Bayes factor is independent of the infection rate prior parameters. This is a consequence of the fact that the likelihood expressions can be factorized into parts corresponding to the infection and removal processes, and the former are the same in both models. Note also that the Bayes factor is identical to that obtained by comparing exponential and gamma distributions for a sequence of independent and identically distributed observations R 1 − I 1 , . . . , R n R − I n R , although in our setting things are slightly different because the observations and n R are not independent.
In the epidemic modelling literature, it is often the case that prior parameters for positive quantities such as rate parameters are assigned the same vague prior distributions. Here, this assumption gives λ γ = λ δ = λ and ν γ = ν δ = ν, where λ ≥ 1 and ν is a small positive number. The Bayes factor in (4) then becomes . (5) Reformulating (5) in terms of the mean and variance of the prior distribution yields , where µ = λ/ν and σ 2 = λ/ν 2 . Thus as σ 2 → ∞, the prior becomes increasingly diffuse and BF 12 converges to its lower limit, that is However, as σ 2 → 0 the prior gets increasingly concentrated at µ, and the Bayes factor becomes more decisive in supporting m 1 , that is BF 12 → ∞. These results suggest that diffuse priors are more appropriate. Table 3.1 shows the results of a simulation exercise in which data sets were simulated under either m 1 or m 2 , and in each case BF 12 was calculated using (6). The resulting mean values, and the proportion of times that m 1 was favoured, are presented. It can be seen that the Bayes factor discriminates effectively between the two models, even in the relatively small population sizes of N = 30 and N = 50. As expected, increasing β increases the outbreak size which in turn makes the comparison more decisive.

The SIR model with different infection mechanisms
Consider now the situation where we have two competing SIR models with different infection mechanisms but the same infectious period distribution, the latter being essentially arbitrary. Specifically, m 1 is the standard SIR model in which infections occur at rate βn −1 X(t)Y (t), and m 2 the model in which infections occur at rate βn −1 X(t)Y p (t). We assume p ∈ (0, 0.5), so that m 1 and m 2 are clearly distinct, and that p is known. We assume, a priori, that (i) β ∼ Gamma(λ β , ν β ), and (ii) the parameters of the infectious period distribution are independent of β.  (1) while m 2 has T I ∼ Gamma(α, α), so that both models have the same mean infectious period. Each row gives parameter values and results from 1000 simulated epidemics from the true model in which at least one new infection occurred.
Adopting the notation and arguments of the previous section leads to where As expected, the Bayes factor only involves the infection process part of the likelihoods since the removal processes in the two models are assumed to be the same. Rewriting the prior distribution for β in terms of its mean and variance, σ 2 say, we find that and It is natural to suppose that the diffuse prior setting is a natural candidate for consideration. It is evident from (8) that larger Y (I j −) values in the product term will improve model discrimination; conversely, if all these values equal 1 then the product term is independent of p. This in turn suggests that we require either larger or faster-growing epidemics to effectively discriminate between m 1 and m 2 . This is illustrated by the results in Table 3.2, which shows that we require larger values of N than the infectious period distribution comparison of  the previous section in order to obtain clear evidence in favour of the true model, and that increasing β also improves discrimination. Also, as expected, as p decreases then m 1 and m 2 become less similar, which also makes discrimination easier.

Model selection given incomplete outbreak data
We now consider the situation in which we observe removal times but not infection times, which in turn means that the Bayes factors of interest are no longer analytically tractable. In this section we describe how to apply the extended power posterior methods in section 2.3 to the model comparison scenarios in sections 3.1 and 3.2. In both cases, Algorithm 1 requires an MCMC scheme that provides samples from the power posterior distribution for any given value of t j . Since the infection times are unobserved, these are included as additional components of the posterior distribution. Thus the required MCMC algorithm may be specified by defining the updates for each of the model parameters, the details of which are given below. We also briefly explore the performance of the power posterior methods and DIC 6 , via simulation studies. It should be noted that such simulations are highly computationally expensive and time-consuming, since we require separate runs of an MCMC algorithm for every single t j value in the temperature ladder. Throughout we set c = 5 so that t j = (j/r) 5 . In all cases, the results are based on MCMC runs of 27,000 iterations of which the first 2,000 were discarded as burn-in, and then thinned by taking every 5th value. Convergence and mixing were assessed visually, and found to be satisfactory.

The SIR model with different infectious periods
Recall the models and notation from section 3.1. The parameters β, γ and δ are assigned independent exponential prior distributions Exp(λ ζ ), where ζ = β, γ, δ. We assume a priori that the initial infection time satisfies I z = R min − Y , where Y ∼ Exp(ψ) and R min = min {R 1 , . . . , R n R }, and that the initial infective z is equally likely to be any of the n R infected individuals.
At temperature t, the full conditional power posterior distributions for β, γ and δ are β|t, γ, z, I z , I, R ∼ Gamma 1 + t(n R − 1), λ β + tn −1 A , γ|t, β, z, I z , I, R ∼ Gamma and the full conditional power posterior density for (I, z, I z ) under model m 2 is given by while the corresponding expression for model m 1 is obtained by setting α = 1 and δ = γ. An MCMC algorithm to update the model parameters and infection times then consists of (i) updating β, γ and δ according to their full conditional distributions, and (ii) updating infection times using a suitable Metropolis-Hastings step as in O' Neill and Roberts (1999); full details can be found in Alharthi (2016).
The DIC 6 calculation involves two steps. An initial MCMC run provides point estimates of the model parameters, the identity of the initial infective and the initial infection time. The model parameters and initial infective conditions are fixed for a second MCMC run in which the remaining infection times are allowed to vary. From each run we also obtain an estimate of the posterior mean of the augmented log-likelihood, from which DIC 6 can be computed according to (3).

Simulation study
We now briefly assess the impact of the temperature ladder (i.e. the set of t j values in Algorithm 1), the within-model prior distribution, and the size of the observed epidemic on the calculation of Bayes factors. We set ψ = 1 in the prior distribution of I z .
• Length of temperature ladder Table 3 shows the impact of the number of t j values, r, on the marginal likelihoods and corresponding Bayes factor BF 12 . The results are based on a single, but fairly typical, data set simulated from model m 1 with N = 30, β = 1, γ = 0.5 and in which n = 22 individuals were infected in total. In model m 2 , α = 10. Two choices of prior distribution are illustrated. The results show that the estimates are relatively insensitive to the value of r, although as expected more t j values are required as the prior distribution becomes more diffuse. Figure 1 shows typical expected log-likelihood curves, and illustrates the sharp change near t = 0 which motivates the choice of temperature ladder t j = (j/r) c .

• Choice of prior distribution
It is well known that Bayes factors can exhibit strong dependence on the model parameter prior distributions. Here, we explore this issue via two simulated data sets from the two models under consideration. In both cases the data set itself was fairly typical of epidemics that did not die out quickly. For model m 1 we set β = 2, γ = 1 and N = 50 and obtained n R = 41 infected individuals. For model m 2 we set β = 2, α = 10, δ = 10 and N = 30, and obtained n R = 22. Prior distributions for β, γ and δ were set to be Exp(1), Exp(0.1) and Exp(0.01), for which we used r values of 20, 20 and 40 respectively, inspired by our previous findings regarding the length of temperature ladder. Results of the simulation study are summarised in Table 4. The expected loglikelihood curves for both SIR models are shown in Figure 1 in the case when the prior distributions are Exp(1). When model m 1 is the true model, the results
in Table 4 indicate, in general, that log(BF 12 ) values support the true model for all prior distributions with a noticeable decrease as the prior distribution becomes more diffuse. These findings are in harmony with the behaviour of the Bayes factor for complete data case described in section 3.1. When model m 2 is the true model, the value of log(BF 12 ) varies quite dramatically with different prior distributions. As the latter become more diffuse then m 2 is identified correctly, whereas using an Exp(1) prior gives the opposite conclusion. There is some intuition to explain this conflict, as follows. The true value of δ parameter used in the simulation is 10, so an Exp(1) prior is a strong prior distribution which is in conflict with the data and in turn results in poor posterior mean estimates for both β and δ, namelyβ = 1.286 andδ = 6.629. This in turn yields lower values of log(π(R|m 2 )) and thus m 2 is not identified correctly. However, with Exp(0.1) and Exp(0.01) priors, the estimation was improved givingβ = 2.053,δ = 11.490 andβ = 2.184,δ = 12.133, respectively, and consequently the correct identification of model m 2 was obtained. These results highlight the sensitivity of Bayes factors to prior distributions, but also suggest that in this situation diffuse priors are likely to be more appropriate. The values of DIC 6 are also prior-dependent. More seriously, model m 2 is preferred as the prior distributions become more diffuse, which suggests that DIC 6 is not a suitable tool for model discrimination in this setting.

• Size of outbreak
It is natural to suppose that, given a small epidemic outbreak, it is hard to effectively distinguish between competing models. Here we show that even small outbreaks can be informative in the setting where we compare infectious period distributions.
We simulated, under model m 1 with N = 50, β = 1.15 and γ = 1, 20 datasets of size n R = 5 removals each. For each data set we calculated log(BF 12 ) for the complete data (infection and removal times), and for incomplete data (removal times), the latter using r = 20, assuming that α = 10 in model m 2 . Two different prior distributions were used for β, γ and δ, namely Exp (1)  Complete data Incomplete data Figure 2: Boxplots of log(BF 12 ) values calculated using 20 simulated data sets of n R = 5 removals each simulated under the SIR model with T I ∼ Exp(γ) (m 1 ), while m 2 has T I ∼ Gamma(α = 10, δ). The log(BF 12 ) values were computed using (5) and the missing-data power posterior method for complete and incomplete data respectively. The parameters β, γ and δ were assigned two choices of prior distribution, namely Exp(1) and Exp(0.01).
Interestingly, the results are not as one might expect with a small outbreak data set, giving decisive support to the true model m 1 under both complete and incomplete data. These findings suggest that, in this particular epidemic setting, a few infectious periods of infected individuals might be enough for the Bayes factor criterion to favour the SIR model with exponential infectious period over the SIR model with gamma infectious period. The impact of the parameter prior distributions is also evident which is in agreement with our previous findings. Note also that there is more variance in the 20 Bayes factors for complete than incomplete data. This is essentially a consequence of the fact that calculating BF 12 for incomplete data involves averaging over the unobserved infection times, so the variability we obtain from the original simulations is reduced.

The SIR model with different infection mechanisms
We now consider the models in section 3.2, so that m 1 is the standard SIR model and m 2 has infection rate βn −1 X(t)Y p (t). Prior distributions are assigned as in section 4.1, and we also set p ∼ U (0, 0.5) a priori.
For model m 2 at temperature t, we have β|t, γ, p, I z , z, I, R ∼ Gamma 1 + t(n R − 1), λ β + tn −1 Conditional densities for p and I are given by The corresponding expressions for model m 1 can be obtained by setting p = 1.

Simulation study
We consider the factors described in section 4.1.1, and again set ψ = 1.
• Length of the temperature ladder Table 5 shows how estimates of marginal likelihoods and Bayes factors vary with r. The results are based on a simulation from model m 2 in which N = 100, β = 2, γ = 0.2 and p = 0.3 which resulted in n R = 87 infections in total.
Our findings are the same as those described in section 4.1.1, namely that as the prior distributions become more diffuse, more temperatures are required to provide an accurate estimates.

• Choice of prior distribution
We simulated one data set from each model, in both cases fairly typical. For model m 1 we set N = 100, β = 0.5, γ = 0.2 and obtained n R = 83 infections. For model m 2 we set N = 100, β = 2.5, γ = 0.2 and p = 0.3 and obtained n R = 88. Table 6 shows estimates of log(BF 12 ) and DIC 6 under three choices r log(π(R|m 1 )) log(π(R|m 2 )) log(BF 12 ) β, γ ∼ Exp (1) Table 5: Estimates of log(π(R|m 1 )), log(π(R|m 2 )) and log(BF 12 ) using data simulated from the SIR model with modified infection rate (m 2 ; N = 100, β = 2, γ = 0.2, p = 0.3 and n R = 87 infections), while m 1 is the standard SIR model. The parameters β and γ were assigned Exp (1)  of prior distribution, where we used r values of 20, 40 and 40 for Exp(1), Exp(0.1) and Exp(0.01) prior distribution, respectively. Results of simulations are displayed in Table 6. Again there is evidence of sensitivity of BF 12 to the choice of prior, although the results themselves show that the correct model is identified in all cases. Furthermore DIC 6 also performs well in this setting. Figure 3 shows plots of the expected log-likelihoods against the temperature t for the two simulated data sets. In contrast to Figure 1, here the curves for m 1 and m 2 are much closer together, indicating that it is harder to effectively discriminate between models with different infection mechanisms than with different infectious period, at least for the settings we have considered.

• Size of outbreak
We consider two scenarios for model m 1 , corresponding to small and large epi-demics. We simulated 20 epidemics for each, with the same number of removals n R . In the first scenario N = 50, β = 1.15, γ = 1 and n R = 7. In the second N = 50, β = 2, γ = 1 and n R = 42. We set p = 0.3 in model m 2 . Bayes factor calculations were carried out using r = 20, and under two different prior distribution assumptions for β and γ. Figures 4 and 5 illustrate the results. In contrast to the comparison of infectious period distributions, here we see that small outbreaks may not be sufficient to differentiate between models, which again agrees with our earlier findings that this is an inherently harder problem requiring more data. However, even larger outbreaks appear problematic. The most likely reason for this is that the key to differentiating between m 1 and m 2 is the number of infected individuals present in the population at the time of each infection, as discussed in section 3.2, as opposed to the outbreak size itself. This suggests that data from epidemics that grow quickly would be required to distinguish the competing models, at least for moderate population sizes.
To explore this further, we simulated 20 outbreaks of size n R = 47 from model m 1 with N = 50, β = 5 and γ = 1. Thus β/γ = 5, in contrast to the value of 2 shown in Figure 5. Figure 6 shows the resulting histogram of log(BF 12 ) values from which we see that the evidence in favour of the true model m 1 is much clearer.

Abakaliki smallpox data
We now briefly consider a widely-studied temporal data set obtained from a smallpox outbreak that took place in the Nigerian town of Abakaliki in 1967. The outbreak resulted in 32 cases, 30 of whom were members of a religious organisation whose 120 members refused vaccination. Numerous authors have considered these data by focussing solely on the 30 cases among the population of 120, and the time series of symptom-appearance times which in our notation is given by (0,13,20,22,25,25,25,26,30,35,38,40,40,42,42,47,50,51,55,55,56,57,58,60,60,61,66,66,71,76), where to set a time scale we set R 1 = 0. In fact, the original data set includes far more information, particularly on the physical locations of the cases and the other members of their households. Analyses of this full data set can be found in Eichner and Dietz (2003) and Stockdale et al. (2017).
Here our purpose is to illustrate our model comparison methods, and so we only consider the partial data set, specifically assuming that the 30 symptomappearance times correspond to removals in an SIR model. Several authors have previously considered departures from the standard SIR model for these data, and in particular both Becker and Yip (1989) and Xu (2015) considered models in which the infection rate was allowed to vary through time. Here we address this issue by comparing the standard SIR model, m 1 , with a model (m 2 ) in which the infection rate parameter β is replaced by β(t) = βn −1 e −bt , with Complete data Incomplete data Figure 4: Boxplots of log(BF 12 ) values calculated using 20 simulated data sets of n R = 7 removals each simulated under the standard SIR model (m 1 ), while m 2 has modified infection rate βn −1 X(t)Y p (t), p = 0.3. The log(BF 12 ) values were computed using (7) and the missing-data power posterior method for complete and incomplete data respectively. The parameters β, γ and δ were assigned two choices of prior distribution, namely Exp(1) and Exp(0.01).
b > 0 a model parameter to be estimated from the data. It is straightforward to modify our methods to this situation.
Results are presented in Table 7 for different prior distributions for β and γ for both SIR models, where b and R 1 −I 1 are assigned Exp(1) prior distributions throughout. Figure 7 shows plots of the expected log-likelihoods against the temperature t when the prior distribution is Exp(1), using a temperature ladder with t j = (j/r) 5 , j = 0, . . . , 20 and r = 20.  Table 7: Results of log(BF 12 ) and DIC 6 for smallpox data based on the standard SIR model (m 1 ) and the modified SIR model (m 2 ). For each case, the parameters β and γ were assigned prior distributions as indicated.
From Table 7, there is clearly little to choose between the two models. This Exp (1) Exp ( Complete data Incomplete data Figure 5: Boxplots of log(BF 12 ) values calculated using 20 simulated data sets of n R = 42 removals each simulated under the standard SIR model (m 1 ), while m 2 has modified infection rate βn −1 X(t)Y p (t), p = 0.3. The log(BF 12 ) values were computed using (7) and the missing-data power posterior method for complete and incomplete data respectively. The parameters β, γ and δ were assigned two choices of prior distribution, namely Exp(1) and Exp(0.01).
comparison is borne out from parameter estimation from model m 2 , specifically Figure 8 which shows that b is close to zero. These findings are in keeping with those in Xu et al. (2016), in which a Bayesian nonparametric time-varying estimate of the infection rate parameter was found to be fairly close to constant over time.

Hagelloch measles data
Our second data set describes an historical outbreak of measles in 1861 in the German village of Hagelloch, as described in Pfeilsticker (1861). The outbreak was very severe, as every one of 188 individuals deemed to be susceptible became infected, these individuals all being children. These data have been considered by a number of authors (see e.g Britton et al., 2011, and references therein), and were specifically analysed in a model choice context in Neal and Roberts (2004), where the authors used reversible-jump MCMC methods to evaluate Bayes factors for a number of competing models. In view of our earlier discussion on the possible sensitivity of Bayes factors to within-model prior distributions, it is natural to explore this further for the Hagelloch data. The data themselves are unusually detailed, consisting for each case of the name, age, sex, date of symptom onset, date of rash onset, class of child in the village school, date of death if this occurred, location of the child's home and several other covariates. We shall adopt the transmission models described in Neal and Roberts (2004), defined as follows.
Each individual belongs to a household and the community at large, and either attends school or is of pre-school age. If an individual becomes infected, they first undergo a symptom-free infectious period T S which is assumed to be one of two models: either (i) a fixed length of one day, so T S = 1, or (ii) T S ∼ Gamma(30, δ). At the end of this period, the individual displays symptoms and subsequently develops a rash. The dates of both symptom and rash appearance are given by the data and so we do not model the time between these events. Following the rash appearance, the individual is assumed to be removed three days later, unless they die first (as indicated in the data). The infectious period is assumed to start upon initial infection, and finish at either removal or death.
While infectious, individual i has infectious contacts with susceptible individual j at rate α ij where α ij depends on the relationship of i and j. Neal and Roberts (2004) first considered a model in which where (i) 1 A denotes the indicator function of the event A, (ii) ρ(i, j) denotes the Euclidean distance between the households of individuals i and j, (iii) L i denotes the school classroom (either 1 or 2) which individual i belongs to, and (iv) L i = 0 if individual i is of pre-school age. The non-negative parameters β H , β 1 C and β 2 C denote the within-household, within-classroom 1 and withinclassroom 2 infection rates respectively. Finally, β G denotes a global infection rate while θ governs the extent to which distance between individuals reduces this infection rate. Neal and Roberts (2004) called the model described above the full model, denoted by M .
In order to investigate the relative importance of different transmission routes, Neal and Roberts (2004)   We employed an MCMC algorithm which targets the joint power posterior distribution of the parameters (θ, β H , β 1 C , β 2 C and β G when T S = 1, and additionally δ and the unknown infection times when T S ∼ Gamma(30, δ)) given the observed data. None of the full conditional power posterior distributions for the model parameters are standard, and so parameters were updated using Metropolis-Hastings steps. The variances were tuned manually to achieve an acceptance rate approximately between 25% and 40%. To estimate the marginal likelihood via the power posterior method we used a temperature ladder such that t j = (j/r) 5 , where j = 0, . . . , 100 and r = 100. Convergence and mixing was assessed visually, and found to be satisfactory.
Results are given in Table 8 for different prior distributions for the model parameters. In Neal and Roberts (2004), an Exp(0.1) prior distribution is assigned to the parameter θ and an Exp(10) to the parameters β H , β 1 C , β 2 C and β G . We also consider two alternative sets of prior distributions, specifically assigning an Exp(1) or an Exp(0.1) distribution to all the parameters in each model.
The results illustrate a certain amount of sensitivity to both the T S model and the chosen prior distributions. Regarding the former, the posterior mean of δ was found to be around 4, so that when infection times are not assumed to be known the length of the symptom-free period was estimated to be around 7 days. This is in stark contrast to the assumption that T S is one day, and goes some way to explaining the differences between the Bayes factors for the two model scenarios. Note also that our estimate of δ is almost certainly connected to the fact that measles actually has a latent period of around 7-10 days, a feature missing from the SIR model we consider; roughly speaking, increasing the infectious period length is one way to account for the time taken for the whole outbreak. We also briefly investigated what happens if T S was set equal to integer values from 2 to 7 days, and found that the average posterior loglikelihood increased accordingly, which also supports the view that the T S = 1 model is less appropriate for these data than T S ∼ Gamma(30, δ).
Regarding the Bayes factors in Table 8, the only robust conclusion appears to be that there is strong evidence that β 1 C > 0, itself a key finding from Neal and Roberts (2004). For the T S ∼ Gamma(30, δ) model there is also strong support for the hypotheses that β 2 C > 0 and that θ = 0. These findings make sense, since as illustrated in Britton et al. (2011) it is clear from the raw data that the epidemic moves through the two school classes in turn, suggesting that classroom transmission is important, while there is no apparent evidence of purely spatial transmission.

Conclusions
We have described a method for computing Bayes factors for epidemic models, specifically by utilising an extension of the power posterior method to accommodate missing data of a certain kind. The methods appear to work reasonably well in practice. Although we have focussed on single population SIR models, the methods can be applied to more complex epidemic models, as illustrated by our examples featuring the Abakaliki smallpox data and the Hagelloch measles data.
In common with the original power posterior method itself, the main drawback with our methods is one of computational efficiency, specifically that it can be quite time-consuming to perform the numerous MCMC runs required for the method. However, the procedure is relatively easy to implement, and lends itself to parallel computation.
We have also demonstrated that Bayes factors can be obtained analytically for situations where infection times are known. This in turn enables us to explore the impact of different prior assumptions analytically. Although the infection process is not usually observed in real-life settings, if the infectious period does not exhibit that much variation then we can approximate it by a fixed value. Under this assumption the infection times would be known, which could enable explicit evaluation of Bayes factors for comparing different possible infection mechanisms or routes of infection, depending on the problem in question.