Seven challenges for metapopulation models of epidemics, including households models

This paper considers metapopulation models in the general sense, i.e. where the population is partitioned into sub-populations (groups, patches, ...), irrespective of their biological interpretation, e.g. spatially segregated large sub-populations, small households, or hosts themselves modelled as populations of pathogens. This framework has traditionally provided an attractive approach to incorporating more realistic contact structure into epidemic models, since it often preserves analytic tractability (in stochastic as well as deterministic models) while also capturing the most salient structural inhomogeneity in contact patterns in many applied contexts. Despite the progress that has been made in both the theory and application of such metapopulation models, we present here several major challenges that remain for future work, focusing on models that, in contrast to agent-based ones, are amenable to mathematical analysis. The challenges range from clarifying the usefulness of systems of weakly coupled large sub-populations in modelling the spread of specific diseases to developing a theory for endemic models with household structure. They also include developing inferential methods for data on the emerging phase of epidemics, extending metapopulation models to more complex forms of human social structure, developing metapopulation models to reflect spatial population structure, developing computationally efficient methods for calculating key epidemiological model quantities, and integrating within- and between-host dynamics in models.


Introduction
The simplest epidemic models assume a homogeneously mixing population of homogeneous hosts, with each infective host equally likely to make infectious contact with each susceptible host. Fundamental results and a great deal of insight have been gained from such models but, for anything but the smallest populations, these assumptions are likely to be a serious oversimplification. It has therefore been important to understand how epidemic transmission dynamics are affected by population structure. At the same time, increasing computational power has allowed a wealth of large-scale individual-based stochastic simulations to include ever more detailed descriptions of human society (see, e.g. Eubank et al., 2004; Ajelli et al., 2010, and references therein). These have been invaluable in answering specific questions of public health relevance, but can suffer from known problems such as lack of robust parametrisation and limited insight into the key determinants of model output. Here, we focus instead on simpler models that aim to capture the essence of social structure in a mathematically tractable fashion; in particular, we focus on metapopulation models, leaving other modelling approaches, such as networks and other spatially explicit models, to other papers in the same issue (see Pellis et al. and Riley et al., in this issue).
Metapopulation models (Levins, 1969; Hanski, 1999) were first introduced in ecology, for situations where a population can be divided into a number of geographically separated sub-populations. Here, we use the term more widely, to cover any division of a population into groups that influences infectious disease dynamics. In the models that we consider there is no migration between groups. Typically, contacts between hosts in the same group occur at a higher rate than those between hosts in different groups. Models can also allow for more than two levels of mixing, or for overlapping groups such as households, schools and workplaces.
The most common form of metapopulation model consists of a number of sub-populations, each of which is assumed to be large. The structure may reflect the spatial separation of the sub-populations, in which case the contact rate might vary with spatial separation, although the simplest models have just two levels of mixing. From a mathematical point of view, the model is identical to what is commonly referred to as a multitype model (see Diekmann et al., 2013) and has likewise been studied with both deterministic and stochastic approaches. However, apart from often lacking the spatial interpretation, multitype models often focus on single outbreaks of SIR type, whereas these metapopulation models have historically been used to investigate issues such as local and global extinction/persistence and critical community size, and therefore involve models in which individuals can become susceptible again (e.g. SIS, SIRS) or SIR models with demography.
When sub-populations are small, the mathematical problems are typically much more challenging. Except in the case of constant recovery rates, in which a self-consistent ODE approach can be used (House and Keeling, 2008), the vast majority of technical developments have been achieved within the framework of stochastic modelling, because spread among small sets of individuals is intrinsically stochastic. Unsurprisingly, most progress has been made with models with just two levels of mixing (Ball et al., 1997; Ball and Neal, 2002), and particularly with household models (Becker and Dietz, 1995), in which there is a large number of small non-overlapping groups. Among all the features of a realistic human society that can affect disease spread, the household is one of the most important: most individuals live in a household; contacts with household members are frequent, long, and often closer than with others; ill individuals often stay home; data are available about household structure, composition (especially age stratification) and, in many cases, transmission intensity of various infections; control policies are often targeted at households; and compliance with control policies (e.g. vaccination) is often decided at the household level.
High-quality data for diseases among households are often collected, allowing models to be tested and parameters estimated (e.g. Cauchemez et al., 2009). However, such potentially informative statistical analysis was performed in only a minority of studies during the recent influenza A (H1N1) pandemic (House et al., 2012). Promoting wider adoption of these methods forms a significant part of the motivation for the challenges we present here.

Clarify the usefulness and limitations of systems of weakly coupled large sub-populations in modelling the spread of infections
When the strength of between-group transmission, often referred to as coupling, is negligible, epidemics in different sub-populations evolve essentially independently of each other; when it is large, outbreaks occur simultaneously in all sub-populations ("synchrony"). The most interesting behaviour therefore arises when the coupling is relatively weak. The case of two sub-populations has been extensively studied (Keeling and Rohani, 2002) and was shown to be the limit of a model in which individuals live in one location and work in the other, as commuting becomes frequent and rapid. Furthermore, the temporal correlation between the numbers of infectives in the two sub-populations, as a function of the coupling parameter, was shown empirically to take a particularly simple, approximately sigmoidal, shape. Synchrony with more than two sub-populations is treated more extensively by Lloyd and May (1996) and Lloyd and Jansen (2004).
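The qualitative contrast between weak and strong coupling can be illustrated with a minimal deterministic sketch. It assumes two equal-sized patches with SIR dynamics, illustrative values of the transmission rate `beta` and recovery rate `gamma` (our choices, not estimates for any disease), and a coupling parameter `eps` giving the fraction of each host's contacts made with the other patch. Infection is seeded in patch 1 only, and the hypothetical function `two_patch_peak_lag` returns the delay between the epidemic peaks in the two patches. It does not reproduce the correlation analysis cited above, which concerns stochastic models.

```python
def two_patch_peak_lag(eps, beta=0.5, gamma=0.25, i0=1e-4, t_max=400.0, dt=0.01):
    """Peak time in patch 2 minus peak time in patch 1.

    Deterministic SIR in two equal-sized patches, with S and I as fractions of
    each patch. A fraction eps of every host's contacts is with the other patch.
    Infection is seeded in patch 1 only, so eps must be > 0 for patch 2 to be
    infected at all. Integrated with forward Euler steps of length dt.
    """
    s = [1.0 - i0, 1.0]
    i = [i0, 0.0]
    peak_val = [0.0, 0.0]
    peak_time = [0.0, 0.0]
    t = 0.0
    while t < t_max:
        # Force of infection in each patch, computed before either patch is updated.
        foi = [beta * ((1.0 - eps) * i[0] + eps * i[1]),
               beta * ((1.0 - eps) * i[1] + eps * i[0])]
        t += dt
        for k in range(2):
            new_infections = foi[k] * s[k] * dt
            recoveries = gamma * i[k] * dt
            s[k] -= new_infections
            i[k] += new_infections - recoveries
            if i[k] > peak_val[k]:
                peak_val[k], peak_time[k] = i[k], t
    return peak_time[1] - peak_time[0]


for eps in (0.001, 0.01, 0.1):
    print(f"eps = {eps}: peak lag = {two_patch_peak_lag(eps):.1f}")
```

In this sketch the lag shrinks as `eps` grows, moving from nearly independent outbreaks towards synchrony.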
More complex metapopulation structures (especially when considered in stochastic, spatial and/or seasonally forced models) exhibit a variety of phenomena which are individually fairly well known and understood, but which together generate a complex interplay of antagonistic forces (Grenfell and Harwood, 1997; Keeling et al., 2004). The overall behaviour of the system therefore depends strongly on their relative strengths, and ultimately on parameter values and initial conditions. Such behaviour has been extensively studied for measles (Grenfell and Harwood, 1997; Keeling, 1997; Keeling et al., 2004; Ferrari et al., 2008, and references therein), but similar understanding is still needed for a wide range of other pathogens. This requires the development of advanced statistical tools for accurate, unbiased estimation of model parameters, in particular coupling strength, from collectable data. More advanced model-comparison tools are also needed to match the coupling strength of a simple metapopulation model to the complex rules of human mobility underlying spatially explicit models, such as diffusive, gravity or radiation models (Riley et al., in this issue).
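As one illustration of how a mobility model might be mapped onto metapopulation coupling, the sketch below computes a gravity-type interaction matrix, T_ij = k N_i N_j / d_ij^distance_exponent, and converts it to per-capita coupling rates T_ij / N_i. The functional form, the parameters `k` and `distance_exponent`, and the function names are assumptions for illustration only; estimating such parameters from data is exactly the challenge described above, and diffusive or radiation models would give different matrices.

```python
import math


def gravity_interaction(pops, coords, k=1.0, distance_exponent=2.0):
    """Hypothetical gravity-type interaction T[i][j] = k * N_i * N_j / d_ij**distance_exponent."""
    n = len(pops)
    T = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a != b:
                d = math.dist(coords[a], coords[b])  # Euclidean distance between patch centres
                T[a][b] = k * pops[a] * pops[b] / d ** distance_exponent
    return T


def per_capita_coupling(T, pops):
    """Per-capita rate at which a host in patch i interacts with patch j: T[i][j] / N_i."""
    return [[T[a][b] / pops[a] for b in range(len(pops))] for a in range(len(pops))]


pops = [100, 200, 50]
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
coupling = per_capita_coupling(gravity_interaction(pops, coords), pops)
```

Because T is symmetric while patch sizes differ, the per-capita coupling matrix is generally asymmetric: small patches are more strongly coupled to large ones than the reverse.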
Finally, Ajelli et al. (2010) noted that the still oversimplified mixing assumed within metapopulation structures tends to lead to a larger fraction of the population being affected by an epidemic than in a corresponding agent-based model, and Keeling et al. (2010) showed that spatial metapopulation models that do not keep track of each individual's identity tend to consistently overestimate epidemic speed and peak. More work is therefore needed to clarify in which contexts, and for which questions of public health relevance, simple metapopulation models are good enough caricatures of realistic models of human and animal societies, when instead they are oversimplifications that lead to inaccurate predictions, and in what respects they are inadequate.

Develop a theory for endemic models with household structure
There has been very little work on metapopulation models with small group sizes for endemic diseases, the only theoretical work to date being concerned with closed-population SIS models (e.g. Britton and Neal, 2010); modelling epidemiologically relevant network dynamics is discussed in Pellis et al. (in this issue). To analyse the long-term behaviour of endemic diseases we need models that allow population structure to change over time and/or immunity to wane. Household models are a natural starting point, and these would have to incorporate the following demographic changes over time: births of new household members, deaths of household members, and splitting of households (typically one person leaving the household to create a new one). It is probably easiest to assume that the overall population grows; otherwise the population will eventually die out, precluding "true" endemicity. Given such a dynamic demographic model, one can then study what happens when an infectious disease is introduced into the community. It makes sense to begin with a simple model where the disease is of SIR type and transmission is of two types: each infectious individual infects any given household member at some rate H, and infects globally (i.e. individuals chosen uniformly at random from the population) at another rate G.
Although progress has been made by Glass et al. (2011) using simulation, no analytical results are currently available for such epidemic models with dynamic household demographics. What will the community "look" like after a long time? What will the endemic level be? How could a reproduction number be defined, and how would it depend on demographic and transmission parameters? Such models are very hard to analyse, so they should be defined as simply as possible to allow mathematical progress. Nevertheless, answers to the above questions can give qualitative insight into endemic equilibria and the effects of preventive measures.

Generalise the framework of households models to more complex social structures
Extensions of household models in which global contacts occur on a network have been proposed (Ball et al., 2009, 2010). However, in these models the network is always assumed to be locally tree-like, and hence realistic events of multiple introductions of infection into the same household early in an epidemic, due to short loops in the human social structure (for example, when two siblings attend the same school), are neglected. Furthermore, it is still unclear for which infections parsimonious models for the dependence of within-household transmission intensity on household size (e.g. density- or frequency-dependent transmission) are suitable, and for which more complex models (see e.g. Cauchemez et al., 2009, Section 1.1 of supplementary material) are to be preferred.
Households are not the only recognisable structure in human societies. In addition to members of their family, most individuals have significantly more contacts at work or school than with other individuals in the same neighbourhood, and probably more in their neighbourhood than in their city or country. This suggests a hierarchical human social structure (Watts et al., 2005). Extensions of the original two-level household models have been proposed that allow overlapping groups of hosts (Ball and Neal, 2002) or three levels of mixing (Britton et al., 2011b), but a more complete theory is still needed.
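To make the density- versus frequency-dependent distinction concrete, the sketch below uses a simple one-parameter family, chosen here purely for illustration (it is not claimed to be the parametrisation of any cited study): the per-pair rate in a household of size n is beta / (n - 1)**alpha. Setting alpha = 0 gives density dependence (the per-pair rate does not depend on n), alpha = 1 gives frequency dependence (each case spreads a total rate beta over its n - 1 housemates), and intermediate values interpolate. The function names are hypothetical.

```python
def within_household_pair_rate(beta, n, alpha):
    """Per-pair infection rate in a household of size n >= 2 (hypothetical family).

    alpha = 0: density-dependent; alpha = 1: frequency-dependent.
    """
    if n < 2:
        raise ValueError("a household needs at least two members for within-household transmission")
    return beta / (n - 1) ** alpha


def total_household_rate(beta, n, alpha):
    """Total rate at which one case makes infectious contacts with its n - 1 housemates."""
    return (n - 1) * within_household_pair_rate(beta, n, alpha)
```

Under frequency dependence the total rate stays at beta whatever the household size, whereas under density dependence it grows linearly with n; household data across sizes are what allow alpha to be estimated.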
Although many of these issues have been investigated using large-scale agent-based simulations (e.g. Eubank et al., 2004), a key challenge associated with all the generalisations above is obtaining models that are both realistic and amenable to mathematical analysis.

Develop metapopulation models to reflect spatial population structure
Metapopulation models typically allow for greater contact within groups of hosts than between them. By defining the groups in terms of spatial proximity, they provide a simple way of representing coarse spatial structure, with between-group contact rates depending in some way on spatial distance. In these models, each host is in contact with every other host and, in principle, can be directly infected by them, even though the contact rates may vary greatly and for some pairs of individuals may be very small indeed. Alternatively, an underlying network contact structure based on spatial separation may provide a more realistic basis for the epidemic dynamics (see Pellis et al., in this issue). In such models, network nodes represent hosts and infection can only be transmitted between hosts directly linked by an edge (Bansal et al., 2007). At least in principle, the edges can be weighted, with the contact rates between adjacent nodes varying over the network (Britton et al., 2011a; Britton and Lindenstrand, 2012). Such weights may reflect spatial distance although, for theoretical progress, equal rates are often assumed. A combination of the two approaches, whereby groups of hosts are located at the nodes of a network, is an attractive one. Recent work (Trapman, 2007; Gleeson, 2009; Ball et al., 2010) has looked at the epidemic dynamics that result when the two types of structure are combined in this way. Both metapopulation and network models employ simple population structures as a surrogate for a full spatial representation. In contrast, Riley et al. (in this issue) consider challenges for models where space is represented explicitly.
With metapopulation models, substantial analytic progress has been made, particularly when there is a considerable degree of symmetry in the model structure and the population consists of a large number of small groups, and some asymptotic progress has also been possible when the groups form the nodes of a relatively simple network (see above). More generally, it remains an important challenge to incorporate a reasonably realistic spatial structure into metapopulation models, using an underlying network structure as appropriate, in such a way that its main effects as a driver of transmission dynamics are taken into account, and yet in a sufficiently simple way that analytic progress is possible.

Develop inferential methods for data on the emerging phase of epidemics in structured populations
In order to specify a metapopulation model, several distributional assumptions and parameters are needed, and it is often desirable to estimate some or all of these from observations. Inferential methods depend on when, how and what observations of the epidemic are made. Recently, much interest has been devoted to inferring properties of the epidemic process from observations of the first phase of spread, in particular when the disease in question is "new", the so-called emerging epidemic situation. Examples include the initial assessments of the threat posed by SARS (Lipsitch et al., 2003) and by the A (H1N1) influenza pandemic when it started out in Mexico in 2009. In both cases, inference focused primarily on the basic reproduction number R0 of the disease, both as a measure of potential societal impact and as an indicator of the strength of countermeasures needed. The inference was mainly based on combining estimates of the doubling time and the generation time. However, many challenges remain when analysing the early part of an epidemic.
As it is now well accepted that the household structure of most populations has a significant impact on the spread of diseases, it becomes necessary to estimate the relative weights of within- and between-household spread. However, it is also recognised that observing the early part of an epidemic induces various kinds of biases that are not fully understood, in particular in combination with different observation schemes (cf. Nishiura et al., 2009). Thus, understanding how to carry out unbiased estimation during the early phases of an epidemic under the relevant observation scheme is among the primary challenges in this field. Some presumably simpler, but important, challenges also remain, for example concerning the best way to estimate the doubling time of a disease. The purely exponential early growth phase need not set in immediately, nor continue for long, which raises the problem of choosing which data points to use for the fit and which fitting method to use. A similar inference problem concerns the generation time distribution, which is distorted by exponential growth if observed in the population at large, and which may not be representative of global transmission if observed within households. Finally, since many inference methods plug in other uncertain estimates to arrive at the final result, statistically sound methods are needed to assess the total uncertainty in estimates and predictions.
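The combination of doubling time and generation time mentioned above can be sketched via the Euler-Lotka relation R0 = 1 / M(-r), where r = ln 2 / (doubling time) and M is the moment generating function of the generation time. Assuming, for illustration, a gamma-distributed generation time with given mean and standard deviation, this gives R0 = (1 + r * scale)**shape. The function names are hypothetical, and this homogeneous-mixing calculation ignores household structure, which is precisely the complication raised in the next paragraph.

```python
import math


def growth_rate_from_doubling_time(t_double):
    """Early exponential growth rate r = ln 2 / doubling time."""
    return math.log(2.0) / t_double


def r0_gamma_generation_time(r, mean_gt, sd_gt):
    """R0 = 1 / M(-r) for an assumed gamma-distributed generation time.

    shape = (mean/sd)^2, scale = sd^2/mean, and M(-r) = (1 + r*scale)^(-shape).
    sd_gt = mean_gt (exponential) gives 1 + r*mean; sd_gt -> 0 tends to exp(r*mean).
    """
    shape = (mean_gt / sd_gt) ** 2
    scale = sd_gt ** 2 / mean_gt
    return (1.0 + r * scale) ** shape


r = growth_rate_from_doubling_time(3.0)
r0 = r0_gamma_generation_time(r, mean_gt=3.0, sd_gt=1.5)
```

The result depends on the whole generation-time distribution, not only its mean, which is one reason why biased estimates of that distribution propagate directly into R0.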
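The window-selection problem for the doubling time can be made explicit with a minimal sketch: an ordinary least-squares fit of log(count) against time, restricted to an analyst-chosen window [start, end]. This is one simple choice among several (Poisson or negative-binomial regression on the counts are common alternatives); the function names are hypothetical, and the sketch deliberately leaves the choice of window to the user, since that choice is the open question.

```python
import math


def fit_log_linear_slope(times, counts):
    """OLS slope of log(count) against time; counts must all be positive."""
    if len(times) < 2 or any(c <= 0 for c in counts):
        raise ValueError("need at least two time points with positive counts")
    logs = [math.log(c) for c in counts]
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(logs) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, logs))
    return sxy / sxx


def doubling_time(times, counts, start, end):
    """Doubling time ln 2 / r, with r fitted only to points with start <= t <= end."""
    window = [(t, c) for t, c in zip(times, counts) if start <= t <= end]
    r = fit_log_linear_slope([t for t, _ in window], [c for _, c in window])
    return math.log(2.0) / r
```

Fitting the same series over different windows, for example one that includes an initial non-exponential phase and one that does not, shows directly how sensitive the estimate is to this choice.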

Develop computationally efficient methods for calculating thresholds and early exponential growth rates
The ability to calculate actual numerical values of the epidemiological quantities of interest in metapopulation models is essential if these models are to be of practical use. This requires accurate and fast algorithms to be designed and implemented.
In particular, for household models, calculation of critical thresholds for different vaccination schemes and other interventions requires methods to calculate the household reproduction number R*, while the early exponential growth rate r is often strongly constrained by data. Methods for calculating these fundamental quantities, aside from r, were given by Ball et al. (1997). Methods exist for determining the exponential growth rate r for models with Markovian within-household disease dynamics (Ross et al., 2010; Goldstein et al., 2009; Pellis et al., 2011). For non-Markovian models, Fraser (2007) (see also Pellis et al., 2011) developed a closed-form method for approximating r, which works well when household sizes are small and the generation interval of the disease has sufficiently small variance. Calculation of r for non-Markovian models involving small groups under less restrictive conditions remains a significant challenge.
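For the Markovian case, one way to compute r is through a household-level renewal (Euler-Lotka) equation, 1 = lam_global * integral of exp(-r t) E[I_H(t)] dt, where I_H(t) is the number of infectives in a household t time units after its first case. The sketch below assumes all households have size n, SIR dynamics with per-pair within-household rate `lam_local` and recovery rate `gamma`, and global contacts at rate `lam_global` landing on fresh, fully susceptible households (a large-population, early-phase approximation). It is an illustrative direct implementation in the spirit of the methods cited, not the specific algorithm of any of them; all names are hypothetical.

```python
def discounted_infectivity(r, n, lam_local, gamma):
    """Integral of exp(-r t) E[I_H(t)] dt for a household of size n with one initial case.

    Transient states are (s, i) with i >= 1. Infection (s, i) -> (s-1, i+1) and
    recovery (s, i) -> (s, i-1) both lower 2s + i by one, so processing states in
    decreasing 2s + i order handles every predecessor before its successors.
    Requires r > -gamma.
    """
    states = [(s, i) for s in range(n) for i in range(1, n - s + 1)]
    states.sort(key=lambda st: -(2 * st[0] + st[1]))
    inflow = {st: 0.0 for st in states}
    inflow[(n - 1, 1)] = 1.0
    total = 0.0
    for s, i in states:
        inf_rate = lam_local * s * i
        rec_rate = gamma * i
        x = inflow[(s, i)] / (r + inf_rate + rec_rate)  # discounted time spent in (s, i)
        if s > 0:
            inflow[(s - 1, i + 1)] += x * inf_rate
        if i > 1:
            inflow[(s, i - 1)] += x * rec_rate
        total += i * x
    return total


def household_growth_rate(n, lam_local, lam_global, gamma, tol=1e-12):
    """Solve lam_global * discounted_infectivity(r) = 1 for r by bisection."""
    def f(r):
        return lam_global * discounted_infectivity(r, n, lam_local, gamma) - 1.0

    lo = -gamma * (1.0 - 1e-9)  # f grows without bound as r decreases to -gamma
    hi = 1.0
    while f(hi) > 0.0:  # f is decreasing in r, so expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

At r = 0 the integral is the expected total infectious time in the household, so in this setting r > 0 exactly when R* > 1. For example, with n = 3 and lam_local = gamma = 1 that time is 13/6, so lam_global = 6/13 puts the model at threshold and the computed r is approximately 0.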
For metapopulation models involving large groups, easily computed accurate approximations of epidemiological quantities of interest can often be obtained from asymptotic results as the group sizes all tend to infinity. For models involving moderate group sizes, such asymptotic results may not provide sufficiently accurate approximations, and exact methods for small group sizes often become numerically infeasible or unstable; see House et al. (2013) for a systematic numerical comparison of computational methods for the final-size distribution, whose mean is needed to calculate R*.
In summary, a significant challenge remains in obtaining and understanding computational efficiency and numerical stability in the calculation of R*, R0, r and other epidemiologically important quantities that occur in metapopulation models.
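As a small-household illustration of the final-size route to R*, the sketch below solves the classical lower-triangular linear system for the within-household final-size distribution of an SIR epidemic: for l = 0, ..., m, the sum over k = 0..l of C(m-k, l-k) P[k] / phi(lam (m-l))^(k+a) equals C(m, l), where m is the number of initial susceptibles, a the number of initial infectives, lam the per-pair infection rate and phi the Laplace transform of the infectious period. It then forms R* = mu_G (1 + mean number infected) under the assumptions that all households have the same size n, infectious periods are exponential with rate `gamma`, and each case makes global contacts at rate `lam_global`, so mu_G = lam_global / gamma. Names are hypothetical. The design choice of exact rational arithmetic (`fractions.Fraction`) sidesteps the floating-point cancellation that this system is known to suffer as m grows, at the price of speed.

```python
from fractions import Fraction
from math import comb


def final_size_distribution(m, a, lam, phi):
    """P[k] = probability that exactly k of m initial susceptibles are eventually infected.

    Solves the triangular system one unknown at a time; the k = l term has
    coefficient C(m-l, 0) = 1 divided by phi(lam*(m-l))**(l+a).
    """
    P = []
    for l in range(m + 1):
        q = phi(lam * (m - l))
        known = sum(comb(m - k, l - k) * P[k] / q ** (k + a) for k in range(l))
        P.append((comb(m, l) - known) * q ** (l + a))
    return P


def household_r_star(n, lam_local, lam_global, gamma):
    """R* for households all of size n, with Exp(gamma) infectious periods (assumed)."""
    def phi(theta):
        return gamma / (gamma + theta)  # Laplace transform of an Exp(gamma) period

    P = final_size_distribution(n - 1, 1, lam_local, phi)
    mean_infected = sum(k * p for k, p in enumerate(P))
    mu_g = lam_global / gamma  # mean number of global contacts per case
    return mu_g * (1 + mean_infected)


r_star = household_r_star(3, Fraction(1), Fraction(1, 2), Fraction(1))
```

Passing `Fraction` inputs keeps every step exact; passing floats also works, but is then subject to the numerical issues discussed above for larger households.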

The individual as a habitat
Historically, epidemic models have concentrated on the dynamics of infections spreading between individual hosts, with simple assumptions about the infectious process within the individual: usually these amount to specifying the distribution in time and within the population of the others whom the individual will infect. One context considered in much more detail has been the modelling of macroparasite infections (see Hollingsworth et al., in this issue), where (as a minimum) the number of worms within a host is represented explicitly, rather than simply categorising an individual as infective or otherwise (Grenfell et al., 1995;Barbour et al., 1996;Herbert and Isham, 2000). This can be regarded as a special case of our broadly defined metapopulation framework, with individual hosts taking the role of sub-populations (see e.g. Hess et al., 2002), within which worm populations evolve according to their own dynamics, and the interactions between individuals comprising the interactions between sub-populations of worms.
More recent developments in phylogenetics of pathogens have opened up a whole new set of challenges for this kind of model (see Frost et al., in this issue), since we may want to model a whole variety of within-host dynamics, as strains evolve and compete under selection pressure from the host's immune response. Mutation is critical for the success of diseases such as influenza and HIV (Lythgoe et al., 2013). On the other hand it can be helpful in combatting disease by facilitating contact tracing, as in the case of hospital infections (Eyre et al., 2013). Both scenarios, however, can be seen as special cases of our general metapopulation framework, and are particularly relevant for emerging pathogens whose success in invasion at the population level crucially depends on their behaviour within individual hosts (see King et al., 2009 and references therein).
Most studies to date have simplified or approximated the transmission dynamics (e.g. by cutting the feedback loop in the internal process) in order to make analytical progress, or have resorted to simple deterministic models. The challenge here is to develop a full metapopulation framework, along the lines of Ball et al. (1997), in order to integrate within- and between-host dynamics. This needs to be done in collaboration with epidemic phylogeneticists to ensure that it addresses key questions of public health interest.