Eight challenges for network epidemic models.

Networks offer a fertile framework for studying the spread of infection in human and animal populations. However, owing to the inherent high-dimensionality of networks themselves, modelling transmission through networks is mathematically and computationally challenging. Even the simplest network epidemic models present unanswered questions. Attempts to improve the practical usefulness of network models by including realistic features of contact networks and of host-pathogen biology (e.g. waning immunity) have made some progress, but robust analytical results remain scarce. A more general theory is needed to understand the impact of network structure on the dynamics and control of infection. Here we identify a set of challenges that provide scope for active research in the field of network epidemic models.


Introduction
Networks (or graphs) are extremely flexible tools for representing complex systems of interacting components (Boccaletti et al., 2006; Durrett, 2007; Newman, 2010). Each component is represented by a node (or vertex) and each link (or edge) between nodes describes some sort of interaction between them. Here, we focus on the specific application of networks in the field of infectious disease modelling (Andersson, 1999; Danon et al., 2011).
Because of their flexibility, networks have been used to model infection spread in different forms. Nodes can describe single individuals, groups of individuals (e.g. households, farms, cities) or locations to which individuals are connected (e.g. see Riley et al., in this issue). Links can represent infectious attempts or transmission events (in which case the network is directed) or simply acquaintances between them (social or sexual relationships through which the infection can spread, usually in both directions), movements of animals between farms (direct or via intermediate markets), flight routes, etc.
This apparently simple and intuitive representation of a population of interacting components has the drawback that it can be difficult to work with. Even in the case of a simple undirected network with n nodes, we still need n(n − 1)/2 binary digits to fully describe the presence or absence of each possible edge. Thus, particularly for large networks, the general approach is to summarise most of the network information in a small set of statistics and then study their impact on infection spread. Among the myriad network properties (Boccaletti et al., 2006; Newman, 2010), in this paper we consider some of those that appear both epidemiologically relevant and amenable to analysis, such as: degree distribution, the distribution of the number of links from each node; assortativity, the propensity of epidemiologically similar nodes to be connected to each other, an important example of which is the degree correlation between neighbouring nodes; clustering, the propensity of two nodes with a common neighbour to be neighbours of each other (i.e. the fraction of triplets that form triangles); modularity, the partitioning of the network into internally well-connected groups; and betweenness centrality of a node, i.e. the number of shortest paths between all pairs of nodes that pass through that node.
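As an illustration of the summary statistics just listed, the following minimal Python sketch computes the degree distribution, the clustering (fraction of triplets that form triangles) and the degree correlation between neighbouring nodes for a small undirected network. The adjacency-dictionary representation, the function names and the toy network are assumptions made for this example only.

```python
from collections import Counter
from itertools import combinations


def degree_distribution(adj):
    """Return {degree: fraction of nodes with that degree}."""
    counts = Counter(len(neigh) for neigh in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}


def global_clustering(adj):
    """Fraction of connected triplets (paths of length two) that are closed."""
    triplets = 0
    closed = 0
    for neigh in adj.values():
        # Every pair of neighbours of a node forms a triplet centred on it.
        for a, b in combinations(sorted(neigh), 2):
            triplets += 1
            if b in adj[a]:
                closed += 1
    return closed / triplets if triplets else 0.0


def degree_assortativity(adj):
    """Pearson correlation of the degrees at the two ends of each edge."""
    # Each undirected edge is visited in both directions, which symmetrises the sums.
    pairs = [(len(adj[u]), len(adj[v])) for u in adj for v in adj[u]]
    m = len(pairs)
    mean = sum(x for x, _ in pairs) / m
    var = sum((x - mean) ** 2 for x, _ in pairs) / m
    if var == 0:
        return 0.0  # undefined for regular graphs; 0.0 is a convention used here
    cov = sum((x - mean) * (y - mean) for x, y in pairs) / m
    return cov / var


# Hypothetical toy network: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
n = len(toy)
bits_needed = n * (n - 1) // 2  # binary digits needed to describe all possible edges

print(degree_distribution(toy), global_clustering(toy), degree_assortativity(toy))
```

For large networks, dedicated graph libraries would normally be preferable; the sketch only makes the definitions concrete.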
Here, we have in mind nodes as individuals and links as acquaintances between them, and therefore primarily consider infection spread on undirected networks. Furthermore, we mostly have in mind permanently immunising infections (i.e. SIR epidemic models). Although most challenges apply also in the absence of permanent immunity (i.e. SIS and SIRS models), this analytically much harder case is the focus of Section 'Incorporating waning immunity in network epidemic models'. In Section 'Understanding the effect of heterogeneity on parameter estimation and epidemic outcome', we consider the so-called configuration model (Danon et al., 2011; Durrett, 2007, Chapter 3): besides the Erdős-Rényi random graph (Durrett, 2007, Chapter 2), this is the most analytically tractable network because of its locally tree-like structure, but it lacks many features of real-world networks that can dramatically impact transmission dynamics. We then discuss complex networks (i.e. not locally tree-like), first unweighted and static (Section 'Developing analytical methods to generate and study epidemics on static unweighted complex networks') and then weighted and dynamic (Section 'Developing analytical methods to model weighted and dynamic networks and epidemics thereon'). Approximate methods are discussed in Section 'Developing and validating approximation schemes for epidemics on networks'. Finally, in Sections 'Clarifying the impact of network properties on epidemic outcome', 'Strengthening the link between network modelling and epidemiologically relevant data' and 'Designing network-based interventions' we discuss the impact of network structure on infection spread, the relationship between network models and data, and interventions, respectively.

Understanding the effect of heterogeneity on parameter estimation and epidemic outcome
In homogeneously mixing populations, the relationships between key epidemiological quantities are generally well understood. For example, it is well known that for SIR epidemics in the large population limit (starting with a negligible fraction of the population infected), $R_0$ and the final size of a large outbreak, $z$ say, are strongly linked by the simple relationship $1 - z = e^{-R_0 z}$ (Diekmann et al., 2013).
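The final-size relationship above has no closed-form solution for z, but it is straightforward to solve numerically. The sketch below uses fixed-point iteration started from z = 1, which converges to the positive root when the basic reproduction number exceeds 1; the function name, tolerance and iteration cap are choices made for this illustration.

```python
import math


def final_size(r0, tol=1e-12, max_iter=10_000):
    """Solve 1 - z = exp(-r0 * z) for the final size z of a large outbreak."""
    if r0 <= 1.0:
        return 0.0  # below threshold the only root in [0, 1] is z = 0
    z = 1.0  # start above the positive root so the iteration decreases towards it
    for _ in range(max_iter):
        z_new = 1.0 - math.exp(-r0 * z)
        if abs(z_new - z) < tol:
            return z_new
        z = z_new
    return z


for r0 in (1.5, 2.0, 3.0):
    print(r0, round(final_size(r0), 4))
```

Convergence slows as the basic reproduction number approaches 1 from above, so a root-finding routine may be preferable near the threshold.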
However, even for an SIR epidemic on a configuration-type network, this simple relationship is lost: $R_0$ and final size of a large outbreak both depend on the degree distribution, but the former is affected by the degree variance, which is much more sensitive to changes in probabilities of high-degree than low-degree vertices, while the latter is highly dependent on the exact probabilities of low-degree vertices, but hardly depends on high-degree ones. Similar considerations apply when individuals vary in susceptibility and/or infectivity, with the additional problem that attainable data are unlikely to provide much information of this type.
It therefore remains an important problem to understand how not only $R_0$, the probability of a large outbreak and its final size, but also the duration of the epidemic and peak incidence relate to each other, and how these dependencies are affected by potentially unobserved heterogeneity in susceptibility/infectivity and degree.
Furthermore, during an outbreak, early predictions for public health purposes are typically needed. Therefore, it is important to quantify how such heterogeneities affect early parameter estimates (e.g. of $R_0$) and the repercussions of potential estimation biases on epidemic predictions.
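The decoupling of the basic reproduction number and the final size can be made concrete with a small numerical sketch. It rests on simplifying assumptions not stated in the text: each edge transmits independently with a constant probability T (as for a fixed infectious period), the basic reproduction number is taken as T(E[D²] − E[D])/E[D], and the final size is obtained from the standard probability-generating-function (percolation) argument for locally tree-like networks. The two degree distributions are hypothetical and share the same mean degree.

```python
def moments(pk):
    """First and second moments of a degree distribution {degree: probability}."""
    mean = sum(k * p for k, p in pk.items())
    second = sum(k * k * p for k, p in pk.items())
    return mean, second


def r0_config(pk, T):
    """Basic reproduction number under the constant-transmissibility assumption."""
    mean, second = moments(pk)
    return T * (second - mean) / mean


def final_size_config(pk, T, iters=2000):
    """Final size of a large outbreak via the generating-function fixed point."""
    mean, _ = moments(pk)
    u = 0.0  # probability that a given neighbour fails to infect a node
    for _ in range(iters):
        # Generating function of the excess degree, evaluated at u.
        g1 = sum(k * p * u ** (k - 1) for k, p in pk.items() if k > 0) / mean
        u = 1.0 - T + T * g1
    # A node of degree k escapes infection with probability u**k.
    return 1.0 - sum(p * u ** k for k, p in pk.items())


T = 0.5
regular = {4: 1.0}             # every node has degree 4
skewed = {2: 0.9, 22: 0.1}     # same mean degree (4), much larger variance

for name, pk in (("regular", regular), ("skewed", skewed)):
    print(name, r0_config(pk, T), round(final_size_config(pk, T), 3))
```

Under these assumptions the skewed distribution gives a much larger reproduction number but a smaller final size than the regular one, because the final size is driven mainly by the many low-degree nodes.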

Developing analytical methods to generate and study epidemics on static unweighted complex networks
Although convenient for its analytical tractability, the configuration model fails to capture some important properties of realistic contact networks. The POLYMOD study (Mossong et al., 2008) revealed strong assortativity by age (people have more contacts with others of a similar age than with those of other ages) with the additional trans-generational contact between children and adults, while Read et al. (2008) highlighted significant clustering in an empirically measured social network. Metapopulation and multitype epidemic models (see Ball et al., in this issue) are epidemiologically important examples of modular networks. Spatial (see Riley et al., in this issue) and highly heterogeneous networks of size n, unlike the configuration model, exhibit path lengths of order other than log(n). Finally, higher-order correlations such as four-motif structure or correlations at the triple level are likely to occur in any network generated by complex social processes (Miller, 2009).
A number of models for constructing random networks have been developed to incorporate realistic graph properties. Generally, as the random graph model under consideration becomes more complex, rigorous results about the properties of the resulting network, and of epidemics running on it, become less general. For example, the preferential attachment network model allows for rigorous analysis of most network properties and also asymptotic epidemic threshold behaviour (Durrett, 2007, Chapter 4). For random geometric graphs, network properties are known but analysis of epidemic dynamics has so far required Monte Carlo simulation (Isham et al., 2011). For exponential random graphs (Danon et al., 2011) and related models that seek to generate networks with specified properties in the most random way possible, there are essentially no exact results.
Rigorous analysis is, however, possible for SIR epidemics defined on some random network models with clustering. These include models incorporating small cliques of individuals, e.g. random intersection graphs, triangle- or household-based models (see Ball et al., 2013, and references therein). However, analytical tractability stems from the fact that all such models have a tree-like structure at some level (e.g. a tree of fully connected cliques).
Although these models enable analysis of the effect of clustering and sometimes also degree correlation on epidemic properties, it must be recognised that the networks they produce are rather special and not easily generalisable. Also, epidemics on distinct network models having common degree distribution, clustering coefficient and degree correlation may have different properties (Ball et al., 2013). Therefore, major challenges involve identifying which, if any, of the current models reflects reality well enough for the question at hand and developing other network models that are both sufficiently realistic and amenable to rigorous mathematical analysis.

Developing analytical methods to model weighted and dynamic networks and epidemics thereon
Links within real-world social networks are not all identical: some interactions carry a greater risk of disease transmission than others. To account for this additional heterogeneity, we can consider weighted networks, in which a link's weight (which may vary over time) can be thought of as its relative transmission potential. Some models have attempted to include information about link weights (Kamp et al., 2013), but their inherent high-dimensionality is a significant challenge if the intention is to avoid detailed micro-simulations. Furthermore, it is not always clear how the transmission potential relates to observable quantities, as available data in social networks are limited, and are always restricted to information that is easily measured or estimated (see Eames et al., in this issue): for example, contact diary studies often ask about whether an encounter included physical (skin-to-skin) contact, how long it lasted, and how often a specific individual is encountered (Mossong et al., 2008); networks measured using electronic proximity sensors offer more precise estimates of the duration of an encounter (Stehlé et al., 2011), but only of an encounter in which unobstructed sensors were within a given functioning distance.
On the other hand, social contacts are neither continuous nor permanent. Various forms of network dynamics are known to be relevant to infectious disease epidemiology (Bansal et al., 2010): extrinsic processes (e.g. births, deaths, school terms, changes in social relationships, migration, host mobility, seasonal or long-term socially or economically-driven changes); individuals' spontaneous changes (avoidance behaviour) or public health interventions (vaccination, school closure); and the spread of the infection itself (recovered individuals become irrelevant in future chains of transmission, infected individuals may alter their behaviour).
These changes can alter local network topology (in the form of added/removed nodes and edges, or as altered edge weights) and even affect global network structure and properties. In response to each of the processes highlighted above, respectively:
a. Models have successfully included varying contact durations (Kretzschmar and Morris, 1996), formation and dissolution of contacts (Eames and Keeling, 2002), and contact exchange (Volz and Meyers, 2007). However, the inclusion of demographic processes in a tractable and realistic manner remains elusive (with a few recent exceptions; see e.g. Kamp, 2010).
b. Models have included infection avoidance using network models with adaptive contact exchange (e.g. susceptibles replacing infected neighbours with other randomly chosen susceptible ones; Gross et al., 2006) or with serosorting models for HIV where individuals choose sexual partners matching their infection status (Volz et al., 2010). These models show that such behaviour has a significant impact on epidemiological outcomes; however, it is unclear whether data support such modelling assumptions as realistic behavioural responses to ongoing epidemics (Funk et al., in this issue). Public health interventions are discussed more broadly in Section 'Designing network-based interventions'.
c. Finally, for respiratory diseases such as influenza, illness has been found to reduce contact and generate a shift in age-specific mixing (van Kerckhove et al., 2013). However, a more complete understanding of the impact of disease on contact structure is necessary for a broad class of pathogens.
These recent developments are promising, but we still lack a mathematical framework that tractably handles a broad range of realistic dynamic networks.

Incorporating waning immunity in network epidemic models
Most of the theory of epidemics on static random networks concerns the SIR model because the assumption of permanent immunity significantly increases analytical tractability. Many quantities do not depend on when events happen but only on whether they happen or not: therefore, the real-time dynamics can often be ignored and properties such as $R_0$, the probability of a large outbreak and its final size can be computed using theory from branching processes (Jagers, 1975) or percolation theory (Grimmett, 1999). When immunity is lacking or waning at the same time scale as the infection dynamics (e.g. SIS, SIRS models), rigorous analysis becomes much harder: the time at which events occur cannot be ignored, and dependencies appear not only between the states of neighbours but also between those of distant individuals.
Models without permanent immunity are seldom studied in a rigorous way, with the notable exception of the Markov SIS epidemic (i.e. with constant infection and recovery rates), extensively considered in the physics literature as the contact process (Liggett, 1999). However, even in the simple case of the Markov SIRS epidemic there are no rigorous results about the survival probability on an infinite graph and whether it increases as the infection rate increases (e.g. high rates might not give enough time for recovered individuals to regain susceptibility before infection goes extinct locally; van den Berg et al., 1998). Furthermore, it is not known whether an epidemic that survives for a long time reaches endemicity in all parts of the network or whether different parts of the network experience recurrent waves of infection. This problem is closely related to weak and strong survival in the contact process (Liggett, 1999).

Developing and validating approximation schemes for epidemics on networks
Approximate results are available through a great many methods. These are used to describe the limiting dynamics of stochastic epidemics on networks in terms of sets of differential equations (e.g. pair approximations, triple-based models, effective-degree approaches). For some locally tree-like networks a differential equation model is asymptotically exact, but for clustered networks the situation is much more complex. Typically, the heuristic arguments used to motivate approximations rely on an implicit assumption such as that the network in question is selected uniformly at random from the set of all graphs having specified properties. For example, for clustered networks, approximations usually assume that all encountered triplets form closed triangles independently with constant probability (Danon et al., 2011) and hence are not designed for networks where, say, triangles all cluster in cliques (e.g. households, see Section 'Developing analytical methods to generate and study epidemics on static unweighted complex networks'). As yet, however, there is no complete theoretical understanding of when a given approximation will work, and a major challenge is to put such approaches on a rigorous mathematical footing, for example by finding an asymptotic regime under which the approximation becomes exact as the population size tends to infinity.
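To make the idea of a pair approximation concrete, the following sketch integrates a pairwise SIR model for a network in which every node has exactly k neighbours. It assumes Markovian transmission (rate tau per S-I link) and recovery (rate gamma), counts each pair in both directions, and closes the system by approximating triples as [ABC] ≈ ((k − 1)/k)[AB][BC]/[B], i.e. a closure that ignores clustering. The function name, parameter values and the forward Euler integrator are choices made for this illustration, not a prescription.

```python
def pairwise_sir(n, k, tau, gamma, eps=1e-3, dt=0.01, t_end=60.0):
    """Forward Euler integration of a pairwise SIR model on a k-regular network.

    Returns the numbers of susceptible, infected and recovered nodes at t_end.
    """
    s = n * (1 - eps)               # [S]: number of susceptible nodes
    i = n * eps                     # [I]: number of infected nodes
    ss = k * n * (1 - eps) ** 2     # [SS]: S-S pairs, counted in both directions
    si = k * n * eps * (1 - eps)    # [SI]: S-I pairs
    c = (k - 1) / k                 # closure factor for an unclustered network
    for _ in range(int(t_end / dt)):
        # Triple closure: [ABC] is approximated by c * [AB] * [BC] / [B].
        ssi = c * ss * si / s
        isi = c * si * si / s
        ds = -tau * si
        di = tau * si - gamma * i
        dss = -2.0 * tau * ssi
        dsi = tau * (ssi - isi - si) - gamma * si
        s += dt * ds
        i += dt * di
        ss += dt * dss
        si += dt * dsi
    r = n - s - i                   # recovered nodes, by conservation
    return s, i, r


s_end, i_end, r_end = pairwise_sir(n=10_000, k=4, tau=1.0, gamma=1.0)
print(round(r_end / 10_000, 3))     # approximate final size as a fraction of n
```

Comparing such output with stochastic simulations on networks with and without clustering is one practical way to probe where a given closure breaks down.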

Clarifying the impact of network properties on epidemic outcome
A commonly stated challenge for complex network models is to understand how network characteristics affect epidemiological quantities of interest. The problems are similar to those highlighted for simple networks in Section 'Understanding the effect of heterogeneity on parameter estimation and epidemic outcome', with additional complications due to the shortage of analytical results. Even simple questions like the dependence of $R_0$ and the probability and size of a large outbreak on clustering and degree correlation (Section 'Developing analytical methods to generate and study epidemics on static unweighted complex networks') need care, as structurally different networks can exhibit the same clustering and correlation (Ball et al., 2013), and answers will depend on other aspects of network topology.
For weighted networks, models could be used to consider the impact of: the distribution of link weights; the role of correlation of weights (does it matter whether weights are distributed randomly, or whether weights are correlated at the individual level or 'locally' within the network?); the relationship between weight and degree (do people with more contacts have contacts of lower weight?); the relevance of low-weight links (can such links be ignored, or do they drive the emergence of infection from dense local cliques?).
For dynamic networks, previous work has shown that concurrency (Kretzschmar and Morris, 1996) for sexually transmitted infections, and contact repetition (Smieszek et al., 2009) and exchange (Volz and Meyers, 2007) for respiratory diseases, can influence disease dynamics. A more complete understanding of the epidemiological significance of dynamic contact patterns across all classes of pathogens in influencing both disease spread and the efficacy of various intervention strategies (Section 'Designing network-based interventions') is needed. This will inevitably depend on pathogen-specific characteristics, disease timescales and the questions at hand.
Answers to these questions are vital for public health modelling by providing guidance on the levels of heterogeneity and detail that are required (e.g. can interactions that underlie disease transmission be adequately captured by a static network model or should complex dynamics be modelled explicitly?).

Strengthening the link between network modelling and epidemiologically relevant data
The challenges described above are rather theoretical in nature, but are strongly motivated by the need to capture those characteristics of social behaviour that are deemed to affect infection spread. As more data become available, modellers need to improve their analytical and computational toolkit. Agent-based simulations are undoubtedly useful, but often face significant algorithmic and computational problems with respect to network representation, measurement of topological features, and dynamical models for pathogen spread, as well as a lack of generality and unproven robustness to uncertainties in model structure and parameter values. Advances in data-driven analytic modelling would solve some of these challenges while enhancing understanding of the determinants of model behaviour. At the same time, modelling should play an important role in guiding future data collection, in particular by highlighting those data to which epidemic outcomes are most sensitive, thus closing a virtuous feedback loop between theoretical understanding and real-world observations.
The past decade has seen such a feedback loop more heavily tilted towards the analytical modelling side. Although further work in that direction is needed, particularly exciting is the emerging world of 'big data', in the form of genetic information (Cottam et al., 2008; Ypma et al., 2012), contact diaries (Mossong et al., 2008; van Kerckhove et al., 2013) and electronic sensors (Stehlé et al., 2011), and the increased power of modern statistical methods to deal with these data (Cauchemez et al., 2011). These raise the possibility of more direct observation of epidemic networks than has previously been possible. However, connecting observations to model structure and parameters is far from trivial. For example, little work has been done to relate quantifiable measures of link weight directly to the risk of transmission across the link. Studies are required to collect both social contact and epidemiological data to systematically assess a wide range of 'weight' measures and to determine the relevant mapping between weight and risk. Those studies that have been carried out have indicated that risk of transmission varies by (among other factors) type of sexual contact (Boily et al., 2009), and by social setting (Cauchemez et al., 2011; te Beest et al., 2013). It remains unclear how generally such results can be applied, and what role is played by the various properties of the individuals and their relationship: for example, is a link between two school friends of high weight because they are at school, or because they are of particular (and similar) ages, or because they share other social activities? The appropriate measures will differ for different pathogens (consider, for example, influenza and HIV), so studies should include pathogens with different modes of transmission.

Designing network-based interventions
Public health interventions can aim to reduce transmission along network edges without fundamentally altering the network topology (face masks, handwashing) or can have local and population-scale effects on the topology of the contact network (e.g. school closures, social distancing or vaccination, which reduce contacts by removing network edges).
Understanding network structure is vital, as network features can also be exploited to design optimal strategies. Two such strategies include targeting high-degree nodes to make the network sparser and targeting central nodes to fragment the population into hard-to-reach subgroups. While theoretically sound ideas such as these are in general difficult to implement in practice when lacking knowledge of the complete network, a few recent approaches have been proposed to make these strategies feasible: for the former, identifying high-degree nodes (e.g. acquaintance immunization) or identifying individual traits that serve as proxies for high connectivity (e.g. age and occupation in human populations: Bansal et al., 2006; social role in wildlife populations: Otterstatter and Thomson, 2007; or activity in livestock populations: Shirley and Rushton, 2005); for the latter, identifying social roles or occupations that correlate with high betweenness (e.g. sex workers: Mishra et al., 2012) or employing local algorithms that identify highly central individuals without requiring knowledge of the entire network (e.g. the community bridge finder algorithm). Further such work is required for efficient and feasible network-based intervention strategies in the absence of complete network data, and for a better understanding of the relationship between partial network data and intervention efficacy. Contact tracing (i.e. real-time tracking of infected individuals and their exposed contacts) is a typical network-based intervention (and is the standard of care in some locations, e.g. syphilis in the United States). By automatically identifying high-risk individuals, it can be highly effective as a preventative or control strategy, and is particularly useful for asymptomatic infections.
Previous work indicates that contact tracing effectiveness increases with clustering (Eames and Keeling, 2003), but questions remain about tracing of 'high-risk' individuals, the optimal timing of contact tracing, the interactions between timescales of tracing and the natural history of infection, as well as interactions with other interventions.
An additional challenge lies in the modelling of behavioural responses to interventions as they pertain to changes in network structure. Examples include changing of age-specific mixing patterns during school closures to control respiratory disease outbreaks (Cauchemez et al., 2008) or rewiring of links during a movement standstill implemented to control livestock disease outbreaks (Robinson et al., 2007). This challenge is discussed further in Funk et al. (in this issue).
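The degree-targeting idea behind acquaintance immunization, mentioned earlier in this section, can be sketched in a few lines: rather than vaccinating randomly chosen individuals, one vaccinates a randomly chosen neighbour of each, which preferentially reaches high-degree nodes without requiring knowledge of the whole network. The adjacency-dictionary representation, the function names and the star-shaped toy network below are assumptions made for this illustration.

```python
import random


def acquaintance_sample(adj, n_picks, rng):
    """Pick random nodes and return one random neighbour of each as a target."""
    nodes = [u for u in adj if adj[u]]  # nodes with at least one neighbour
    chosen = set()
    for _ in range(n_picks):
        u = rng.choice(nodes)
        chosen.add(rng.choice(sorted(adj[u])))
    return chosen


def mean_degree(adj):
    """Expected degree of a uniformly chosen node."""
    return sum(len(neigh) for neigh in adj.values()) / len(adj)


def expected_neighbour_degree(adj):
    """Expected degree of a random neighbour of a uniformly chosen non-isolated node."""
    nodes = [u for u in adj if adj[u]]
    total = 0.0
    for u in nodes:
        total += sum(len(adj[v]) for v in adj[u]) / len(adj[u])
    return total / len(nodes)


# Hypothetical toy network: one hub (node 0) connected to ten leaves.
star = {0: set(range(1, 11))}
for leaf in range(1, 11):
    star[leaf] = {0}

rng = random.Random(1)  # fixed seed so the sketch is reproducible
targets = acquaintance_sample(star, 50, rng)
print(mean_degree(star), expected_neighbour_degree(star), 0 in targets)
```

In this extreme example a random node has mean degree 20/11, whereas a random neighbour has expected degree 101/11, which is why neighbour-based targeting reaches the hub almost immediately.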

Conclusions
Modelling transmission within networks is a broad and challenging field. As we have outlined above, it offers a range of problems including fundamental theoretical work, understanding and capturing observed network data, and guiding network-based public-health interventions. While the list of potential challenges is practically endless, here we have attempted to identify a set of problems, covering a range of facets, that merit study. Within this issue can be found reference to related challenges including networks in phylodynamics (Frost et al., in this issue), measurement of network data (Eames et al., in this issue), and the place of network models in relation to other modelling structures (Riley et al., in this issue; Ball et al., in this issue). While we certainly do not claim to have identified all (nor necessarily the most urgent) questions in network modelling, we hope that this paper will play a role in spurring advances in this important and fascinating field.