Nine challenges in modelling the emergence of novel pathogens

Highlights
• We summarize key challenges in modeling the emergence of novel infectious agents.
• We focus on connections to data, including epidemiologic and genetic data.
• Zoonoses are emphasized, because they are the source of most new human pathogens.
• Challenges span reservoir dynamics, cross-species spillover, and outbreak dynamics.
• Estimation of fatality rates and overall risk assessment are also addressed.


Introduction
While humankind continues to battle ancient adversaries such as tuberculosis and malaria, there is constant concern about the emergence of new human pathogens from sources in non-human animals (Jones et al., 2008). At the very least, this concern is justified by devastating pandemic emergences of HIV-1, HIV-2, and Spanish influenza. We have also seen the near-establishment of SARS-Coronavirus, and a relentless series of zoonotic threats competing for our attention and public health resources. At the time of writing, influenza A H7N9 in China (Centers for Disease Control and Prevention, 2013) and MERS-Coronavirus in the Arabian Peninsula (Penttinen et al., 2013) are both causing substantial numbers of cases and deaths, and health authorities are searching for effective responses.
This article focuses on challenges in modelling the emergence of pathogens that newly appear in human hosts, such as MERS-CoV or zoonotic influenza strains. We consider problems at the interface of models and data that pertain to interpreting patterns in observed outbreaks, and contributing to rational and robust assessment of risks posed by putative emerging pathogens. We assume that candidate zoonotic pathogens are circulating in some non-human reservoir population or populations, from which they can spill over to infect humans. Humans infected directly by animals are known as spillover or primary cases. If human-to-human transmission occurs, then subsequent cases infected by humans are termed non-primary.
In assessing pathogen emergence, it is useful to delineate what is known about a pathogen's ability to spread between humans. A crucial distinction exists between pathogens that are capable of sustained human-to-human transmission in some settings (i.e. R0 > 1 in humans), and those that exhibit inefficient spread, with subcritical dynamics (i.e. 0 < R0 < 1). This latter group includes many pathogens viewed as significant future threats, such as influenza A H5N1, influenza A H7N9, MERS-CoV and monkeypox virus. Another group includes microbes detected by 'pathogen discovery' in various non-human animal populations (Lipkin and Firth, 2013), including many that were previously unknown to science (e.g. Anthony et al., 2013), the relevance of which is often unknown.

Better capture the disease dynamics in proximal non-human species
One can imagine two extreme conceptual models for the dynamics of emergence from non-human hosts into humans. In 'static reservoir emergence', the dynamics of the pathogen in the reservoir do not change from their long-term pattern. Because of chance or some change in human behaviour, the pathogen spills over from this static reservoir system to cause human infection. In 'dynamic reservoir emergence', the ecology of the pathogen in its non-human hosts changes substantially prior to emergence in humans; changes could include transmission into domestic animals, or gains in transmissibility due to evolutionary changes in the pathogen. However, while the conceptual differentiation between static and dynamic reservoir emergence is attractive, key case studies point much more towards dynamic emergence. For example, Nipah virus caused outbreaks in pigs prior to infecting humans (Parashar et al., 2000) and outbreaks of Sin Nombre virus infection (including the first identified outbreak) have been linked to elevated rodent population densities following periods of increased rainfall (Hjelle and Glass, 2000).
Current assessments of emergence risks from novel pathogens focus heavily on the frequency of particular pathogen genotypes (Russell et al., 2012) or predicted (static) distributions of reservoir species (Fuller et al., 2013), and do not include dynamic factors in reservoir ecology. Therefore, an important, broad challenge is to use models in conjunction with available data to help detect and characterize potentially dangerous changes in the ecology of infectious diseases in key wildlife or livestock reservoirs.

Expand models for cross-species spillover transmission from general principles to specific, mechanistic frameworks integrating all relevant data types
Characterization of the spillover force of infection is crucial to emergence dynamics. Very general frameworks have been advanced, for instance decomposing the spillover force of infection (FOI) into three terms (Lloyd-Smith et al., 2009):
Spillover FOI = prevalence in reservoir × reservoir-human contact rate × P(infection | contact).
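The three-term decomposition above can be illustrated with a minimal sketch. The function names, the input values, and the constant-hazard conversion from a rate to a probability are illustrative assumptions, not part of the cited framework.

```python
import math


def spillover_foi(prevalence, contact_rate, p_infection_given_contact):
    """Spillover force of infection as the product of the three terms.

    prevalence: fraction of reservoir hosts that are infectious (0 to 1)
    contact_rate: reservoir-human contacts per person per unit time
    p_infection_given_contact: probability that one such contact infects
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    if not 0.0 <= p_infection_given_contact <= 1.0:
        raise ValueError("p_infection_given_contact must lie in [0, 1]")
    if contact_rate < 0.0:
        raise ValueError("contact_rate must be non-negative")
    return prevalence * contact_rate * p_infection_given_contact


def prob_infected_within(foi, duration):
    """Probability of at least one spillover infection over `duration`.

    ASSUMPTION: the force of infection is constant over the interval,
    so the waiting time to infection is exponential.
    """
    return 1.0 - math.exp(-foi * duration)


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    foi = spillover_foi(prevalence=0.05, contact_rate=2.0,
                        p_infection_given_contact=0.01)
    print("FOI per person-year:", foi)
    print("P(infected within 10 years):", prob_infected_within(foi, 10.0))
```

Treating the three terms as independent constants is exactly the simplification that the more mechanistic approaches discussed in this section aim to relax.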
We need a new generation of approaches that take advantage of broader developments in infectious disease dynamics and epidemiology. For instance, ecologic, economic or environmental factors giving rise to interactions among the three terms should be considered, and their dynamical consequences explored. Constructing more mechanistic models of spillover transmission will raise specific challenges, but may also present new solutions. For instance, when human infection occurs via environmental reservoirs, or through food, it may be possible to integrate many complexities of reservoir ecology into their impact on environmental burden, then use dose-response relationships to understand risk to humans. Otherwise, there can be many challenges associated with finding a relevant characterization of prevalence in the reservoir, particularly when the system involves multiple host species and multiple pathogen strains, each possibly posing different risks to humans. Transmission dynamic models that incorporate data from sequence-based or niche modelling approaches may help to predict spillover risk more generally.
Epidemiology has well-developed frameworks for risk factor analysis, which can be applied to spillover because primary cases can be viewed as independent outcomes (at least approximately). Thus there are opportunities to integrate a biostatistical approach to primary cases with a stochastic model of subsequent transmission, creating a joint inference framework. For example, primary infection with Nipah virus in Bangladesh is associated with drinking date-palm sap, while on-going transmission is associated with close contacts among humans (Gurley et al., 2007). A joint framework that links these co-factors using a mechanistic model may aid in distinguishing between primary and non-primary cases (see Challenge 4). Studies of age-based mixing patterns have shed light on transmission dynamics of endemic pathogens (Mossong et al., 2008); there could be similar benefits to linking spillover risk factor information to data on mixing patterns in relevant human populations. Analogously, the coupling between spatial distribution of spillover risk and spatial factors influencing human-to-human transmission may govern the risk of a major outbreak (for instance, risk will be lower if spillover occurs chiefly in remote settlements than if it happens in crowded urban areas).

Harness pathogen genetic data across the human-animal interface to map transmission and detect adaptation
Pathogen sequence data could shed light on central questions in zoonotic emergence, by reconstructing transmission connections or looking for adaptation in a new host. However, numerous challenges persist. Because of historic interdisciplinary divisions, isolates from animals and humans have often been grown, detected or analyzed using different approaches, which effectively precludes useful inference. Pathogen isolates are often rare, particularly in difficult-to-culture genera, so isolates from linked cases are unusual. Animal sources of human spillover cases are often gone (dead, eaten, or moved away) by the time the human cases are detected and investigated, so that sequences come from other animal individuals that may not be closely linked. Any inferences about pathogen evolution must include uncertainty arising from (typically poorly known) transmission and evolutionary processes in animal hosts. This situation is particularly challenging if multiple species are involved in circulation of the pathogen, as for avian influenza. Many current examples are based on coarse sampling and use appropriately coarse analyses, such as phylogenies, but higher-resolution methods (preferably not sensitive to missing samples) will be needed as isolate detection improves. These methods will need to explicitly link transmission mechanisms to sequence evolution.
Challenges also arise when trying to assess whether evolutionary adaptation played a role in a past emergence event, and when any adaptive mutations occurred (Pepin et al., 2010). Often, there is no baseline surveillance prior to emergence, so ancestral genotypes cannot be assessed, or available samples are separated by substantial gaps. There is typically poor information about pathogen diversity in animal hosts, let alone in individual animals.

Improve methods to analyze stochastic dynamics after pathogen introduction, accounting for heterogeneities and imperfect observation
Substantial progress has been made on modelling the stochastic dynamics of early generations of transmission after a novel pathogen is introduced to a population, yet major challenges remain.
Particular challenges arise from heterogeneities (typically uncharacterized) in host contact patterns, host susceptibility and infectiousness, environmental factors, and possibly pathogen phenotypes. Which of these matter for a given outbreak, and how can this be determined? Further challenges arise from host population structure at scales from households to cities, and the resulting possibility that local pools of susceptibles will be depleted. These effects are often neglected for emerging pathogens, with the rationale that many hosts are available, but this assumption fails easily (Cross et al., 2007). Non-stationary dynamics are also challenging: any changes through time arising from control measures or behaviour change will be entangled with non-stationarities driven by contact network effects (whereby the most connected individuals are infected early) or superspreading effects (whereby transmission rates revert to the mean after outbreaks are kicked off by superspreaders).
These factors combine to shape the dynamics of early transmission chains, and the resulting data form the basis for inference about outbreaks of emerging infections. Outbreaks are often small, so data are commonly pooled from multiple introduction events at different times and locations, introducing further heterogeneities. Such 'mixed distributions' have been used to estimate reproductive numbers for post-elimination measles, for instance, though it is recognized that the largest outbreaks occur in populations with unusually low vaccination rates, where different parameters would apply (King et al., 2004). Furthermore, imperfect observation causes cases and outbreaks to be missing from data sets, likely in non-random ways. We need models that can account for these problems, and ideally correct for them, to enable robust inference of parameters of interest.
A particular set of challenges arises for subcritical pathogens, whose epidemiology is characterized by a mix of spillover events and self-limited chains of human-to-human transmission (so-called 'stuttering chains'). An essential and pervasive challenge is to disentangle contributions from these two sources. There has been recent progress in methods to estimate R0 based on the distribution of chain lengths (Blumberg and Lloyd-Smith, 2013a) or the ratio of primary to non-primary cases (assuming these can be distinguished) (Cauchemez et al., 2013). These approaches have complementary strengths and weaknesses, so hybrid or alternative approaches would be valuable. Also, little attention has been paid to the joint characterization of temporal variation in spillover hazard and human-to-human transmissibility. In some scenarios it may be possible to extract the contribution of subcritical human-to-human transmission, leaving a robust description of the spillover hazard itself, or conversely to extract the contribution of spillover and see the human-to-human transmission more clearly (Kucharski et al., 2014).
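To make the two kinds of estimate concrete, the following is a minimal sketch of simple moment-based estimators in the spirit of the approaches above, not a reproduction of the cited likelihood methods. It assumes a subcritical branching process in which every chain starts from exactly one primary case, all chains and cases are observed, and primary cases can be identified. Under those assumptions the expected chain size is 1/(1 - R0), and the expected fraction of non-primary cases equals R0. The function names and example data are hypothetical.

```python
def r0_from_chain_sizes(chain_sizes):
    """Estimate R0 from observed sizes of self-limited transmission chains.

    ASSUMPTION: each chain starts from one primary case and is fully
    observed, so the mean chain size estimates 1 / (1 - R0).
    """
    if not chain_sizes:
        raise ValueError("need at least one chain")
    if any(size < 1 for size in chain_sizes):
        raise ValueError("every chain contains at least its primary case")
    mean_size = sum(chain_sizes) / len(chain_sizes)
    return 1.0 - 1.0 / mean_size


def r0_from_case_counts(n_primary, n_nonprimary):
    """Estimate R0 as the fraction of all cases that are non-primary.

    ASSUMPTION: primary and non-primary cases are classified correctly.
    """
    total = n_primary + n_nonprimary
    if n_primary < 1 or n_nonprimary < 0:
        raise ValueError("need at least one primary case")
    return n_nonprimary / total


if __name__ == "__main__":
    # Hypothetical data: ten chains, most of them singletons.
    sizes = [1, 1, 1, 1, 2, 1, 3, 1, 1, 4]
    print(r0_from_chain_sizes(sizes))
    # The same data expressed as case counts: 10 primary, 6 non-primary.
    print(r0_from_case_counts(10, sum(sizes) - len(sizes)))
```

With complete data the two estimators agree (both give approximately 0.375 here). They diverge once chains are missed, cases are misclassified, or several spillover events are merged into one apparent chain, which is where the complementary strengths and weaknesses of the two approaches become relevant.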

Improve data collection and analysis to learn from the frequency of singleton (sporadic) cases
For many zoonotic infections, particularly those with low transmissibility among humans (or high variation in transmissibility), many primary cases fail to transmit and thus appear as singleton cases. Most attention is focused on larger disease outbreaks, when transmission in the human population raises concern. Sporadic singleton cases are viewed as low priorities for surveillance. Singletons are fundamentally more difficult to detect and data on singletons are even sometimes dropped from analyses (discussed in Blumberg and Lloyd-Smith, 2013b).
Data on singletons are needed to achieve correct estimates of total spillover rates, as well as infection fatality rates (see Challenge 8), and also to study risk factors for primary infection. They comprise an important component of the distribution of outbreak sizes, which can be used to estimate R0 (Farrington et al., 2003). The frequency of singleton cases has a surprisingly strong influence when R0 and other parameters are estimated from chain size distributions (Blumberg and Lloyd-Smith, 2013b). If R0 can be estimated through other means, then measuring the frequency of singleton cases allows heterogeneity in transmission to be estimated (Lloyd-Smith et al., 2005). An important challenge for the field is to improve the reliability of singleton case data, and hence to incorporate these data into epidemiological analyses, or else to develop robust methods for parameter estimation that account for missing or biased data on singletons.
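The link between singleton frequency and transmission heterogeneity can be sketched as follows. The sketch assumes that the number of secondary cases per case follows a negative binomial distribution with mean R0 and dispersion parameter k (smaller k meaning greater heterogeneity), so that a primary case fails to transmit with probability (1 + R0/k)^(-k). Given R0 from another source, the observed singleton fraction can then be inverted numerically for k. The function names, the bisection bounds, and the example values are illustrative assumptions.

```python
import math


def singleton_probability(r0, k):
    """P(a primary case infects nobody) under a negative binomial
    offspring distribution with mean r0 and dispersion k (ASSUMED model)."""
    return (1.0 + r0 / k) ** (-k)


def dispersion_from_singletons(r0, singleton_fraction, lo=1e-6, hi=1e6):
    """Solve singleton_probability(r0, k) = singleton_fraction for k.

    The singleton probability decreases monotonically in k, from 1
    (k -> 0) towards exp(-r0) (k -> infinity, the Poisson limit), so a
    bisection on log(k) suffices. Returns math.inf when the observed
    fraction is at or below the Poisson limit.
    """
    if r0 <= 0.0:
        raise ValueError("r0 must be positive")
    if not 0.0 < singleton_fraction < 1.0:
        raise ValueError("singleton_fraction must lie strictly in (0, 1)")
    if singleton_fraction <= math.exp(-r0):
        return math.inf
    log_lo, log_hi = math.log(lo), math.log(hi)
    for _ in range(200):
        log_mid = 0.5 * (log_lo + log_hi)
        if singleton_probability(r0, math.exp(log_mid)) > singleton_fraction:
            log_lo = log_mid  # too many singletons predicted: k is too small
        else:
            log_hi = log_mid
    return math.exp(0.5 * (log_lo + log_hi))


if __name__ == "__main__":
    # Hypothetical inputs: R0 = 0.6 known, 67% of chains are singletons.
    print(dispersion_from_singletons(0.6, 0.67))
```

Because the estimate of k depends directly on the singleton fraction, under-reporting of singletons biases it, which is one reason the reliability of singleton data matters.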

Develop theory and case studies for the role of intermediate hosts in pathogen emergence
Many recent zoonotic outbreaks feature an intermediate host species, acting as a transmission bridge between the 'true reservoir' where the pathogen is maintained, and the human 'target' population. Prominent examples include Nipah virus passing from flying foxes to pigs to humans in Malaysia, Hendra virus passing from flying foxes to horses to humans, and SARS-CoV passing from fruit bats to palm civets to humans. These intermediate hosts might contribute strictly via contact, by bridging between two host populations with no direct contact, or they might have a more biological role, e.g. as an 'amplifying host' that can generate high pathogen titres, or by facilitating pathogen evolution that increases transmission in the human host.
There is no general framework to define, compare, or contrast these various roles for intermediate hosts in emergence events, and modelling has been ad hoc and system-specific. We need to identify general principles and defining characteristics of the different scenarios and then to apply them to case studies. Such a model framework could also guide decisions about allocation of surveillance or control effort.

Expand models for emerging infections to account for host immunity
Most models of emerging infections assume a completely susceptible host population, but this is not valid if parts of the population have been exposed to low doses of the pathogen or to less virulent ancestors or related pathogens. Those in frequent contact with animals may have been exposed to zoonotic pathogens such as SARS-CoV or influenza, and more elderly sub-populations might have historic exposures.
These partially immune groups can cause profound dynamic effects. Having a population fraction immune, or partially immune, can facilitate disease persistence by reducing the chance of extinction in the post-epidemic trough (Pulliam et al., 2012). If those most at risk of exposure to a zoonotic pathogen are those with the highest levels of immunity (because of multiple previous exposures), they could form an effective barrier preventing an infection from spreading to the rest of the population; such a pattern is seen for influenza antibodies in numerous studies of swine industry workers (Myers et al., 2006). Data from this scenario might also lead to underestimation of the reproduction number of the pathogen if it spreads into a population that is truly naive. Models could help distinguish between pathogens that fail to spread because their transmissibility is low in all humans, versus those that fail because of low transmissibility in the human population in contact with the reservoir. We need to understand when these effects matter, and how to identify them.

Devise approaches to measuring infection fatality rates
When a new infection emerges there is great concern to know its 'case fatality rate'. This is formally defined as the number of deaths from infection divided by the number of cases, but this clear-sounding definition hides a deep source of confusion in the definition of a 'case'. Often only severely ill cases are counted, leading to overestimation of case fatality rates. A basic tenet of infectious disease biology is that infection does not always lead to disease, and case fatality rates calculated with 'cases of disease' as the denominator can differ greatly from those calculated using 'cases of infection'. Such differences have led to widely varying estimates of case fatality rates for new emerging infections, with some corresponding alarmism, e.g. for pandemic H1N1 swine flu when it first emerged in 2009 (Garske et al., 2009).
Valuable progress has been made by focusing on case fatality rates in hospital-admitted cases or symptomatic cases only (Yu et al., 2013). However, to make sensible projections of the total expected mortality from novel infectious diseases, we need to move beyond hospital case fatality rates towards infection fatality rates that explicitly use the number of infections as the denominator. Estimating this quantity is challenging if many infections are asymptomatic or cause only mild illness. One possible solution is to use serology to estimate numbers infected; this was deployed in Hong Kong during the first wave of the 2009 H1N1 epidemic (Riley et al., 2011), though challenges could arise from unknown background seroprevalence and delayed availability of serological data. An alternative is to estimate total numbers of infections from analyses of pathogen sequence data, which is possible in principle during the exponential phase of an outbreak (Frost and Volz, 2010). Since pathogen sequences are often deposited in the public domain in real time during pandemics, this might provide very early estimates of the numbers of infections, including those that are asymptomatic. Mathematical modelling could clarify what can be expected from such an approach. Given the numerous assumptions involved, would estimates of the numbers infected ever be accurate enough to make this approach useful? How many sequences are needed? Could we correct for sampling bias and heterogeneous transmission (see the challenges in phylodynamic inference, in this volume, Frost et al., 2015)? With the widespread availability of viral sequences from the 2009 H1N1 pandemic and a 'gold standard' estimate of infection rates from published serology, that event could be a testing ground for new methods for calculating infection rates and thus infection fatality rates.
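The effect of the denominator can be shown with a short worked example. This is a minimal sketch: all numbers are hypothetical, the function names are illustrative, and the serological estimate assumes that the rise in seroprevalence over the outbreak equals the fraction of the population infected (ignoring assay sensitivity, specificity, antibody waning, and sampling error).

```python
def fatality_rate(deaths, denominator):
    """Deaths divided by a chosen denominator (cases or infections)."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    if deaths < 0:
        raise ValueError("deaths must be non-negative")
    return deaths / denominator


def infections_from_serology(population, seroprev_after, seroprev_before=0.0):
    """Estimate the number infected from the rise in seroprevalence.

    ASSUMPTION: every infection seroconverts and the serosurvey is
    representative; background seroprevalence is subtracted.
    """
    attack_rate = seroprev_after - seroprev_before
    if attack_rate < 0.0:
        raise ValueError("seroprevalence cannot fall in this simple model")
    return population * attack_rate


if __name__ == "__main__":
    # Hypothetical outbreak in a population of one million.
    deaths = 20
    confirmed_cases = 2000
    infections = infections_from_serology(1_000_000, 0.15, 0.05)
    print("case fatality rate:", fatality_rate(deaths, confirmed_cases))
    print("infection fatality rate:", fatality_rate(deaths, infections))
```

In this invented example the case fatality rate is roughly fifty times the infection fatality rate, purely because the two rates count different denominators.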

Design robust and efficient approaches to empirical studies of novel pathogens aimed at risk assessment
Massive effort is going into surveying for possible zoonotic pathogens in various wildlife and domestic animal populations. Wide application of sensitive technologies identifies many pathogens, but it is unclear how these results map onto public health risk, especially when based on detection of pathogen nucleic acid rather than pathogen isolation. Further work aims to assess risks from particular pathogens by focused laboratory or infection experiments.
In the distant future, one might be able to detect a pathogen, sequence it and then know enough to make a reasoned estimate of the risks associated with its emergence. Far too many gaps exist now for that to be feasible. However, modelling studies can help by forcing definitions of emergence in clear, quantitative frameworks, and keeping the focus on key processes. In particular, models can define what properties of novel pathogens need to be measured. Pathogen phenotypes are often assessed in animal models and modelling should be used to analyze such studies and shed greater light on their optimal design. For example, some influenza research has focused on infectiousness per unit time rather than the duration of infectiousness. Is this design sufficient to characterize risk? Another essential contribution would be developing modelling approaches to link data from experimental infections to data attainable in the field.
Models can also help to design surveillance programmes for emerging pathogens. Model-guided fieldwork (MGF) has been advanced recently as a useful tool for all of disease ecology (Restif et al., 2012), and could find useful application here. MGF approaches can help empirical scientists to focus on particular sample types (seroprevalence versus infection prevalence or incidence) and subpopulations, rather than following ad hoc or unfocussed data collection plans. Careful modelling can help define the best sentinel groups, how they should be surveyed, and with what sample sizes. MGF can also help to identify animal species (or groups of species) that act as disease reservoirs, a notoriously tough problem (Buhnerkempe et al., 2015; Viana et al., 2014).

Summary
Research on emerging pathogens has highlighted the essential need to integrate insights from many disciplines, and to link processes acting at multiple scales (there are obvious connections to other articles in this issue: Frost et al., 2015; Buhnerkempe et al., 2015; Gog et al., 2015; Wikramaratna et al., 2015). This drives home the need for focused modelling efforts, to link these disparate data types, explore case studies, and define priorities in data collection. There is a general pattern of serious challenges arising from missing information, and from not adequately addressing these data gaps. At the same time, expanded sampling efforts and new technologies are bringing a flood of data that must be analyzed with a focus on mechanistic principles and possible imbalances in sampling design. The long-term goal is to draw robust conclusions about past events and make appropriate assessments of risk (and uncertainty!) about pathogens that appear to be threatening.