Sub- or supercritical transmissibilities in a finite disease outbreak: Symmetry in outbreak properties of a disease conditioned on extinction

Highlights • The reproduction number R can quantify the transmissibility of infectious diseases.• Sub or supercritical R (1) determine whether extinction is certain or not.• Conditional on extinction, super- and sub-critical dynamics are indistinguishable.• We give analytical formulation to map supercritical R onto subcritical counterpart.• Our novel approach allows us to give additional results for key distributions.


Introduction
Over the past two decades, outbreaks of SARS ( Riley et al., 2003 ), pandemic flu , MERS-CoV ( Cauchemez et al., 2016 ), Ebola ( WHO Ebola Response Team, 2014 ) and more recently Zika ( Ferguson et al., 2016 ), have illustrated the importance of early characterisation of the transmissibility of emerging or re-emerging pathogens. Typically, characterisation is achieved by estimating either the basic reproduction number R 0 , which applies in a large population that is fully susceptible, or the instantaneous reproduction number R t , which applies at time t after start of an outbreak, and which accounts for past exposure and any interventions which have mitigated the spread of the pathogen. Such estimates of transmissibility have been used to assess the level of intervention required to control an outbreak. For instance, if R 0 > 1 and vaccination is possible, then treating a proportion of the population that exceeds 1-1/ R 0 will bring an outbreak under control ( Fine et al., 2011 ). Furthermore, the reproduction number (either R t or R 0 ) has multiple uses including: (i) forecasting future incidence of a pathogen , (ii) monitoring the impact of newly implemented control strategies ( International Ebola Response Team, 2016 ), and (iii) predicting current and future attack rates ( Ferguson et al., 2016 ).
Future preparedness, however, ultimately relies on the early assessment of future threats, ideally before a full blown outbreak or epidemic occurs ( Karesh et al., 2012 ). This requires characterising the transmissibility of pathogens that are responsible for short lived outbreaks which have become extinct without any preven-  ( Farrington et al., 2003;Farrington and Grant, 1999 ). Intuitively, the limited nature of such an outbreak could reflect a low transmissibility of the pathogen or it could reflect stochastic behaviour of the early transmission process.
Here, we provide properties and a characterisation of outbreaks that have become extinct in the absence of any intervention. A key parameter of such outbreaks is the reproduction number R , namely the mean number of new infections produced by a single infectious individual ( Cori et al., 2013;Wallinga and Teunis, 2004 ) . We shall assume that R remains constant throughout an outbreak that goes extinct.
Due to individual or environmental variation, there is variation in the number of new infections produced by different inf ected individuals. The number of new infections (secondary cases) thus follows a probability distribution which is usually referred to as the offspring distribution ( Farrington et al., 2003;Farrington and Grant, 1999 ). By definition, the mean of this distribution is the reproduction number R .

Choice of offspring distribution
Typically, the offspring distribution is modelled by either a Poisson distribution, a geometric distribution, or negative binomial distribution ( Garske and Rhodes, 20 08;Lloyd-Smith, 20 07;Lloyd-Smith et al., 2005 ). In this work we shall adopt a negative binomial form for the offspring distribution. This is a somewhat flexible choice since this distribution depends on two parameters and has two standard distributions as limiting cases (see below).
We write the negative binomial distribution as NB( δ, p ) where δ and p are parameters of the distribution, and lie in the ranges δ > 0 and 0 < p < 1. In terms of these parameters, the form of the negative binomial distribution we adopt is such that the mean and variance are given by pδ/ ( 1 − p ) and pδ/ ( 1 − p ) 2 , respectively .
In what follows, we shall take p = R/ ( R + δ) , in which case the mean and the variance of the offspring distribution are R and R + R 2 /δ, respectively. The mean, R , is thus the reproduction number and the quantity δ is conventionally referred to, in the literature, as the dispersion . Generally, large values of δ correspond to low levels of heterogeneity (i.e., a low variance) in realised values of the infectiousness, while small values of δ correspond to high levels of heterogeneity. For more details about heterogeneity, see the work of Garske and Rhodes (2008 ), Lloyd-Smith et al. (2005 ).
In the particular case of no over-dispersion ( δ = ∞ ), the negative binomial distribution reduces to a Poisson distribution, while when there is unit dispersion ( δ = 1 ) the negative binomial distribution reduces to a geometric distribution.

Modelling the dynamics
In modelling the dynamics of infection we shall, for the sake of simplicity, work with a discrete time model. We define the mean time between a primary case becoming infected and the resulting production of a secondary case as the generation time . The model then describes the number of new infections at discrete multiples of the generation time-i.e., in different generations.
We proceed by first defining the disease incidence. This is the random number of infected individuals at a given time. For generation t = 0, 1, 2, …, we denote the disease incidence by X t , and model the dynamics of disease incidence as a branching process ( Lloyd-Smith et al., 2005 ), based on an offspring distribution of negative binomial form. We then have where ξ j is the number of newly infected individuals (number of 'offspring') arising from the j th infected individual. The ξ j are independent and identically distributed (iid) random variables that are drawn from the negative binomial distribution described above, hence ξ j ∼ NB ( δ, R R + δ ) . This form of offspring distribution leads to the distribution of X t+1 , conditional on the value of X t , also having a negative binomial distribution, which is given by (see the Supplementary Information). Eq. (2) is a convenient representation since even in an appreciable population, only a single random number needs to be generated to produce all 'offspring' of a generation.

Two possible reproduction numbers
An outbreak that has become extinct can always be described by a 'small' reproduction number R s that lies below unity (thus R s is subcritical). However, as we show in the Supplementary Information, all dynamical properties of such an outbreak are completely indistinguishable from an outbreak characterised by a 'large' reproduction number R l that lies above unity (thus R l is supercritical), but where the incidence of the disease is conditioned so that it eventually achieves extinction. The two reproduction numbers satisfy the inequality R l > 1 > R s > 0.
Generally, all dynamical results calculated for R l , when conditioned on eventually achieving extinction, are identical to the results calculated for R s . Another way of saying this is that given the eventual occurrence of extinction, and given that there were X t cases in generation t , the probability of observing X t + s cases in generation t + s is the same for outbreaks with either R s or R l . This is ultimately a property of Eq. (1) and follows from mathematical results which apply for branching processes ( Athreya and Ney, 1972;Waugh, 1958 ). We shall refer to the identity of results for R s and R l as the symmetry of the problem.
To the best of our knowledge, this symmetry was first implied by Kendall (1956 ) in the context of stochastic dynamics of epidemics. The symmetry was then further explored in the context Markov processes ( Athreya and Ney, 1972;Daly, 1979;Waugh, 1958 ) and more recently in the context of infectious disease outbreaks ( Guttorp and Perlman, 2015 ).
In the Supplementary Information we use elementary approaches to provide intuition about the origin of this symmetry and establish some relevant results. Our demonstration of the symmetry, unlike previous ones does not rely on probability generating functions, but rather is derived directly from the properties of the transition matrix.
In what follows, when we discuss the large reproduction number R l , we shall often leave implicit the fact that the population has been conditioned upon the occurrence of eventual extinction.
To establish some useful results and relations, we have found it useful to introduce the quantity ɛ , which is the probability of eventual extinction of a disease outbreak when: (i) the outbreak starts with a single infected individual, and (ii) the reproduction number is larger than unity . We note that condition (i) readily generalises: when there are n infected individuals, the probability of eventual extinction is ɛ n , (see the Supplementary Information). Furthermore, because of condition (ii) eventual extinction is not inevitable and so ɛ < 1.
It is convenient to proceed using ɛ and δ as the independent parameters in the problem, and then determining other quantities in terms of these. We find that for ɛ < 1, independent of the actual initial size, n , of the outbreak, the two reproduction numbers are given by

Relationships involving the reproduction numbers and parameters
We note that Eq. (3b) relates the large reproduction number, R l , to the dispersion, δ, and to the probability of extinction given a single infected individual, ɛ . Thus given a reproduction number that is greater than unity, and given the value of the dispersion, we can use Eq. (3b) to compute the probability of extinction. This result allows us to illustrate that increasing the dispersion (i.e., decreasing δ) reduces the probability of extinction ( Blumberg and Lloyd-Smith, 2013a,b;Lloyd-Smith et al., 2005 ), see Fig. 2 A. We additionally note that using Eqs. (3a-c) ) allows us to determine a very simple relation between R s and R l , namely This can be interpreted as saying that the small reproduction number R s is a discounted version of the large reproduction number R l , with the discount a power of the extinction probability, ɛ , associated with R l .
Additionally, writing ε = e −α , ( 3b ) and ( 3c ) yield the exact re- for a geometric distribution of offspring Eq. (4) provides a simple way to map reproduction numbers that are above unity (supercritical R values) to dynamically equivalent reproduction numbers of a finite outbreak that all lie below unity (subcritical R values). The result of Eq. (4) also allows us to show how, for a given value of R l , the corresponding value of R s is higher as over-dispersion increases (i.e., δ decreases), see Fig. 2 B.

Outbreak size
Finite outbreaks are often quantified in terms of the outbreak size ( Farrington et al., 2003 ) (also called final size ), which is a random variable we denote by Z and which is defined by Z = ∞ t=0 X t . This random variable counts the total number of individuals who suffered inf ection during the entire course of the outbreak. In the Supplementary Information, we provide an analytical treatment of the outbreak size distribution, when the offspring distribution is a negative binomial. This treatment can be used to estimate the reproduction number given observation(s) of outbreak size(s). We also present a numerical method to evaluate the distribution of outbreak sizes associated with outbreaks that eventually become extinct.
We verify to high accuracy, with simulations, that the reproduction numbers R s and R l of Eqs. (3a) and ( 3b ) lead to the same distribution of outbreak sizes (see Fig. 1 A).

Outbreak duration
The outbreak duration ( Farrington and Grant, 1999 ) is a random time we denote by T ext , that corresponds to the number of generations where the incidence of the disease is different from zero. This also has identical dynamics for the reproduction numbers R s and R l of Eqs. (3a) and ( 3b ), see Fig. 1 B. An analytical treatment of the outbreak duration distribution can be found in the Supplementary Information.

Mean and standard deviation of final size and duration
For a finite outbreak, rather than characterising the complete distribution of the final-size and the complete distribution of the duration (as in Fig. 1 ), we note that knowledge of the mean and standard deviation of those distributions can be sufficient to characterise broad dynamical patterns ( Fine et al., 2011 ). In the Supplementary Information, we provide a simple formulation that allows precise numerical computation of the mean and variance of both the final size distribution and the duration distribution, see Fig. 2 C and D. Again, we show that the means and standard deviations for both of these distributions are identical when the reproduction number is R s and when it is R l and eventual extinction is conditioned upon.

Estimate of reproduction number
From a practical point of view, on observing an extinct outbreak of final size z (with z = 1 , 2 , 3 . . . ), the key issue is to estimate the reproduction number characterising the disease. Estimating such reproduction number involves computing the probability of observing a final outbreak size, given a value of the reproduction number, i.e. the likelihood of the parameter. Conditional on eventual extinction, given the symmetry in outbreak size distributions for either R l or R s , the individual likelihoods of observing the final size, given a reproduction number of either R l or R s are also identical ( Fig. 3 A).
However, 1 the full likelihood, i.e. unconditional on extinction, must be weighted by the probability of extinction having occurred. Given this weight is unity for R s and ɛ n for R l , the symmetry in the likelihood is lost after accounting for extinction, as illustrated in Fig. 3

B (see the Supplementary information).
Conveniently, when estimating the reproduction number, the mapping of R s onto R l , means that the likelihood can be evaluated only on the R s domain (i.e., 0-1) and the likelihood of R l follows by simply weighting the results on R s by the probability of extinction. We note that the increased level of randomness/uncertainty associated with a small outbreak leads, perhaps counterintuitively, to a greater chance of inferring a larger reproduction number than when a large outbreak is observed. For example, Fig. 3 C shows that there is a larger probability of R l being greater than 2, when the final outbreak size is small, than when the final outbreak size is large (e.g., Pr ( R l > 2 | Z = 1 ) > Pr ( R l > 2 | Z = 10 ) ).  3) and (4) , where one was greater than unity (red) and one smaller than unity (blue), using a negative binomial offspring distribution with δ = 0.5. The solid lines show the predicted distributions (see Supplementary Information). (An equivalent figure, using a Poisson offspring distribution, can be found in the Supplementary information. Finally, given that extinction occurs, the symmetry demonstrated in this work allows the mean and standard deviation of outbreak size (C) and duration (D), for R 's greater than unity, to be easily obtained from the analytical formulation of those quantities that applies for R 's below unity. The same likelihood profile, without conditioning on extinction. This could be interpreted as a posterior distribution of R given that Z cases were observed. (C) Probability that R is above a threshold, given a past outbreak size, Z . (D) Probability that a future outbreak will not naturally become extinct, given the same pathogen was previously observed to cause an outbreak of size Z : future risk. We assume a negative binomial offspring distribution with δ = 0.5. (Equivalent figures related to time to extinction and using a Poisson offspring distribution can be found in the SI).
Ultimately, given observation of a single outbreak of final size z, one would want to know the risk of a future outbreak of the pathogen running out of control, and leading to an epidemic. The probability of such an epidemic, which we write as Pr( Epidemic | Z = z ), can be formalised in a branching process as the probability that extinction will not occur in a future outbreak, given that a total of z cases were previously observed in an ultimately extinct outbreak. With FutureExt the event of extinction given reintroduction of the pathogen, we can write Pr( Epidemic | Z = z ) = 1 − Pr (F ut ureExt | Z = z) and this is illustrated in Fig. (3D) . Interestingly, this result allows us to show that Pr( Epidemic | Z = z ) decreases as the size of the initial extinct outbreak increases. This illustrates the effect, on prediction, of an increased uncertainty associated with a small outbreak.

Discussion
The analysis presented in this work provides a demonstration that the dynamics of a supercritical process conditioned on extinction is in every way identical to the dynamics of a subcritical process. Unlike previous demonstrations ( Guttorp and Perlman, 2015;Kendall, 1956;Waugh, 1958 ), we have relied on the properties of the transition matrix to derive the result. A benefit in this approach is to allow us to derive (i) explicit analytic results for mapping a supercritical reproduction number to its subcritical counterpart, and (ii) explicit analytic results for the expected distributions, means and variances of the final size and duration of the extinct outbreak. The results obtained are applicable when the offspring distribution is characterised by a Poisson, geometric or negative binomial distributions.
While the collapsing of supercritical dynamics conditioned on extinction onto subcritical dynamics has been known ( Guttorp and Perlman, 2015;Kendall, 1956;Waugh, 1958 ), it is clearly underexploited, or has not been appreciated in the literature of infectious disease dynamics. As highlighted by Nishiura et al. (2012 ), studies of minor outbreaks typically adopt ' the key assumption that a subcritical process ( i.e. R 0 < 1) resulted in the observation ' ( Blumberg and Lloyd-Smith, 2013a , b ;Cauchemez et al., 2013;Ferguson et al., 2004;Kucharski and Edmunds, 2015 ). However, we show the likelihood of a supercritical process can also be explored, therefore we see no reason to restrict any such analyses to subcritical process assumptions. We provide a simple way to explore the full range of reproduction number values (i.e. beyond the subcritical range) using likelihood framework. Such framework is easily applicable and has already been in part implemented in an R package (see http://github.com/reconhub/branchr) and used in a recent publication .
Like many other applications ( Blumberg and Lloyd-Smith, 2013a , b ;Cauchemez et al., 2013;Ferguson et al., 2004;Kucharski and Edmunds, 2015 ), our work assumes that the reproduction number remains constant during the duration of the outbreak. While public health and clinical interventions ( Cori et al., 2013 ), as well as behaviour ( Funk et al., 2009 ), can affect the reproduction number during an outbreak, we believe that it is reasonable to assume a constant reproduction number when the outbreaks are of limited durations and sizes. As well as there often being limited available time to intervene, for many diseases (including emerging zoonotic diseases), we have limited knowledge of how to curb transmission, which is further complicated by availability and access to treatment. Additionally, when interventions have been implemented, one could view the estimated reproduction number as an effective reproduction number, see e.g., Blumberg et al. (2015 ). In practical terms, once again, our conclusions best applies in the context of relatively small outbreaks of short duration, where the conditioning on extinction is realistic. For large outbreaks, it is likely that other factors such local depletion of the susceptible population or the implementation of public health interventions, played a stronger role in the extinction process than stochasticity by reducing the reproduction number.
Our work illustrates the importance of stochastic factors during the early phase of an outbreak ( Ferguson et al., 2004;Lloyd-Smith et al., 2005 ). The symmetrical property of the dynamics emerging for either a small R or a large R conditioned on extinction may appear initially counter-intuitive. However, a simple intuition may come from the following: when R is small, large outbreaks are unlikely to occur, while when R is large, large outbreaks are unlikely to remain finite, therefore after conditioning on extinction, only small outbreaks will be observed. The occurrence of extinction when R is larger than unity must not be under-estimated. For instance, perhaps slightly counterintuitively, when a disease reappears, a full-blown outbreak is more likely to occur following a small extinct outbreak rather than following a moderately large one, assuming the same transmissibility applies to the new and extinct outbreaks. However, as the size of an outbreak becomes larger, there is a shift in the impact of stochastic factors toward deterministic behaviour: moderately large (ultimately extinct) outbreaks reflect an R close to unity, but given extinction occurred despite a relatively large outbreak size, stochastic factors have a decreased influence, leading to higher confidence that R lies below 1. Conversely, a small outbreak may reflect a small R , but the large uncertainty means a large R cannot be ruled out.
Overall, we have considered the transmissibility of a disease, on observing a finite outbreak, and believe we have shed light on its nature and provided some useful analytical and numerical methods for its investigation.