Skip to main content
Advertisement
Browse Subject Areas
?

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Estimation of the Basic Reproductive Number and Mean Serial Interval of a Novel Pathogen in a Small, Well-Observed Discrete Population

  • Kendra M. Wu ,

    kendrawu@connect.hku.hk

    Affiliations School of Public Health, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, Medical Research Council Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom

  • Steven Riley

    Affiliation Medical Research Council Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom

Abstract

Background

Accurately assessing the transmissibility and serial interval of a novel human pathogen is public health priority so that the timing and required strength of interventions may be determined. Recent theoretical work has focused on making best use of data from the initial exponential phase of growth of incidence in large populations.

Methods

We measured generational transmissibility by the basic reproductive number R0 and the serial interval by its mean Tg. First, we constructed a simulation algorithm for case data arising from a small population of known size with R0 and Tg also known. We then developed an inferential model for the likelihood of these case data as a function of R0 and Tg. The model was designed to capture a) any signal of the serial interval distribution in the initial stochastic phase b) the growth rate of the exponential phase and c) the unique combination of R0 and Tg that generates a specific shape of peak incidence when the susceptible portion of a small population is depleted.

Findings

Extensive repeat simulation and parameter estimation revealed no bias in univariate estimates of either R0 and Tg. We were also able to simultaneously estimate both R0 and Tg. However, accurate final estimates could be obtained only much later in the outbreak. In particular, estimates of Tg were considerably less accurate in the bivariate case until the peak of incidence had passed.

Conclusions

The basic reproductive number and mean serial interval can be estimated simultaneously in real time during an outbreak of an emerging pathogen. Repeated application of these methods to small scale outbreaks at the start of an epidemic would permit accurate estimates of key parameters.

Introduction

Uncertainty dominates important early policy decisions for emerging respiratory pathogens such as Severe Acute Respiratory Syndrome coronavirus (SARS-CoV), Middle East respiratory syndrome coronavirus (MERS-CoV), and pandemic influenza A/H1N1. This uncertainty is reduced greatly once the basic reproductive number and the serial interval of infection are known. The basic reproductive number quantifies the transmissibility of a pathogen and is defined as the average number of secondary infections generated by one typically infectious individual in an otherwise susceptible population. The serial interval in each infector-infectee pair is the time between disease onset in the infector and the onset in the infectee.

In this study, we present a temporal likelihood model that allows real-time simultaneous estimation of R0 and the average of serial interval Tg in a small well-observed population, so that it provides timely information for informed public health responses. We build on previous inferential studies [14] by focusing on capturing unbiased parameter estimates as the epidemic leaves the exponential phase. Our two illustrative scenarios were designed to have similar exponential growth phases and are based loosely on pandemic influenza A/H1N1 and SARS-CoV infections.

Methods

We first define the underlying stochastic process that simulates an epidemic, before describing our calculation of the likelihood of data arising from this simulated process and the parameter estimation based on this likelihood. The required inputs of the likelihood estimation are; the number of infected people from outside the community who triggered the epidemic, the size of susceptible in the community, and the case incidence time-series. Conversely, the primary outcomes are estimates of the basic reproductive number, R0, and the mean serial interval, Tg.

Simulate epidemic process

In the analysis, we defined our simulation process to produce the random time series of new cases with onsets at time 1 to T. Specific realisations of X were defined to be x = {x1, …, xj, …, xT}.

We defined serial interval by assuming the probability that an individual with symptom onset on day i generates a new infection on day i + τ is (1) where the right-hand side is the probability mass τ for a Poisson distribution with mean Tg. Implicitly, this assumption about the serial interval implies that infectiousness was not constant for individuals in the days following their infection. The value of k, where kTg, was always sufficiently high that the probability of a secondary case was negligible after k relative to the infector’s onset time i. The offspring distribution [5] also assumed to be Poisson distributed with mean R0.

To initiate the simulation we assumed that there were n initially infectious individuals in a population of size N. Those initially infectious were assumed to have onsets at time t1. The simulation algorithm kept track of the onsets of the total number of infected individuals at each time unit t = {t1, …, tj, …, tT} during the observed period. Symbolically, the total number of infected individuals having onsets prior to and at time j was denoted as Ij, where . For example I1 = n.

For an arbitrary infector of onset time i, we first simulated the number of secondary cases it had infected from the offspring distribution of mean R0, fO(R0), without considering the depletion of the susceptible population. For each of these secondary cases, we then drew the delay from the onset of the infector i to the onset of each secondary case from the serial interval distribution. The onsets fall between {i + 1, …, i + k} with mean i + Tg. This process yielded the secondary case time-series x for a specific infector that had an onset time i.

When the number of infectees becomes significant relative to the susceptible population size, saturation effects must be accounted for. The effects are two-fold [6, 7]: first, through susceptible depletion, the number of secondary cases per primary case is reduced until there are no longer any susceptible people in the homogeneous population, which is when the number of secondary cases per primary case is reduced. Second, when there are many infectious cases, competition to infect the remaining susceptible shortens the serial interval. In this manuscript, we took the serial interval distribution to mean the distribution in the absence of competing sources of infection. The mean Tg was the average over of this distribution in the absence of competition.

In order to represent the finite pool of susceptible individuals, we defined the actual number of new cases with onsets at time j that accounted for the depletion of the susceptible population as , and they were drawn from the as-yet uninfected pool using a binomial distribution, with probability of infection equal to the ratio of zj and N: (2)

We repeated this process for each infector at time i and added up the secondary cases between {i + 1, …, i + k}. It should be noted that each infector produced different numbers of secondary cases and each of these infector-infectee pairs had different serial intervals. Each of these secondary cases were assumed to be infectious and were considered subsequent infectors if i > j. We repeated this process for all infectors with onset time between 1 and T. The overall case time-series X was the sum of each of the realization of secondary case time-series x of each infector.

Likelihood

We defined Yj to be the expected number of secondary cases of infectors Xi from {jk, …, j − 1} with onsets at time j that could be expressed as, (3) where fs was the Poisson density of serial interval.

We note that the formulation used above may not be correct for offspring and serial interval distributions that are not Poisson.

Therefore, the likelihood of a whole times series of observations could then be expressed in terms of the expected numbers of secondary cases in each time unit and the parameters of the model, when accounting to the finite pool of susceptible, as such: (4) where was the probability mass function of a binomial distribution, and Yj is defined in Eq (3).

Standard maximum likelihood were used to obtain point estimates and confidence intervals. We used the optim function in R [8] to find point estimates and the optimise function to find univariate confidence regions.

To initiate an epidemic and ensure that it would form a chain of transmission to establish an exponential phase of growth without early extinction, we decided to seed fifty infectious individuals on day 1 in each of our simulated epidemics. The subsequent number of incidences were then generated by the stochastic process described in the Simulate epidemic process.

All results in the paper can be reproduced using R code available in the github repository c97sr/EpiInference.

Illustrative parameter regimes

We constructed two illustrative scenarios to investigate the properties of our model and likelihood function. Scenario 1: R0 = 1.8, Tg = 2.5 when using Poisson-distributed serial interval distribution; and Scenario 2: R0 = 3.0, Tg = 6.25 when using Poisson-distributed serial interval distribution. We chose these scenarios because they cover the range of SARS in hospitals (R0 = 2.7 [9]), pandemic A/H1N1 influenza in the community (R0 = 1.5 [10, 11]) and pandemic A/H1N1 in schools (R0 = 2.4 [12]). Then we compute Tg so that they had growth rate during the exponential phase was 0.32 according to [13]. Unless stated otherwise, we seeded day 1 with fifty individuals to avoid the epidemic prematurely died out. In each scenario, 1,000 people were assumed to be living in this community, where all but fifty were assumed to be susceptible initially. This population size was chosen to be consistent with a small town or large village in many social settings.

When using Poisson-distributed serial interval, we set k = 10 in Scenario 1 to simulate influenza infections [14] and k = 25 in Scenario 2 to loosely mimic SARS infections [15, 16]. They were chosen so that these values were at least four times greater than Tg and the probability of a secondary case after k, relative to the onset time i of the infector, was negligible. The effect of the truncation of the distribution was subsequently pushed to the right-hand tail of the estimate time-series when (i + k) > T.

Results

The different stages of an epidemic pass quickly in an outbreak of respiratory infections in a small population (Fig 1). In our simulated population of 1,000 people for an influenza-like pathogen (Scenario 1) there was an average delay of only 16.5 days (median, 95% prediction interval: 8.4, 26.2) from the introduction of 10 infectious individuals to peak incidence. The more transmissible, but slower Scenario 2 still achieved a peak of incidence after 23 days (18.1, 29.2). In addition to an early peak of incidence, partial saturation occurred very quickly in this population. An initial exponential increase in incidence can be seen on the log y-axes in Fig 1 (as a straight line). However, this exponential phase ends quickly in such a small population. In Scenario 1, incidence appears to be occurring sub-exponentially as early as 10 days after the initial seed.

thumbnail
Fig 1. Simulated epidemics.

Part A shows Scenario 1 with R0 = 1.8, Tg = 2.5. Part B shows Scenario 2 with R0 = 3.0, Tg = 6.25. Offspring and the serial interval assumed to follow Poisson distribution. In both scenarios, 1,000 individuals were initially susceptible and 10 people were infectious on day 1. Grey lines trace 5 single example realisations. Black lines show average case incidence, and the green dotted lines are the 5% and 95% prediction intervals from 30 realisations.

https://doi.org/10.1371/journal.pone.0148061.g001

The speedy transition from exponential growth to partial depletion of susceptible individuals and then peak incidence permitted rapid and accurate univariate parameter estimation. We attempted to recover either R0 or Tg (but not both at this stage) from simulated data similar to those described above. As would be expected, for both Scenario 1 (Fig 2a and 2b showed 8 arbitrary epidemics and prediction intervals of 100 epidemics) and Scenario 2 (Fig 3a and 3b showed 8 arbitrary epidemics and prediction intervals of 100 epidemics), we were able to obtain accurate values of R0 prior to the peak of the epidemics when Tg was known, even during the earliest exponential phase of incidence. Estimates of R0 did not become any more accurate once the peak of incidence had passed. Although the pattern of univariate inference for Tg was similar to that for R0, the final confidence intervals were wider (in relative terms) than for R0.

thumbnail
Fig 2. Univariate and bivariate parameter estimation from simulated daily data for Scenario 1.

Part a shows real-time univariate estimates of R0 in which the true value of Tg was assumed known. Part b shows real-time univariate estimates of Tg with R0 assumed known. The results of jointly estimating R0 and Tg are shown in parts c (R0) and d (Tg). Scenario 1 is as per Fig 1A with R0 = 1.8, Tg = 2.5. The epidemic was simulated in a population of 1,000 people all of whom were initially susceptible, other than 50 who were infectious on day 1. Grey lines trace the corresponding estimates of 8 arbitrary epidemics. Black lines represent the average of 100 estimates at that time point, red lines represent the median, and the green dotted lines are the 5% and 95% prediction intervals for 100 epidemics. The dark blue solid horizontal lines show the true parameter values.

https://doi.org/10.1371/journal.pone.0148061.g002

thumbnail
Fig 3. Univariate and bivariate parameter estimation from simulated daily data for Scenario 2.

Part a shows real-time univariate estimates of R0 in which the true value of Tg was assumed known. Part b shows real-time univariate estimates of Tg with R0 assumed known. The results of jointly estimating R0 and Tg are shown in parts c (R0) and d (Tg). Scenario 2 is as per Fig 1B with R0 = 3.0, Tg = 6.25. The epidemic was simulated in a population of 1,000 people all of whom were initially susceptible, other than 100 who were infectious on day 1. Grey lines trace the corresponding estimates of 8 arbitrary epidemics. Black lines represent the average of 100 estimates at that time point, red lines represent the median, and the green dotted lines are the 5% and 95% prediction intervals for 100 epidemics. The dark blue solid horizontal lines show the true parameter values.

https://doi.org/10.1371/journal.pone.0148061.g003

We extended our analysis of the univariate parameter estimation routines to test the model’s performance (Fig 4 for Scenario 1). For a day 50 in the simulated epidemic of 50 realizations, we held R0 and Tg constant in the simulated epidemics while varying the values of a) R0 estimates, denoted as , or b) Tg estimates, denoted as , by ±1.5 in the likelihood function. Next, we repeated the process by c) varying R0 from 1.0 to 3.0 and d) varying Tg from 1.1 to 10.0 in the simulated epidemics, then computed the residuals of c) and d) from the likelihood function respectively as previously done.

thumbnail
Fig 4. Extended assessment of univariate inference for Scenario 1.

Sensitivity analysis using univariate estimations at single time point and assumed poission-distributed serial interval. a and b show the distribution of the marginals for estimation of R0 and Tg of 50 realisations respectively. All estimates were made at day 50. c shows the mean and confidence intervals for the marginals for estimates for R0 across a wide range of known values for Tg. Similarly, d shows the mean and confidence intervals for Tg across a wide range of value for R0. Scenario 1 is as per Fig 1A with R0 = 1.8, Tg = 2.5, and k = 10.

https://doi.org/10.1371/journal.pone.0148061.g004

Nevertheless, it was possible to jointly estimate both R0 and Tg in this small population. We assumed that neither Tg nor R0 were known for an emerging pathogen and attempted to estimate them jointly for Scenario 1 (Fig 2c and 2d showed 8 arbitrary epidemics and prediction intervals of 100 epidemics) and Scenario 2 (Fig 3c and 3d showed 8 arbitrary epidemics and prediction intervals of 100 epidemics). Compared to the univariate cases, for both scenarios, reliable information was obtained on both R0 and Tg. However, prediction intervals were wider. Once the peak had been reached, the convergence of estimates of Tg in bivariate case were achieved, but in relative terms, the accuracy was considerably lower.

The estimates of different values of k were tabulated on Table 1 for the model using Poisson-distributed serial interval for the estimates on the last day of observation using different numbers of iterations: (i) 10, (ii) 20, (iii) 100, along with different k values.

thumbnail
Table 1. Univariate and bivariate estimates of the model using Poisson distributed serial interval, different k values, and number of simulation iterations.

The following parameters were used: (a) R0 = 1.8, Tg = 2.5, k = 10; (b) R0 = 1.8, Tg = 2.5, k = 25; (c) R0 = 3.0, Tg = 6.25, k = 10; (d) R0 = 3.0, Tg = 6.25, k = 25. PI: prediction interval.

https://doi.org/10.1371/journal.pone.0148061.t001

Discussion

We have described a quantitative method that allows the rapid joint estimation of the basic reproductive number, and mean and variance of serial interval for an emerging pathogen based only on case data from an outbreak in a small population. These estimates show no signs of bias and could be available quickly for small populations. In essence, this approach is able to tease apart the two parameters from case data because it extracts key information from the transmission dynamics of different phases of a respiratory epidemic: the initial stochastic period of growth, the deterministic period of exponential growth, the sub-exponential period of growth prior to peak incidence, and the peak itself. Previous similar work [1, 2, 4] has incorporated either the initial stochastic phase or one or more of the deterministic phases, but not all phases together.

The serial interval distribution plays a crucial role in the estimation accuracy. It is because when the variance was large, the estimator had more difficulty to tease apart the two parameters from the observed incidence data, especially during phase II and beyond. In this model, the serial intervals are assumed to follow a Poisson distribution. When it has large variance and thus less distinguishable between generations of infection comparing to a shape Gamma distribution, the joint estimations have poorer performance, although they remain reasonably accurate in infinite population [3]. In finite population, the estimations of showed no bias (Table 1) if it had sufficiently large seed size.

By seeding fifty and a hundred infectious people on day 1 in each epidemic, the number of incidences at the early stage of the outbreaks would likely be higher and plausibly reached the exponential phase of growth earlier than if there were smaller number of infectious individuals seeded. However, with the same transmissibility and speed of transmission, higher number of incidences and large seed size appear to have no impact to the accuracy of estimates aside from avoiding outbreaks prematurely died out.

Our study is intended as an initial theoretical illustration of the potential for this approach to be applied more widely. As such, it suffers from a number of limitations caused directly by the restrictive assumptions we have made. For example, we have assumed that all cases in this small population were observed. However, despite the possibility of intense surveillance in a small population, it is unlikely that even all symptomatic cases will be detected. Also, most pathogens generate some mild or entirely asymptomatic cases that would evade even the most intense surveillance efforts. Although we doubt, based on related previous study [17] that the reporting of only a proportion of cases would bias results (if that proportion is constant), it would most likely increase the delay from the start of the epidemic to the acquisition of robust parameter estimates.

We have also assumed throughout that offspring distribution and serial interval distribution follow a Poisson density. These are convenient assumptions that allow very rapid likelihood calculations for it is a single-parameter defining function. We anticipate few theoretical issues in the extension of these approaches to include distributions with more than one parameter (other than careful modification of Eq (3) as indicated above). However, the ability of these methods to independently estimate both means and variances of the offspring distribution and serial interval distribution remains to be described. We would anticipate that the estimation of variance for the serial interval distribution would be challenging [18] for cases other than where there are very high volumes of data for the first few generations (which could arise from a large seeding event) or when the true distribution has a low variance.

There are likely substantial opportunities to gain valuable insights into outbreak dynamics if we relax our assumption that transmissibility is constant over time and thus consider an inferential framework that accounts for both changes in transmissibility and depletion of susceptible individuals. The reproduction number would vary over time because of both effects. This approach might be especially useful for severe infections that affect small populations, such as Ebola.

Acknowledgments

KMW thanks MRC Centre for Outbreak Analysis and Modelling of Imperial College London for hosting her visiting research fellowship and thanks LKS Faculty of Medicine of The University of Hong Kong for funding her stay in London, where the foundation of the theoretical work on this paper was partially completed. KMW also acknowledges scholarship support from John Swire & Sons and The University of Hong Kong, as well as funding from the LKS Faculty of Medicine of The University of Hong Kong.

Author Contributions

Conceived and designed the experiments: SR. Performed the experiments: KMW. Analyzed the data: KMW SR. Wrote the paper: KMW SR.

References

  1. 1. White LF, Pagano M. A likelihood-based method for real-time estimation of the serial interval and reproductive number of an epidemic. Stat Med. 2008;27(16):2999–3016. pmid:18058829
  2. 2. White LF, Wallinga J, Finelli L, Lyn C, Riley S, Lipsitch M, et al. Estimation of the reproductive number and the serial interval in early phase of the 2009 influenza A/H1N1 pandemic in the USA. Influenza Other Respir Viruses. 2009;3(6):267–276. pmid:19903209
  3. 3. Griffin JT, Garske T, Ghani AC, Clarke PS. Joint estimation of the basic reproduction number and generation time parameters for infectious disease outbreaks. Biostatistics. 2011;12(2):303–312. pmid:20858771
  4. 4. Wallinga J, Teunis P. Different Epidemic Curves for Severe Acute Respiratory Syndrome Reveal Similar Impacts of Control Measures. Am J Epidemiol. 2004;160(6):506–516.
  5. 5. Lloyd-Smith JO, Schreiber SJ, Kopp PE, Getz WM. Superspreading and the effect of individual variation on disease emergence. Nature. 2005;438:355–359. pmid:16292310
  6. 6. Svensson A. A note on generation times in epidemic models. Math Biosci. 2006;208(2007):300–311. pmid:17174352
  7. 7. Kenah E, Lipsitch M, Robins JM. Generation interval contraction and epidemic data analysis. Math Biosci. 2008;213(1):71–79. Available from: http://www.sciencedirect.com/science/article/pii/S0025556408000394. pmid:18394654
  8. 8. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria; 2013. ISBN 3-900051-07-0. Available from: http://www.R-project.org/.
  9. 9. Riley S, Fraser C, Donnelly CA, Ghani AC, Abu-Raddad LJ, Hedley AJ, et al. Transmission dynamics of the etiological agent of SARS in Hong Kong: Impact of public health interventions. Science. 2003;300(5627):1961–1966. pmid:12766206
  10. 10. Fraser C, Donnelly CA, Cauchemez S, Hanage WP, Kerkhove MDV, Hollingsworth TD, et al. Pandemic Potential of a Strain of Influenza A (H1N1): Early Findings. Science. 2009;324(5934):1557–1561. pmid:19433588
  11. 11. Boëlle PY, Ansart S, Cori A, Valleron AJ. Transmission parameters of the A/H1N1 (2009) influenza virus pandemic: a review. Influenza Other Respi Viruses. 2011 September;5(5):306–316.
  12. 12. Yang Y, Sugimoto JD, Halloran ME, Basta NE, Chao DL, Matrajt L, et al. The Transmissibility and Control of Pandemic Influenza A (H1N1) Virus. Science. 2009;326(5953):729–733. pmid:19745114
  13. 13. Anderson RM, May RM. A framework for discussing the population biology of infectious diseases. In: Infectious disease of humans: dynamics and control. Oxford Univ Press; 1992. p. 17–19.
  14. 14. Lau LLH, Cowling BJ, Fang VJ, Chan KH, Lau EHY, Lipsitch M, et al. Viral Shedding and Clinical Illness in Naturally Acquired Influenza Virus Infections. J Infect Dis. 2010;201(10):1509–1516. pmid:20377412
  15. 15. Anderson RM, Fraser C, Ghani AC, Donnelly CA, Riley S, Ferguson NM, et al. Epidemiology, transmission dynamics and control of SARS: the 2002–2003 epidemic. Philos Trans R Soc Lond B Biol Sci. 2004;359(1447):1091–1105. pmid:15306395
  16. 16. Donnelly CA, Ghani AC, Leung GM, Hedley AJ, Fraser C, Riley S, et al. Epidemiological determinants of spread of causal agent of severe acute respiratory syndrome in Hong Kong. Lancet. 2003;361:1761–1766. pmid:12781533
  17. 17. Kucharski A, Mills H, Pinsent A, Fraser C, Van Kerkhove M, Donnelly CA, et al. Distinguishing Between Reservoir Exposure and Human-to-Human Transmission for Emerging Pathogens Using Case Onset Data. PLoS currents. 2013;6.
  18. 18. Kucharski AJ, Mills HL, Donnelly CA, Riley S. Transmission potential of Influenza A(H7N9) virus, China, 2013–2014. Emerg Infect Dis. 2015;21(5):852–855. pmid:25897624