Exploring how generation intervals link strength and speed of epidemics

Infectious-disease outbreaks are often characterized by the reproductive number R and exponential rate of growth r. The reproductive number R is of particular interest, because it provides information about how hard an outbreak will be to control, and about predicted final size. However, directly estimating R is difficult. In contrast, the rate of growth r can be estimated directly from incidence data while an outbreak is ongoing. R is typically estimated from r by using information about generation intervals – that is, the amount of time between when an individual is infected by an infector, and when that infector was infected. In practice, it is infeasible to obtain the exact shape of a generation-interval distribution and it is not always qualitatively clear how changes in estimates of the distribution translate to changes in the estimate of R. Here, we show that parameterizing a generation interval distribution using its mean and variance provides a clear biological intuition into how its shape affects the relationship between R and r. We explore approximations based on estimates of the mean and variance of an underlying gamma distribution, and find that use of these two moments can

Introduction
Infectious disease research often focuses on estimating the reproductive number, R, i.e., the number of new infections caused on average by a single infection. The reproductive number provides information about the disease's potential for spread and the difficulty of control. It is described in terms of an average [2] or an appropriate sort of weighted average [6].
The reproductive number has remained a focal point for research because it provides information about how a disease spreads in a population, on the scale of disease generations. As it is a unitless quantity, it does not, however, contain information about time. Hence, another important quantity is the population-level rate of spread, r. The initial rate of spread can often be measured robustly early in an epidemic, since the number of incident cases at time t is expected to follow i(t) ≈ i(0) exp(rt). The rate of growth can also be described using the "characteristic time" of exponential growth C = 1/r. This is closely related to the doubling time (given by T_2 = ln(2) C ≈ 0.69 C).
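As a small illustration of these definitions, the sketch below converts a growth rate into the characteristic time and the doubling time. The value r = 0.1 per day is a hypothetical number chosen for the example, not an estimate from the text.

```python
import math

def characteristic_time(r):
    """Characteristic time C = 1/r of exponential growth (requires r > 0)."""
    return 1.0 / r

def doubling_time(r):
    """Doubling time T_2 = ln(2) * C."""
    return math.log(2) * characteristic_time(r)

r = 0.1                       # hypothetical growth rate, per day
C = characteristic_time(r)    # time for incidence to grow by a factor of e
T2 = doubling_time(r)         # time for incidence to double
growth_over_T2 = math.exp(r * T2)  # i(T2) / i(0) under i(t) = i(0) exp(rt)
```

By construction, incidence grows by a factor of two over one doubling time, which is what `growth_over_T2` checks.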
In disease outbreaks, the rate of spread, r, can be inferred from case-incidence reports, e.g., by fitting an exponential function to the incidence curve [18,21,17]. Estimates of the initial exponential rate of spread, r, can then be combined with a mechanistic model that includes unobserved features of the disease to estimate the initial reproductive number, R. In particular, R can be calculated from r and the generation-interval distribution using the generating function approach popularized by [30].
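One simple way to fit an exponential to an incidence curve is ordinary least squares on log incidence. The sketch below assumes strictly positive counts and noise-free synthetic data; it is a stand-in for illustration and not necessarily the fitting procedure used in the cited studies. The function name `fit_growth_rate` is hypothetical.

```python
import math

def fit_growth_rate(times, incidence):
    """Least-squares fit of log(incidence) = log(i0) + r * t.

    Returns (r, i0). Assumes all incidence values are positive.
    """
    logs = [math.log(y) for y in incidence]
    n = len(times)
    t_bar = sum(times) / n
    l_bar = sum(logs) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (l - l_bar) for t, l in zip(times, logs))
    r = sxy / sxx                      # slope of the log-linear fit
    i0 = math.exp(l_bar - r * t_bar)   # back-transformed intercept
    return r, i0

# Synthetic incidence with known r = 0.2 and i(0) = 5 (hypothetical values)
times = list(range(10))
incidence = [5.0 * math.exp(0.2 * t) for t in times]
r_hat, i0_hat = fit_growth_rate(times, incidence)
```

With real count data, a model that accounts for observation noise would usually be preferable to a log-linear regression.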
The generation interval is the amount of time between when an individual is infected by an infector, and the time that the infector was infected [26]. While r measures the speed of the disease at the population level, the generation interval measures speed at the individual level. Generation-interval distributions are typically inferred from contact tracing, sometimes in combination with clinical data [4,14,10]. Generation-interval distributions can be difficult to ascertain empirically, and the generating-function approach depends on an entire distribution, which makes it difficult to determine which features of the distribution are essential to connect measurements of the rate of spread, r, with the reproductive number, R.
Here, we explore the qualitative relationship between generation time, initial rate of spread r, and initial reproductive number R using means, variance measures and approximations. By doing so, we shed light on the underpinnings of the relationship between r and R, and on the factors underlying its robustness and its practical use when data on generation intervals are limited or hard to obtain. We are interested in the relationship between r, R and the generation-interval distribution, which describes the interval between the time an individual becomes infected and the time that they infect another individual. This distribution links r and R. In particular, if R is known, a shorter generation interval means a faster epidemic (larger r). Conversely (and perhaps counter-intuitively), if r is known, then faster disease generations imply a lower value of R, because more individual generations are required to realize the same population spread of disease [7,22] (see Fig. 1).
We define the generation-interval distribution using a renewal-equation approach. A wide range of disease models can be described using the model [9,30]:

$$ i(t) = S(t) \int_0^\infty K(s)\, i(t-s)\, ds, \tag{1} $$

where t is time, i(t) is the incidence of new infections, S(t) is the proportion of the population susceptible, and K(s) is the intrinsic infectiousness of individuals who have been infected for a length of time s. We then have the basic reproductive number:

$$ R_0 = \int_0^\infty K(s)\, ds, \tag{2} $$

and the intrinsic generation-interval distribution:

$$ g(\tau) = \frac{K(\tau)}{R_0}. \tag{3} $$

The "intrinsic" interval can be distinguished from "realized" intervals, which can look "forward" or "backward" in time [5] (see also earlier work [26,19]). In particular, it is important to correct for biases that shorten the intrinsic interval when generation intervals are observed through contact tracing during an outbreak.

Disease growth is predicted to be approximately exponential in the early phase of an epidemic, because the depletion in the effective number of susceptibles is relatively small. Thus, for the exponential phase, we write:

$$ i(t) = R \int_0^\infty g(\tau)\, i(t-\tau)\, d\tau, \tag{4} $$

where R = R_0 S. We then solve for the characteristic time C by assuming that the population is growing exponentially: i.e., substitute i(t) = i(0) exp(t/C) to obtain the exact speed-strength relationship:

$$ \frac{1}{R} = \int_0^\infty g(\tau) \exp(-\tau/C)\, d\tau. \tag{5} $$

This fundamental relationship dates back to the work of Euler and Lotka [16]. We will explore the shape of this relationship using parameters based on human infectious diseases, and investigate approximations based on gamma-distributed generation intervals.
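The exact speed-strength relationship can be evaluated directly from a sample of generation intervals by replacing the integral over g with a sample mean, i.e. 1/R = mean(exp(-r G_i)), with r = 1/C (the Euler-Lotka form). The sketch below does this; the function name `reproductive_number` and the numerical inputs are illustrative assumptions.

```python
import math

def reproductive_number(r, intervals):
    """Estimate R from growth rate r and sampled generation intervals.

    Uses the sample analogue of the Euler-Lotka relationship:
        1 / R = mean(exp(-r * G_i)).
    """
    discounted = sum(math.exp(-r * g) for g in intervals)
    return len(intervals) / discounted

# With a fixed generation interval G, the relationship reduces to R = exp(r * G).
R_fixed = reproductive_number(0.1, [5.0] * 1000)

# For the same r, longer generations imply a larger R.
R_short = reproductive_number(0.1, [5.0])
R_long = reproductive_number(0.1, [10.0])
```

At r = 0 every interval contributes a weight of one, so the function returns R = 1, as expected for an epidemic that is neither growing nor shrinking.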

Approximation method, theory
We do not expect to know the full distribution g(τ), particularly while an epidemic is ongoing, so we are interested in approximations to R based on limited information. We follow the approach of [21] and approximate the generation interval with a gamma distribution. This is a biologically more realistic starting point than the normal approximation used in many applications, since the gamma distribution is confined to non-negative values.
For biological interpretability, we describe the distribution using the mean Ḡ and the squared coefficient of variation κ (thus κ = 1/a, and Ḡ = aθ, where a and θ are the shape and scale parameters under the standard parameterization of the gamma distribution). Substituting the gamma distribution into (5) then yields the gamma-approximated speed-strength relationship:

$$ R \approx (1 + \kappa r \bar{G})^{1/\kappa}. \tag{6} $$

We write:

$$ R \approx X(\rho; 1/\kappa), \qquad X(x; k) = (1 + x/k)^{k}, $$

where ρ = Ḡ/C = rḠ measures how fast the epidemic is growing (on the time scale of the mean generation interval) or, equivalently, the length of the mean generation interval (in units of the characteristic time of exponential growth). The longer the generation interval is compared to the characteristic time C, the higher the estimate of R (see Fig. 1). We then explore the behaviour of the generalized exponential function X defined above (equivalent to the Tsallis "q-exponential", with q = 1 − κ [29]): its shape determines how the estimate of R changes with the estimate of normalized generation length ρ. For small ρ, X always looks like 1 + ρ, regardless of the shape parameter 1/κ, which determines the curvature: if 1/κ = 1, we get a straight line, for 1/κ = 2 the curve is quadratic, and so on (see Fig. 2). For large values of 1/κ, X converges toward exp(ρ).

Figure 2: The approximate relationship (6) between mean generation time (relative to the characteristic time of exponential growth, ρ = rḠ = Ḡ/C) and reproductive number. The curves correspond to different amounts of variation in the generation-interval distribution.
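The gamma-approximated relationship, R ≈ (1 + κρ)^(1/κ) with ρ = rḠ, is short enough to write as a single function. The sketch below is an illustration under that formula; the name `gamma_R` is hypothetical, and the κ = 0 branch uses the limiting value exp(ρ) discussed in the text.

```python
import math

def gamma_R(rho, kappa):
    """Gamma-approximated R for relative generation length rho = r * G_bar.

    kappa is the squared coefficient of variation of the generation interval.
    kappa = 0 is treated as the fixed-interval limit, R = exp(rho).
    """
    if kappa == 0:
        return math.exp(rho)
    return (1.0 + kappa * rho) ** (1.0 / kappa)

rho = 1.0
R_exponential = gamma_R(rho, 1.0)   # straight line: 1 + rho
R_quadratic = gamma_R(rho, 0.5)     # 1/kappa = 2: (1 + rho/2)**2
R_fixed = gamma_R(rho, 0.0)         # upper limit: exp(rho)
```

For a given ρ, the estimate of R increases as κ decreases, moving from the straight-line case toward the exponential limit.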
The limit as κ → 0 is reasonably easy to interpret. The incidence is increasing by a factor of exp(ρ) in the time it takes for an average disease generation. If κ = 0, the generation interval is fixed, so the average case must cause exactly R = exp(ρ) new cases. If variation in the generation time (i.e., κ) increases, then some new cases will be produced before, and some after, the mean generation time. Since we assume the disease is increasing exponentially, infections that occur early on represent a larger proportion of the population, and thus will have a disproportionate effect: individuals don't have to produce as many lifetime infections to sustain the growth rate, and thus we expect R < exp(ρ).
The straight-line relationship for κ = 1 also has a biological interpretation. In our approximation, this corresponds to a generation distribution that is approximated by an exponential distribution. In this case, recovery rate and infection rate are constant for each individual. The rate of exponential growth per generation is then given directly by the net per capita increase in infections: R − 1, where one represents the recovery of an infectious individual.
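The straight-line case can be checked against a simple model with constant rates. As an assumption for illustration, let β be the per-capita infection rate (already scaled by the susceptible proportion) and γ the recovery rate, with no latent period; these symbols are introduced here and do not appear in the original text. The generation interval is then exponentially distributed with mean 1/γ, and:

```latex
\begin{align*}
r &= \beta - \gamma, \qquad \bar{G} = \frac{1}{\gamma}, \qquad R = \frac{\beta}{\gamma},\\
\rho &= r\bar{G} = \frac{\beta - \gamma}{\gamma} = R - 1
\quad\Longrightarrow\quad R = 1 + \rho .
\end{align*}
```

This agrees with the gamma approximation evaluated at κ = 1.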

Approximation method, in practice
We test our approximation method by generating pseudo-realistic generation-interval distributions using previously estimated/observed latent and infectious period distributions for different diseases. For each pseudo-realistic distribution, we calculate the "true" relationship between r and R and compare it with a relationship inferred based on gamma distribution approximations. These approximations are first done with large amounts of data, allowing us to evaluate how well the approximations describe the r-R relationship under ideal conditions, and then tested with smaller amounts of data.
Estimating generation intervals is complex; our goal with pseudo-realistic distributions is not to precisely match real diseases, but to generate distributions that are likely to be roughly as challenging for our approximation methods as real distributions would be. We construct pseudo-realistic intervals from sampled latent and infectious periods by adding the sampled latent period to an infection delay chosen uniformly from the sampled infectious period:

$$ G_i = E_i + U \cdot I_i, \tag{7} $$

where G_i, E_i and I_i are the sampled intrinsic generation interval, latent period, and infectious period, respectively, and U represents a uniform random deviate on (0, 1). This implicitly assumes that infectiousness is constant across the infectious period [8]. We sample from latent and infectious periods obtained from observations (for empirical distributions), or by using a uniform set of quantiles (for parametric distributions). For the purpose of constructing pseudo-realistic distributions, we do not attempt to correct for the fact that observed intervals may be sampled in a context more relevant to backward than to intrinsic generation intervals (see [5]). We sample latent periods at random, and infectious periods by length-weighted resampling (since a longer infectious period implies more opportunities to infect). For our examples, we used 10000 quantiles for each parametric distribution and 10000 sampled generation intervals for each disease. We then calculate "exact" relationships (for our pseudo-realistic distributions) by substituting sampled generation intervals into the exact speed-strength relationship (5). This relationship is then compared to the corresponding gamma-approximated relationship (6).
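The sampling scheme described above (a latent period drawn at random, an infectious period drawn with probability proportional to its length, and an infection delay uniform within that infectious period) can be sketched as follows. The function name and the input values are hypothetical, and this sketch is not taken from the authors' code.

```python
import random

def sample_generation_intervals(latent, infectious, n, seed=0):
    """Draw n pseudo-realistic generation intervals G = E + U * I.

    latent:     observed or quantile-based latent periods (sampled uniformly)
    infectious: observed or quantile-based infectious periods
                (resampled with weight proportional to length)
    """
    rng = random.Random(seed)
    E = rng.choices(latent, k=n)
    I = rng.choices(infectious, weights=infectious, k=n)  # length-weighted
    # rng.random() is uniform on [0, 1): infection time within the infectious period
    return [e + rng.random() * i for e, i in zip(E, I)]

# Hypothetical inputs, in days
latent_periods = [2.0, 3.0]
infectious_periods = [1.0, 4.0]
intervals = sample_generation_intervals(latent_periods, infectious_periods, 1000)
```

Each sampled interval lies between the shortest latent period and the longest latent period plus the longest infectious period.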
All calculations, numerical analyses and figures were made with the software platform R [23]. Code is freely available at https://github.com/dushoff/ link calculations.

Results
We investigate this approximation approach using three different examples. These examples also serve to demonstrate that robust estimates could be made with less data and potentially earlier in an outbreak, a point we revisit in the Discussion. Our initial investigation of this question was motivated by work on the West African Ebola Outbreak [31], so we start with that example. To probe the approximation more thoroughly, we also chose one disease with high variation in generation interval (canine rabies), and one with a high reproductive number (measles). For simplicity, we assumed that latent and infectious periods are equivalent to incubation and symptomatic periods for Ebola virus disease (EVD), measles, and canine rabies.

Ebola
We generated a pseudo-realistic generation-interval distribution for Ebola virus disease (EVD) using information from [4] and a lognormal assumption for both the incubation and infectious periods. In contrast to gamma distributed incubation and infectious periods assumed by [4], we used a lognormal assumption for our components because it is straightforward and should provide a challenging test of our gamma approximation (see Appendix for results using gamma components). We used the reported standard deviation for the infectious period, and chose the standard deviation for the incubation period to match the reported coefficient of variation for the serial interval distribution, since this value is available and is expected to be similar to the generation interval distribution for EVD [4]. We then used our pseudo-realistic distribution to calculate both the exact (5) and gamma-approximated (6) speed-strength relationships (see Fig. 3). The approximation is within 1% of the pseudo-realistic distribution it is approximating across the range of country estimates, and within 5% across the range shown. It is also within 2% of the WHO estimates.

Measles
We also applied the moment approximation to a pseudo-realistic generation-interval distribution based on information about measles from [13], [15], and [3]. Incubation periods were assumed to follow a lognormal distribution [13]. Infectious periods were assumed to follow a gamma distribution with coefficient of variation of 0.2 [25,15,11]. Since variation in infectious period is relatively low [25,11], and infectious period is short compared to incubation period, this choice is reasonable (and our results are not sensitive to the details).
Here, we found surprisingly close agreement between the exact and approximate relationships between r and R across a much wider range of interest (a difference of < 1% for R up to > 20) (see Fig. 4). On examination, this closer agreement is due to the smaller overall variation in generation times in measles: when overall variation is small, differences between distributions have less effect.

Figure 5: (Left) [...] (6), using the mean and CV of a pseudo-realistic generation-interval distribution. The "moment" approximation is based on the observed mean and CV of the distribution, whereas the "MLE" approximation uses the mean and CV calculated from a maximum-likelihood fit. (dotted curves) Naive approximations based on exponential (lower) and fixed (upper) generation distributions. Gray horizontal lines represent the range of R estimated by [8]: 1.05-1.32. (Right) The histogram represents rabies generation-interval distributions simulated from incubation and infectious periods observed by [8]. Dashed curves represent the estimated distributions of generation intervals using the method of moments and MLE (corresponding to the approximate speed-strength relationships in the left panel).

Rabies
We did a similar analysis for rabies by constructing a pseudo-realistic generation-interval distribution from observed incubation and infectious period distributions (see Fig. 5). Since estimates of R for rabies are near 1, there is little difference between the naive estimates and the gamma-approximated speed-strength relationship. But, looking at the relationship more broadly, we see that the moment-based approximation would do a poor job of predicting the relationship for intermediate or large values of R; in fact, a poorer job than if we use the approximation based on exponentially distributed generation times. The reason for poor predictions of the moment approximation for higher R can be seen in the histogram shown in Fig. 5. The moment approximation is strongly influenced by rare, very long generation intervals, and does a poor job of matching the observed pattern of short generation intervals (in particular, the moment approximation misses the fact that the distribution has a density peak at a finite value). We expect short intervals to be particularly important in driving the speed of the epidemic, and therefore in determining the relationship between r and R. We can address this problem by estimating gamma parameters formally using a maximum-likelihood fit to the pseudo-realistic generation intervals. This fit does a better job of matching the observed pattern of short generation intervals and of predicting the simulated relationship between r and R across a broad range (Fig. 5).

Table 1: Parameters that were used to obtain theoretical generation distributions for each disease. Reproduction numbers are represented as points in Figs. 3-5. Ebola parameters in triples represent Sierra Leone, Liberia, Guinea.
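The two ways of estimating κ from sampled generation intervals (observed moments versus a maximum-likelihood gamma fit) can be compared with a short standard-library sketch. The helper names, the bisection solver, and the series approximation to the digamma function are choices made for this illustration; they are not taken from the authors' code. For a gamma fit with both parameters free, the maximum-likelihood mean equals the sample mean, so the two methods differ only in κ.

```python
import math
import random

def digamma(x):
    """Digamma via upward recurrence and a standard asymptotic series (x > 0)."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def kappa_moments(xs):
    """Squared CV from the sample mean and (unbiased) sample variance."""
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return var / m ** 2

def kappa_mle(xs):
    """Squared CV (1/shape) from a gamma maximum-likelihood fit.

    Solves log(a) - digamma(a) = log(mean(x)) - mean(log(x)) for the shape a
    by bisection on a log scale; the left side decreases as a increases.
    """
    m = sum(xs) / len(xs)
    s = math.log(m) - sum(math.log(x) for x in xs) / len(xs)
    lo, hi = 1e-3, 1e6
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if math.log(mid) - digamma(mid) > s:
            lo = mid   # shape too small
        else:
            hi = mid
    return 1.0 / math.sqrt(lo * hi)

# Synthetic gamma data with shape 4 (true kappa = 0.25) and scale 2
rng = random.Random(42)
data = [rng.gammavariate(4.0, 2.0) for _ in range(20000)]
k_mom, k_mle = kappa_moments(data), kappa_mle(data)
```

On truly gamma-distributed data the two estimates agree closely. The differences discussed in the text arise when the data are not gamma-distributed and rare long intervals inflate the sample variance.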

Discussion
Estimating the reproductive number R is a key part of characterizing and controlling infectious disease spread. The initial value of R for an outbreak is often estimated by estimating the initial exponential rate of growth, and then using a generation-interval distribution to relate the two quantities [30,26,19,27]. However, detailed estimates of the full generation interval are difficult to obtain, and the link between uncertainty in the generation interval and uncertainty in estimates of R is often unclear. Here we introduced and analyzed a simple framework for estimating the relationship between R and r, using only the estimated mean and CV of the generation interval. The framework is based on the gamma distribution. We used three disease examples to test the robustness of the framework. We also compared estimates based directly on estimated mean and variance of the generation interval to estimates based on maximum-likelihood fits. The gamma approximation for calculating R from r was introduced by [21], and provides estimates that are simpler, more robust, and more realistic than those from normal approximations (see Appendix). Here, we presented the gamma approximation in a form conducive to intuitive understanding of the relationship between speed, r, and strength, R (see Fig. 2). In doing so, we explained the general result that estimates of R increase with mean generation interval, but decrease with variation in generation times [30]. We also provided mechanistic interpretations: when generation intervals are longer, more infection is needed per generation (larger R) in order to produce a given rate of increase r. Similarly, when variance in generation time is low, there is less early infection, and thus slower exponential growth, also meaning that the outbreak corresponds to a larger R.
We tested the gamma approximation framework by applying it to parameter regimes based on three diseases: Ebola, measles, and rabies. We found that approximations based on observed moments closely match true answers (based on known, pseudo-realistic distributions, see Sec. 4 for details) when the generation-interval distribution is not too broad (as is the case for Ebola and measles, but not for rabies), but that using maximum likelihood to estimate the moments provides better estimates for a broader range of parameters (Sec. 4.3), and also when data are limited (see Appendix).
Our key finding is that summarizing an entire generation interval distribution using two moments can give sensible and robust estimates of the relationship between r and R (see Appendix). This framework has potential advantages for understanding the likely effects of parameter changes, and also for parameter estimation with uncertainty: since R can be estimated from three simple quantities (Ḡ, κ and r), it should be straightforward to propagate uncertainty from estimates of these quantities to estimates of R.
For example, during the Ebola outbreak in West Africa, many researchers tried to estimate R from r [1,4,20,24,12] but uncertainty in the generation-interval distribution was often neglected (but see [28]). During the outbreak, [31] used a generation-interval argument to show that neglecting the effects of post-burial transmission would be expected to lead to underestimates of R. Our generation-interval framework provides a clear interpretation of this result: as long as post-burial transmission tends to increase generation intervals, it should result in higher estimates of R for a given estimate of r. Knowing the exact shape of the generation-interval distribution is difficult, but quantifying how various transmission routes and epidemic parameters affect the moments of the generation-interval distribution will help researchers better understand and predict the scope of future outbreaks.
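The uncertainty propagation suggested above can be sketched with a simple Monte Carlo calculation: draw Ḡ, κ and r from distributions describing their uncertainty, and push each draw through R ≈ (1 + κrḠ)^(1/κ). All numerical values below, and the choice of independent normal distributions, are hypothetical assumptions for illustration; they are not estimates for any disease discussed here.

```python
import random

def propagate_uncertainty(n=10000, seed=1):
    """Monte Carlo interval for R from uncertain r, G_bar and kappa.

    Returns (2.5% quantile, median, 97.5% quantile) of the gamma-approximated R.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        r = rng.gauss(0.03, 0.003)              # growth rate per day (hypothetical)
        g_bar = rng.gauss(15.0, 1.5)            # mean generation interval, days (hypothetical)
        kappa = max(rng.gauss(0.5, 0.1), 1e-6)  # squared CV, kept positive (hypothetical)
        draws.append((1.0 + kappa * r * g_bar) ** (1.0 / kappa))
    draws.sort()
    return draws[int(0.025 * n)], draws[n // 2], draws[int(0.975 * n)]

R_low, R_median, R_high = propagate_uncertainty()
```

In practice the three quantities may be correlated or have skewed uncertainty, in which case the sampling distributions would need to be chosen accordingly.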
Office grant W911NF-14-1-0402.

Figure S1: We perform the same analysis as we did in Sec. 4.1 assuming gamma-distributed incubation and infectious periods. We find that the gamma-approximated speed-strength relationship matches the true relationship almost perfectly in this case. Once again, we adjust the standard deviation of the incubation period to match the reported coefficient of variation in serial interval distributions. Rest of the parameters and points as in Fig. 3.

Figure S2: Approximating generation-interval distributions with a normal distribution has two problems. First, the distribution extends to negative values, which are biologically impossible. Second, as a consequence, the normal approximation predicts a saturating and eventually a decreasing r−R relationship for large r. Parameters and points as in Fig. 3.

Robustness of the gamma approximation
The moment-matching method (approximating R based on estimated mean and variance of the generation interval) has an appealing simplicity, and works well for all of the actual disease parameters we tested (the breakdown for rabies distributions occurs for values of R well above observed values). We therefore wanted to compare its robustness given small sample sizes along with that of the more sophisticated maximum-likelihood method. Fig. S3 shows results of this experiment. When sample size is limited, estimates using MLE tend to be close to the known true values in these experiments. As we increase sample size, our estimates become narrower. We also find that using the gamma-approximated speed-strength relationship gives narrower estimates than the two naive estimates even when the sample size is extremely small (n = 10). It is important to note that Fig. S3 only conveys uncertainty in the estimate of the coefficient of variation of generation-interval distributions. Estimation of the mean generation interval will introduce additional uncertainty into estimates of the reproductive number.

Figure S3: The effect of small sample size on the approximated relationship between r and R (horizontal axis: relative length of generation interval, ρ; vertical axis: reproductive number; legend: moment, MLE). (black solid curve) The relationship between growth rate and R using a known generation-interval distribution (see Fig. 3). (colored curves) Estimates based on finite samples from this distribution: dashed curves show the median and solid curves show 95% quantiles of 1000 sampling experiments. Note that the upper 95% quantile of the moment approximation and MLE approximation overlap. (dotted curves) Naive approximations based on exponential (lower) and fixed (upper) generation distributions.