Smaller variance in the infectious periods leads to larger reproduction number and more severe outbreak for network epidemics

For a recently derived pairwise model of network epidemics with non-Markovian recovery, we prove that under some mild technical conditions on the distribution of the infectious periods, smaller variance in the recovery time leads to higher reproduction number, and consequently to a larger epidemic outbreak, when the mean infectious period is fixed. We discuss how this result is related to various stochastic orderings of the distributions of infectious periods. The results are illustrated by a number of explicit stochastic simulations, suggesting that their validity goes beyond regular networks.


Introduction
Networks provide a useful paradigm to incorporate contact patterns and various heterogeneities within a population [20,19,11]. The basic ingredients of such models are nodes and links, usually representing individuals and the contacts between them, but they may represent also groups of individuals (such as the population at some geographic location), and the connectedness of these groups (such as transportation routes [10,18]). In simple disease outbreak models, the status of an individual can be susceptible (S), infected (I) or recovered (R). A key parameter associated with most epidemic models is the basic reproduction number (denoted by R 0 ), which denotes the expected number of secondary infections generated by a typical infected individual introduced into a fully susceptible population [4]. The reproduction number is also a threshold quantity: if R 0 < 1 the epidemic will die out, while if R 0 > 1 the disease will spread. Another important measure of epidemic severity is the final epidemic size, which is the total number of individuals who become infected during the time course of the epidemic. These two quantities are often connected via the so-called final size relation. In these simple models that assume a fully mixed population, the final fraction that is not infected s ∞ solves the implicit relation s ∞ = S(0)e −R 0 (1−s ∞ ) .
If infected individuals transmit with constant rate β , then in this well-mixed model R 0 = β E(I ) where E(I ) is the average infection duration, and so variance in the distribution of infection duration does not affect the final size [14,13].
Modelling epidemics on networks however increases the complexity of the models since the underlying population structure means that individuals are not interchangeable. Thus we must track which individuals are in each status rather than simply how many individuals are in each status. For example, in the most fundamental case of Markovian transmission and recovery, both time to infection and the time spent as infected and infectious is taken from exponential distributions with appropriate rates. Even for the purely Markovian case we need to deal with a continuous time Markov chain with a discrete state space with 3 N elements, where three stands for the three possible states a node can be in (S, I and R) and N denotes the number of nodes in the network. Writing down evolution equations for the probability of the system being in any of these states is possible but impractical due to the high dimensionality of the system. Hence, in order to deal with this complexity one need to employ some 'clever' averaging.
Probabilistic methods, such as branching processes can be used to deal with the early growth and the asymptotic behaviour [1], with percolation theory also leading to good analytical treatment for the early growth and final size [15]. For the later dynamics, we generally need to derive a mean-field model, e.g. a low dimensional system of ODEs.
There are many well established ways to derive mean-field models. Perhaps the most compact method is the so called edge based compartmental model (EBCM) [17] which has been successfully used to capture SIR dynamics with arbitrary transmission and infection processes [25] on configuration-like networks. The EBCM provides an excellent approximation of the exact stochastic network epidemic, which becomes exact in some appropriate limits and conditions on the underlying network [5,6].
Another powerful method to model epidemic spread on network is provided by the message passing approach [7] and this works for arbitrary transmission and recovery processes but at the expense of a system consisting of a large number of integro-differential equations.
In addition, pairwise models have been successfully used to approximate stochastic epidemics on networks and represent a vast improvement on compartmental models. Pairwise models also have the advantage of being easy to understand and very intuitive when compared to the EBCM or the message passing model.
All the above are able to capture the time evolution of the epidemic while also offering insights about the epidemic threshold and final size. All these models have the same starting point and not surprisingly it can be shown that often these models are equivalent [16,25,11] and they simply represent different choices of how one averages and how the reduced state space is defined [11].
While dealing with the complexity and the modelling of contact structures, the dynamics of the disease needs to be accounted for appropriately. It is well known that the duration of the infectiousness has a major impact on whether an outbreak happens and how many people it affects as being a key parameter in the basic reproduction number. To highlight a recent example, in the West-African ebola outbreak one crucial part of the intervention strategy was to reduce the length of the post-mortem infectious period [2]. In this paper we bridge the gap by considering a model that can capture both the complexity of contact structure as well as the features of the disease itself. To do this we consider pairwise models with Markovian infection but arbitrary recovery process and we focus on the outbreak threshold derived from this model and its dependence on the choice of the recovery process. The paper is structured as follows. First, we introduce the pairwise model, the analytical final epidemic size relation followed by the newly introduced basic pairwise reproduction number R p 0 . The main result of the paper is on the relation between the variance in the distribution of the recovery process and the basic pairwise reproduction number. This is followed by some discussion of our results with respect to the concept of stochastic ordering, and the possible extension of our results to heterogeneous networks. We conclude with extensive numerical results and a discussion of our findings.

Model
where τ is the per contact infection rate and γ is the recovery rate. Here [S] , where n is the average number of links per node, leads to the self-consistent system [8] [S](t) = −τ[SI](t), [SI](t) = τ n − 1 n Closing at the level of pairs with the approximation [XY ] = n[X] [Y ] N , one obtains the so called mean-field model (or compartmental model) with basic reproduction number where, E(I ) = 1/γ is the expected infectious period. The final size relation associated to the mean-field model is where S 0 is the number of susceptible individuals at time t = 0 and There are many results for the Markovian pairwise models [3,8,11], for example, the final epidemic size is given by where [S] 0 is the number of susceptible individuals at time t = 0 and

Non-Markovian Recovery
The Markovianity of the recovery process is a strong simplifying assumption. For many epidemics, the infectious period has great importance and it is measured empirically. Recently, pairwise approximations of the SIR dynamics with non-Markovian recovery have been derived, see [12,27,21,22]. In the special case of fixed recovery time σ , the meanfield model is given by while the pairwise model turned out to be [12] [S](t) = −τ[SI](t), Both systems are now delay differential equations rather than ordinary differential equations, as is the case for Markovian epidemics. In [12], the following final epidemic size relation has been derived: Considering a general distribution for the recovery period, the pairwise model can be formulated as a system of integro-differential equations [22,27], which is given bẏ Above we assume that the infection process along S-I links is Markovian with transmission rate τ > 0. The recovery part is considered to be non-Markovian given by a random variable I , with a cumulative distribution function F I (a) and probability density function f I (a). We use the associated survival function ξ (a) = 1 − F I (a) and hazard function . We note that ϕ(a) is the initial condition which gives the age of infection of individuals at time t = 0.
From Eq. (10), the associated mean-field model can be easily deduced by using the closure approximation formula for homogeneous networks (i.e. n-regular graphs) thus the node-level system becomeṡ

The Pairwise Reproduction Number and Infectious Times
In [12], a newly introduced basic reproduction-like number is defined for fixed length infectious periods as which appears also in equation (9). It has also been shown, that for arbitrary infectious periods, the basic reproduction number of the pairwise model is where L [·] is the Laplace transform and f I is the probability density function of the recovery process given by the random variable I . Numerical tests and analytical results have both confirmed that, in general, the following implicit relation for the final epidemic size holds Several important observations can be made. The first is around the interpretation of the Laplace transform of f I . Let us consider an isolated S-I link, and let E be the exponentially distributed random variable of the time of infection along this link, with parameter τ. Then the probability of transmission is the same as the probability that infection occurs before recovery, that is Hence, the Laplace transform has natural interpretation and enters the calculation of the probability of transmission across an isolated S-I link.
The intuitive derivation for R p 0 follows from considering the rate at which new S-I links are created. From (10d), and focusing on the single positive term on the right hand side, it follows that S-I links are created at rate τ(n−1) Notice that while R 0 depends on the expected value only, see (4), the pairwise reproduction number (14) uses the complete density function, thus the average length of the infectious period itself does not determine exactly the reproduction number. As a consequence, for an epidemic we have to know as precisely as possible the shape of the distribution. We shall analyse how the basic reproduction number (14), which is not only an epidemic threshold but also determines the final size via (15), depends on the variance of the recovery time distribution. In [21], using gamma, lognormal and uniform distributions we showed that within each of those distribution families, once the mean infectious period is fixed, smaller variance in the infectious period gives a higher reproduction number and consequently a more severe epidemic. Next we generalize this result without restricting ourselves to special distributions.

Main Result: Relationship Between the Variance and the Reproduction Number
In this section we give some simple conditions which may guarantee that smaller variance induces higher pairwise reproduction number. We consider a random variable I corresponding to recovery times with probability density functions f I (t), cumulative distribution function F I (t) = t 0 f I (s)ds and we shall use the integral function of the CDF F I (t) := t 0 F I (s)ds. Clearly, d 2 dt 2 F I (t) = d dt F I (t) = f I (t). Moreover, F I (0) = F I (0) = 0.
Theorem 1 Consider two random variables I 1 and I 2 such that and Var(I 1 ) < Var(I 2 ) < ∞.
Assume that and for all t > 0, holds. If I 1 and I 2 represent the recovery time distribution, then for the corresponding reproduction numbers the relation R p 0,I 1 > R p 0,I 2 holds.
Since F I (t) ≥ 0,t ≥ 0 and monotone increasing, the integral function of CDF F I (t) is monotone increasing and convex. Using (20) and (22), we obtain for all t > 0. Clearly, for R p 0,I 1 > R p 0,I 2 , it is enough to prove, that First, we perform some algebraic manipulation on the left-hand side: In conclusion, we have therefore L [ f I 1 ](τ) < L [ f I 2 ](τ), which gives R p 0,I 1 > R p 0,I 2 .
Remark 1 While one can easily construct a specific example for which the technical condition (19) does not hold, it is satisfied by all epidemiologically meaningful distributions, since extremely long infectious periods do not occur in epidemics. It trivially holds for distributions with compact support, and even for power law distributions with finite variance.
Corollary 1 Assume that the conditions of Theorem 1 hold. Then the infectious period distribution with smaller variance induces a larger epidemic outbreak.
Proof Let z = s 1 n ∞ . Then, (15) can be written as which, since we are interested in the root z ∈ (0, 1), simplifies to Clearly larger R p 0 results in smaller z, that means smaller s ∞ thus larger epidemic. Combining this with Theorem 1 yields the result.

Relation to Stochastic Ordering
In a very recent work [28], Wilkinson and Sharkey considered a general class of network based stochastic epidemic models, and proved a monotonic relationship between the variability of the infectious period and the probability that the infection will spread to an arbitrary subset of the population by time t. Below we show that, while the work [28] was done in a different context, the main conclusion is essentially the same as what follows from our main result. In [27], the variability was represented by the convex order of the distributions of infectious periods. Given two random variables I 1 and I 2 whose expectations exist, such that I 1 is said to be smaller than I 2 in the convex order, denoted by I 1 ≤ cx I 2 , see monograph [23] for a comprehensive description of various stochastic orders, their properties and relations.
Proof From the convexity of φ (x) = x and φ (x) = −x, (17) follows, and the convexity of φ (x) = x 2 yields (18). From the convexity of φ a (x) = (x − a) + , Theorem 3.A.1 in [23] deduced that I 1 ≤ cx I 2 if and only if F I 1 (t) ≤ F I 2 (t) for all t > 0. Now instead of the strict inequality of (23), we have less or equal, but from (18) the two functions are not identical, hence analogously to the proof of Theorem 1 we can conclude (24), which completes the proof.

Remark 2
The distribution I 1 is said to be smaller than I 1 in the Laplace transform order, denoted by . Now clearly the ordering of reproduction numbers R p 0,I are tied up to the Laplace order of the underlying distributions, and Theorem 1 can be viewed as providing easily verifiable sufficient conditions for Laplace ordering. There are examples in the literature (see [24]), showing that the Laplace transform order is different from the convex order. Hence, the pairwise reproduction number approach can be applied in some situations that are not covered by the convex order approach.

Implications for Heterogeneous Degree Distributions
In a Configuration-Model network, given a random S-I link, we expect the susceptible individual to have degree k with probability proportional to k[S k ] where [S k ] is the number of susceptible individuals with degree k.
Repeating our earlier derivation of equation (14) for R p 0 in the homogeneous network case, we anticipate that for fixed duration σ , where E(k) is the average degree.
Extending this to the case of heterogeneous infection duration, we find It can be shown [15,9,19] that the final number of degree k individuals infected is given by where the following implicit relation holds: Here θ ∞ is a per-edge measure of the probability of not being infected. So an initially susceptible individual with degree k remains susceptible with probability θ k ∞ . The role of θ ∞ is the same as s 1/n ∞ in Eq. (6). Note that in R p 0 , the terms capturing the distribution of infection durations separate from the terms capturing the distribution of degrees. The ordering of R p 0 as the infection duration distribution changes is independent of the degree distribution. So the ordering of R p 0 is the same as found in the regular networks. The final size depends monotonically on the Laplace transform of f I , and so the results about the ordering of final sizes in regular networks carry over to heterogeneous networks as well.

Numerical Simulations and Conclusion
The role of the shape of the distribution of infectious periods in disease spread has been in the interest of modellers for some time [26]. Our previous works already indicated that for pairwise models of network epidemic, not only the mean, but higher order properties of the distribution of the recovery times have an impact on the outcome of the epidemic. We derived useful threshold quantities for non-Markovian recovery in [12]. In [21], we showed that for particular distribution families (typically two parameter families such as gamma, lognormal, and uniform distribution), smaller variance leads to higher reproduction number within the same family when the mean is fixed. Our new result in this study allows us to make comparisons between distributions of different kinds. To show the usefulness of Theorem 1, as an example, we consider I 1 ∼ Exp(γ) and I 2 ∼ Fixed 1 γ , i.e. f I 1 (t) = γe −γt ,t ≥ 0 and f I 2 (t) = δ t − 1 γ , where δ (t) denotes the Dirac delta function. Clearly, we obtain F I 1 (t) = t + 1 γ e −γt − 1 γ and F I 2 (t) = (t − 1 γ ) + , thus there is no t 0 > 0, such that F I 1 (t 0 ) = F I 2 (t 0 ). Since E(I 1 ) = E(I 2 ) = 1 γ , 1 γ 2 = Var(I 1 ) > Var(I 2 ) = 0 and the other conditions of Theorem 1 are satisfied, we find R p 0,I 1 < R p 0,I 2 . We have carried out extensive numerical simulations to test the final epidemic size formula (15), with R p 0 taken from (29), for Fixed, Uniform, Gamma, Exponential, Lognormal, Weibull distributed infection times on regular (see Fig. 1), Erdős-Rényi (see Fig. 3) and truncated scale-free (see Fig. 5) networks. It is worth noting that the same final size relation can be obtained by combining equations (30), (31) and that for R p 0 for the heterogenous degree distributions. The agreement between the analytical final epidemic size and explicit stochastic network simulations is excellent for all distributions and networks. The parameters, mean and variance of the distributions are given in Table 1.
Several observations can be made. In Figs. 1, 3 and 5 one can note that the epidemic threshold depends heavily on the distribution of the infectious period. While all distributions have the same mean, they differ in terms of their variance. In fact, the variance of the distributions are ordered as shown in Table 1. Based on Theorem 1 and Corollary 1 we know that for a fixed transmission rate τ and for infectious period distributions with the same mean, the distribution with the higher variance will lead to a smaller R p 0 and hence smaller attack rate. This confirms that the ordering of the variances in Table 1 is reflected accurately in all attack rate versus τ plots. Moreover, the insets in Figs. 1, 3 and 5 shows that the final epidemic size relation in terms of R p 0 is universal, independently of how the infectious periods are distributed. For the truncated scale-free networks in Fig. 5, the attack rate behaves differently but the general analytical final epidemic size relation remains extremely accurate. Obviously high degree heterogeneity leads to large variance and this makes the value of R p 0 to be large and above threshold even for small values of τ.

Figures 2, 4 and 6
show the initial growth of the epidemic. The relation between variance and attack rate seems to translate into a straightforward association between variance and initial growth rate. Namely, distributions with higher variance leads to slower initial growth. This is not always the case since R p 0 is a generation rather than time based measure. However, here the mean of the distributions and the transmission rates are identical and thus the ordering seems to carry through.
As next steps one could consider the extension of R p 0 and the final size formula for epidemics where both the infection and transmission processes are non-Markovian. Such results already exist [25] but there an EBCM was used. It would also be appropriate to explore the applicability of this newly introduced pairwise reproduction number given that it lent itself to derive a number of analytical results and it fits with the network and contact concepts. In particular one would explore how could this be measured in practice and how does its value translate into control measures. Epidemic sizes in a regular network. We consider the outbreak sizes in a random network with 10 6 nodes all having degree k = 10. We take distributions of infection duration having mean 3/2 and plot the final proportion infected given different transmission rates τ. The inset shows that all final epidemic sizes collapse on a universal curve when using R p 0 as the horizontal axis. The parameters, mean and variance of the distributions are given in Table 1.  Table 1.   Epidemic sizes in a scalefree network. We look at epidemics in a truncated scale free network with 10 6 nodes having minimum degree 2 and maximum degree 954 and each degree k assigned with probability proportional to k −2 . This yields an average degree of approximately 10.
Epidemics exist even at very small τ, and R p 0 is significantly larger than in the other networks. Using the heterogeneous R p 0 all curves collapse on a universal curve. The parameters, mean and variance of the distributions are given in Table 1.  Figure 6 Dynamics of epidemics in scalefree network. Taking τ = 0.15, we show two epidemic curves for each distribution from Fig. 5. On the right we see the early dynamics. There is significant heterogeneity in the early growth, even for the same distribution. This is because the timing of rare infections to the highest degree nodes plays a significant role even in networks of 10 6 nodes.