Stochastic two-group models with transmission dependent on host infectivity or susceptibility

ABSTRACT Stochastic epidemic models with two groups are formulated and applied to emerging and re-emerging infectious diseases. In recent emerging diseases, disease spread has been attributed to superspreaders, highly infectious individuals that infect a large number of susceptible individuals. In some re-emerging infectious diseases, disease spread is attributed to waning immunity in susceptible hosts. We apply a continuous-time Markov chain (CTMC) model to study disease emergence or re-emergence from different groups, where the transmission rates depend on either the infectious host or the susceptible host. Multitype branching processes approximate the dynamics of the CTMC model near the disease-free equilibrium and are used to estimate the probability of a minor or a major epidemic. It is shown that the probability of a major epidemic is greater if initiated by an individual from the superspreader group or by an individual from the highly susceptible group. The models are applied to Severe Acute Respiratory Syndrome and measles.


Introduction
In the twenty-first century, one of the biggest threats to public health is the spread of infectious diseases [35]. According to the World Health Organization, population mobility and the interconnectedness of the world's population have contributed to the global spread of infectious diseases [31,35]. Infectious diseases are one of the leading causes of death globally and the third leading cause of death in the United States [30]. Emerging and re-emerging infectious diseases include Severe Acute Respiratory Syndrome (SARS), Middle East Respiratory Syndrome (MERS), Ebola, Zika, influenza A, yellow fever, cholera, measles, pertussis, tuberculosis, and AIDS [35].
Recent emerging diseases have been attributed to highly infectious individuals, called superspreaders, who spread the disease to a large number of people [25]. Historically, superspreading events came to the forefront at the beginning of twentieth century. In the early 1900s, an individual referred to as Typhoid Mary was identified as a superspreader; she infected 51 people before being recognized as the source of infection [23]. Superspreading events have been identified in disease outbreaks of SARS, MERS, and Ebola [40]. Superspreading events are due to many reasons including physiological, immunological, and behavioural differences in individuals or due to emergence of a new pathogen or exposure to a new environment [34]. During the SARS outbreak in China in 2003, a total of 774 people died and 8096 were infected [40]. The index case, a superspreader, was responsible for infecting 125 people. During the MERS outbreak in 2015 in South Korea, 186 people were infected, and 36 people died [40]. Two superspreaders were responsible for infecting 106 people [40].
The waning of vaccine effectiveness or loss of protection from prior infection also cause some re-emerging diseases. Individuals may be highly susceptible to infection due to waning immunity. For example, in pertussis the immunity from vaccination remains effective for 4-12 years, whereas immunity from prior infection is much longer, 4-20 years [32,38]. For measles, the immunity from vaccination can protect against re-infection for up to 20 years for two doses but is less effective for only one dose [22]. In a measles outbreak in 2011, 110 high school students were infected; the attack rate was much less for students with two vaccine doses [11].

al. extended the model for an MERS epidemic in South
Korea with an SEIJR model, where J represented infectious individuals that were isolated. In a mathematical model for SARS with superspreaders, Mkhatshwa and Mummert [27] included four different classes in their model, susceptible, infected, superspreader, and recovered classes. In a two-group model applied to spread of SARS [26], transmission between patients in a hospital and outside visitors were analysed in terms of the final epidemic size. These preceding models were ordinary differential equations (ODEs). In the papers by James et al. [20] and Lloyd-Smith et al. [25], discrete-time branching processes were employed to relate transmission heterogeneity to superspreading events and to fit models to particular outbreaks. Lloyd-Smith et al. [25] showed for a heterogeneous population that average quantities such as the basic reproduction number do not adequately describe the effect of individual variation. Edholm et al. [13] developed a continuoustime Markov chain (CTMC) two-group model for superspreaders and non-superspreaders. They applied their stochastic model to MERS and Ebola, performed extensive numerical simulations, and used branching process approximations to compare disease epidemics initiated by superspreaders to non-superspreaders. Similar to the model of Edholm et al. [13], we apply stochastic two-group models, CTMC models, and branching process approximations. We extend the two-group setting to consider the case where the transmission rate is dependent on either infectious or susceptible hosts, applicable to emerging and re-emerging infectious diseases. In addition, we prove there is a difference in probability of a minor or major epidemic, that is dependent on the particular group. It is verified that the probability of a major epidemic is greater if initiated by an infected individual from the highly infectious group (superspreader) or if initiated by an infected individual from the highly susceptible group. We discuss applications to SARS and measles.
In the next section, a two-group SIR model for epidemic spread is described, a system of ODEs. This model serves as a framework for formulation of two stochastic generalizations, CTMC models that include groups with either high and low infectivity or high and low susceptibility. In Section 4, continuous-time multitype branching processes are used to approximate the probability of a minor or a major epidemic. In Section 5, we show that the results extend to SEIR models when the group sizes are constant. We apply these models to potential SARS [26] and measles epidemics [11]. The sensitivity of R 0 and the probability estimates to group size and transmission and recovery rates are also investigated.

Two-group SIR model
Consider a SIR model for a heterogeneous population with two groups. The model has two susceptible, two infectious and two recovered groups with different transmission and recovery rates for each group. The model has the following form: for k = 1, 2. The transmission rate from an infected individual in group j to a susceptible individual in group k is β kj > 0 and the recovery rate of an infected individual in group k is γ k > 0. The total population size N = 2 k=1 N k = 2 k=1 (S k (t) + I k (t) + R k (t)) is constant and each group size N k is constant. The initial conditions are non-negative, S k (0) > 0, I k (0) ≥ 0, k I k (0) > 0 and R k (0) = 0 for k = 1, 2. For simplicity, we let R(t) = 2 k=1 R k (t) denote all recovered individuals. Two specific models are considered that distinguish between transmission dependent on the infectivity or the susceptibility of the host population. They are referred to as Model 1 and Model 2, respectively. In Model 1, the spread of disease is a result of high infectivity of the host, as in emerging diseases, such as MERS or SARS, where the presence of superspreaders may result in a large outbreak [8,33]. In Model 2, disease spread is a result of the lack of immunity in the host population from an infectious disease that may have either previously circulated in the population, as in re-emerging diseases, such as measles or pertussis [11,38].
For the two models, we assume that the transmission rate is separable β kj = α k λ j [36]. Either transmission depends primarily on the infectivity of the host, α k λ j = β I j or the susceptibility of the host α k λ j = β S k . The basic reproduction number R 0 , the average number of secondary infections generated by one infected individual, is computed using the next generation approach [12,19,36]. The basic reproduction number is for m = I, S.

Model 1
In Model 1, the population is divided into two groups, where each group is defined by the ability to transmit the infection. The two groups are referred to as non-superspreaders and superspreaders. Superspreaders have been identified in diseases such as SARS, MERS, and Ebola [40]. Group 1 represents non-superspreaders, where S 1 and I 1 denote susceptible and infected non-superspreaders and in group 2, S 2 and I 2 denote susceptible and infected superspreaders. Individuals remain superspreaders or non-superspreaders during the course of the epidemic. Susceptible individuals S 1 infected by an individual from group I 1 or I 2 become infected non-superspreaders (I 1 ) and susceptible individuals S 2 infected by I 1 or I 2 become infected superspreaders (I 2 ). Therefore, the transmission rates depend on the infective group, β I 1 or β I 2 . A compartmental diagram for Model 1 is given in Figure 1. The basic reproduction number for Model 1 follows from Equation (2). That is, In the special case that all individuals are either non-superspreaders or superspreaders (S 1 + I 1 + R 1 = N or S 2 + I 2 + R 2 = N), the basic reproduction number simplifies to either R 0 = β I 1 /γ 1 or R 0 = β I 2 /γ 2 , respectively.

Model 2
Model 2 is applicable to re-emerging diseases, such as measles or pertussis. The population is divided into two groups, dependent on the susceptibility to infection. One group of individuals has immunity to the disease due to vaccination or to a prior infection of the disease referred to as the low-susceptible group (group 1) and a second group of individuals has little or no immunity as they have either not been vaccinated or the vaccine has waned over time. Group 2 is referred to as the high-susceptible group. Variabies S 1 and I 1 represent the number of susceptible and infected individuals from the low-susceptible group 1 and S 2 and I 2 are the number of susceptible and infected individuals from the high-susceptible group 2. Individuals from S 1 after being infected by either I 1 or I 2 become I 1 and individuals from S 2 after being infected by either I 1 or I 2 become I 2 . The transmission parameters depend on host susceptibility for each group, β S 1 or β S 2 . A  compartmental diagram of Model 2 is given in Figure 2. The basic reproduction number is If there is a single group, either low-susceptible or high-susceptible, (S 1 + I 1 + R 1 = N or S 2 + I 2 + R 2 = N), then the basic reproduction numbers simplify to R 0 = β S 1 /γ 1 or R 0 = β S 2 /γ 2 , respectively.

CTMC models
We formulate a stochastic CTMC model with discrete random variables for the number of individuals. Stochastic variability may lead to a minor epidemic with a few individuals infected instead of a full scale epidemic with many individuals infected. For simplicity, the same notation for the variables is used for the CTMC model as for the ODE model. That is, The values of these random variables are denoted by lower case variables, s k , i k , and r.

Model 1
The infinitesimal transition probabilities define the CTMC Model 1, e.g. [1,21]. For example, the transition from a susceptible to an infectious individual for the non-superspreader group is defined as follows: A list of the four events corresponding to either infection or recovery is given in Table 1. The solution dynamics for Model 1 are illustrated for the two infectious classes in Figure 3 for the ODE and CTMC Model 1 when R 0 = 2.86 with population size  N = 1000 and the highly infectious group size N 2 = 100. The group reproduction numbers are β I 1 /γ 1 = 0.4 and β I 2 /γ 2 = 25. In Figure 3(a), an epidemic is initiated by a nonsuperspreader, I 1 (0) = 1 and I 2 (0) = 0, whereas in Figure 3(b), the epidemic is initiated with a superspreader, I 1 (0) = 0 and I 2 (0) = 1. The ODE solution and two sample paths of the CTMC model are plotted in each figure. For the two sample paths in Figure 3(a), only one sample path yields a major epidemic, while the other path yields a minor epidemic with a few infectious cases. However, in Figure 3(b), both sample paths yield a major epidemic. It is notable that an epidemic occurs earlier when initiated with a superspreader than with a non-superspreader [13].

Model 2
The CTMC for Model 2 can be defined in a similar way as in Model 1 with the exception that the transmission rates depend on host susceptibility. The infinitesimal transition probabilities are given in Table 2.
An example of the solution dynamics of the infectious classes are plotted for Model 2 in Figure 4 when R 0 = 2.86. The same parameter values are applied as in Figure 3, except the population size is N = 5000 with high-susceptible group size N 2 = 500. Two sample paths of the CTMC Model 2 are graphed with the ODE solution. To obtain an outbreak size similar to that in Model 1 (Figure 3), a larger population size is assumed.

Multitype branching process approximation
The CTMC model is approximated by a Galton-Watson multitype branching process (BP) near the disease-free equilibrium (DFE) [5,18]. In the BP approximation, it is assumed that the population is at the DFE and a small number of infectious individuals is introduced into the population. The goal is to use the BP approximation to estimate the probability of a minor or a major epidemic. The probabilities for an infection or a recovery are defined for an infectious individual in each group, through a generating function. The transmission dynamics for infectious individuals within each group I 1 and I 2 act like a multitype birth and death process, where a birth is a new infection and death is a recovery. The probability of a minor epidemic in the CTMC model corresponds to the probability of ultimate extinction (as t → ∞) in the birth and death process. It is verified in Appendix 1 from the backward Kolmogorov equations and from the assumption about independence of I 1 and I 2 that for a small number of infectious individuals, the probability of ultimate extinction is the minimal fixed point of the offspring probability generating functions (pgfs). An offspring pgf for I 1 has the general form Table 3. Branching process approximation for Model 1.
where P 1 ( , m) is the probability there are infectious individuals of type I 1 and m of type I 2 generated from one infectious individual of type 1 given I 1 (0) = 1 and I 2 (0) = 0. A similar definition applies to I 2 , given I 1 (0) = 0 and The independent assumption in the BP approximation and the differential equations derived from the Kolmogorov equations yield estimates for the probability of ultimate extinction. Applying this estimate to the CTMC model, it can be interpreted as the probability of a minor epidemic, given I 1 (0) = i 1 and [3][4][5]18,39]. In addition, the probability of a major epidemic in the CTMC model is approximately P major = 1 − P minor .

Model 1
The BP approximation of the CTMC model near the DFE is a linear approximation of the rates, S 1 = N 1 and S 2 = N 2 . These rates are defined in Table 3.
For Model 1, the offspring pgf for non-superspreaders, given I 1 (0) = 1 and I 2 (0) = 0, is is the probability of recovery of a non-superspreader, β I 1 ( is the probability of infecting another nonsuperspreader and β I 2 (N 2 /N)/(γ 1 + β I 1 ) is probability of infecting a superspreader. The offspring pgf for superspreaders, given I 1 (0) = 0 and I 2 (0) = 1, is is the probability of recovery of a superspreader, β is the probability of infecting another superspreader, and is the probability of infecting a non-superspreader [3]. Application of multitype BP theory, Equations (5) and (6) and the backward Kolmogorov differential equations lead to an ODE for probability of disease extinction at time t [3,5,18]. The steady-state solutions of this equation are the fixed points of pgfs (5) and (6) (probability of ultimate extinction). When R 0 > 1, there are at least two steady-states, one with (q 1 , q 2 ) = (1, 1) and at another one with q i < 1. The minimal fixed point is stable and this point yields the probability of ultimate extinction (or the probability of a minor epidemic in the CTMC model). When R 0 ≤ 1, there is only one fixed point, q i = 1, so that the probability of ultimate disease extinction is equal to one. Details of this derivation are given in Appendix 1.
In Figure 5, the dashed curves are probability of a minor epidemic when initiated by a non-superspreader q 1 or a superspreader q 2 . The points marked by 'x' are estimates for the probability of a minor epidemic calculated from 10,000 sample paths of the CTMC model. A threshold of 20 is assumed to indicate an outbreak. The peak level of infection for both infected groups in the deterministic model reaches values greater than 20 for R 0 > 1.5, therefore, this a reasonable assumption for numerical computations. In particular for each sample path, if I 1 + I 2 ≥ 20, then it is considered a major epidemic, but if the total number infected hits zero before reaching 20, then it is classified as a minor epidemic. Figure 5 shows that the CTMC estimate of a minor epidemic (proportion hitting zero from 10,000 sample paths) agrees with the BP estimate. Figure 5 illustrates that the probability of a minor epidemic when initiated by a nonsuperspreader is greater than when initiated by a superspreader. We verify this result always holds true when R 0 > 1 and β I 1 /γ 1 < β I 2 /γ 2 . The proof is relegated to Appendix 3.

Theorem 4.1:
Assume that R 0 > 1 and β I 1 /γ 1 < β I 2 /γ 2 in the BP approximation for Model 1. Then the probability of a minor epidemic when initiated by a non-superspreader is greater than the probability of a minor epidemic when initiated by a superspreader, In the special case of a single group, the result of Theorem 4.1 simplifies to the classic result of Whittle [39]. Whittle verified for the classic SIR CTMC model,

Model 2
The transition rates for the BP approximation of the CTMC Model 2 near the DFE are given in Table 4.
The offspring pgf for an individual from the low-susceptible group (I 1 (0) = 1, I 2 (0) = 0) is given by where is the probability of infecting an S 2 individual. The offspring pgf for an individual from the high-susceptible group (I 1 = 0, I 2 = 1) is given by where N)) is the probability of infecting an S 2 individual, and β S 2 ( N)) is the probability of infecting an S 1 individual (see, e.g. [3]).
In Figure 6, the values of the probability of a minor epidemic q 1 and q 2 as estimated from the BP approximation are compared with a numerical estimate computed from 10,000 sample paths of the CTMC. Of these 10,000 sample paths, the proportion of sample paths such that I 1 + I 2 hits zero before reaching a value of 20 is assumed to be an estimate of q 1 or q 2 . (We choose 20 as a threshold, as the peak level of infection for both infected groups in the deterministic model is greater than 20).
In the stochastic Model 2, it is the magnitude of the recovery rates rather than transmission rates that differentiates the probability estimates for a minor epidemic between the two groups. The proof of the following theorem is given in Appendix 4.

Theorem 4.2:
Assume R 0 > 1 and γ 1 > γ 2 in the BP approximation of Model 2. Then the probability of a minor epidemic when initiated by an infected individual from the low-susceptible group is greater than when initiated by an infected individual from the highsusceptible group, q 1 > q 2 . In addition, if R 0 ≤ 1, then q 1 = 1 = q 2 . In the special case γ 1 = γ 2 , q 1 = 1/R 0 = q 2 . ( I 1 , I 2 ) Probability

SEIR two-group model with multiple latent stages
We show that the previous results for multitype BP extend to models with m latent or exposed stages, E k,1 , . . . , E k,m , k = 1, 2 for the case that the group sizes are constant. During a latent or exposed stage, individuals are infected but not yet infectious, meaning they do not spread the infection to others. Multiple exposed stages account for the delay prior to the infectious period. In particular, the ODE model for stages S k and R k is the same as in (1) but the other stages have the form: where the new parameter α k,m is the transition rate between the k and k+1 latent stage, , with initial conditions S k (0) > 0, E k (0) ≥ 0, I k (0) > 0 and R k (0) = 0. New infections are due to the infectious stages I k . We make the same assumptions regarding the transmission rates as in model (1), β jk = β I j for Model 1 and β jk = β S k for Model 2. Another important assumption is that there are no deaths. Each group size is constant. Applying the next generation matrix approach [36], it follows that R 0 has the same expression as in Equation (2).
For the corresponding CTMC SEIR model, the infinitesimal transition rates for the latent stages are In the BP approximation of the CTMC SEIR model with two groups and m latent stages, there are 2m+2 pgfs. However, as there are no deaths in any of the 2 m latent stages, given E k,j (0) = 1, it follows that the jth latent stage progresses to the next latent stage E k,j+1 or to I k with probability one. It follows that the probability of disease extinction beginning from any latent stage E k,j is the same as the probability of disease extinction beginning from I k (Appendix 3). Therefore, Theorems 4.1 and 4.2 are applicable to the SEIR models with two groups.
In the next two sections, we apply Model 1 and 2 with one latent stage to hypothetical models of SARS and measles. The BP approximation is used to estimate the probability of a major epidemic and the estimate is compared to CTMC simulations.

Application to SARS
SARS, a recent emerging disease, was first identified in the early part of the twenty-first century. In 2003, an outbreak of SARS occurred in Singapore with a total 206 cases, as reported in [16]. Three superspreaders were identified who infected a total 76 individuals. The index case was a superspreader who was admitted to a hospital on March 1 and was isolated on March 6. By this time, the index case had infected 24 individuals. Two other superspreaders infected 25 and 27 individuals, respectively. Magal et al. [26] formulated an ODE two-group model with no latent stage, where the two groups are individuals inside and outside of a hospital setting. We use their model as motivation to formulate a hypothetical model for SARS, but applying the assumptions in Model 1 with a latent stage, a homogeneous mixing population, and two transmission rates for the infectious individuals, β I i , i = 1, 2. We use some parameter values comparable to Magal et al. [26] for population sizes N = 2500 and N 2 = 300 (Magal: N = 2300 and N 2 = 300), β I 1 = 0.15 (Magal: β 11 N = 0.184 outside of the hospital), and the recovery rate for group 1, γ 1 = 0.8 (Magal: η 1 = 0.4 outside of the hospital). We consider a range of values for β I 2 and γ 2 (Table 6), where β I 2 ≥ β I 1 and γ 2 ≤ γ 1 . Generally, SARS patients are contagious when they exhibit symptoms [29]. The latent period reported by the Centers for Disease Control is 2-7 days [29]. Therefore, we assume the transition rate from E k to I k is α k = 1/5 per day for groups k = 1,2. Estimates for the basic reproduction number for SARS in Singapore include a range of 2.3 to 4 prior to a issuance of a global alert [37]. With control efforts in place for SARS, estimates for the basic reproduction number range from 0.49 to 2.47 [10]. We explore a wider range of R 0 values in Table 6. A summary of the parameter values for our example of the two-group SEIR Model 1 for SARS is given in Table 5. The probabilities 1 − q 1 and 1 − q 2 of a major epidemic when the epidemic is initiated by one non-superspreader or by one superspreader, respectively, are graphed in Figure 7 as a function of the transmission rate β I 2 and the recovery rate γ 2 . In addition, the contours of this surface are graphed for fixed values of the probabilities. The surfaces and contours are computed from the BP approximation. It can be easily seen that 1 − q 2 1 − q 1 . On each of the contours, a value is selected and compared with the value obtained from 10,000 sample paths of the CTMC model. The results are recorded in Table 6. The 'x' marks on the contours in Figure 7 are the estimates from the CTMC simulations. If a sample path reaches I 1 + I 2 ≥ 15, the epidemic is assumed to be major. If the value of I 1 + I 2 hits zero before reaching 15, it is assumed to be a minor epidemic. The proportion of sample paths Table 6. The BP approximation of a major epidemic (either 1 − q 1 or 1 − q 2 ) is compared to the CTMC simulations based on 10,000 sample paths.  out of 10,000, where the infected population reaches 15 or greater is the estimate for a major epidemic. There is good agreement between the BP approximation and the CTMC simulations, as shown in Table 6. Evident in the top two graphs in Figure 7, are the differences in the vertical scales. The probability of a major epidemic, initiated with an infected superspreader is significantly greater than if initiated with an infected non-superspreader. (The value of 15 is chosen as a threshold since the peak value of infected for both groups is greater than 15 in the deterministic models for the range of R 0 values in Table 6.) As the values of q i depend on R 0 (Theorem 4.1), we performed a sensitivity analysis of the basic reproduction number R 0 , given in Equation (3), with respect to parameters β I i , i = 1, 2. Applying the method in [7], the normalized forward sensitivity index for R 0 with respect to the parameter p equals The sensitivity indices only need to be determined for β I i , i = 1,2, since the sensitivity with respect to the other parameters γ i and N i /N have the same magnitude as β I i (either ±). The sensitivity indices, reported in Table 6, show that R 0 is much more sensitive to the group 2 parameters β I 2 , γ 2 , and N 2 /N than to group 1 parameters. For the parameter values in Table 5, the proportional contribution of the basic reproduction number by group 1 ( non-superspreaders) is relatively small R 01 N 1 /N = 0.165 as compared to group 2 (superspreaders).

Application to measles
Measles is a highly contagious disease. The basic reproduction number for measles during an outbreak has often been estimated to be in the range of 12-18 but some upper estimates for R 0 in the pre-vaccine era, are very large, 27 (Canada), 57 (UK), 203 (Africa), and 770 (Western Europe) (various regions employed different estimation methods and types of models) [17]. In the post-vaccine era, estimates for R 0 during an outbreak range from 4.6 to 32.1 [17]. Measles infection can be prevented with high vaccination coverage and in this case the reproduction number is much lower. In 2011, De Serres et al. [11] reported a measles outbreak in a high school setting. The index case was a teacher with a single dose of the measles vaccine. The index case was removed immediately after onset of a rash. The students in this high school had either no vaccination or 1-dose or 2-doses with the 2-doses administered at different ages 12 months, 15 months or later. Out of a total of 1306 high school students 110 students were infected during the outbreak. The attack rate was highest for students with no vaccine, 82%, with less than 5% for those with 1-dose or 2-doses. We apply Model 2 for a measles outbreak in a population with a highly susceptible subgroup with no vaccine coverage and another group with partial immunity due to vaccination or prior infection. We assume there is a latent period and homogeneous mixing of individuals in the population.
The outbreak size is smaller for Model 2 than for Model 1 for the same population size. Therefore, in Model 2, to distinguish between minor and major epidemics, we assume the total population size is N = 5000 with 500 high-susceptible individuals, N 2 = 500. The transmission rate for low-susceptible individuals is assumed to be β S 1 = 0.2 with recovery rate γ 1 = 0.5 (2 days for the infectious period). Thus, the proportional group 1 reproduction number is R 01 N 1 /N = 0.36. To ensure the basic reproduction number is greater than one, the proportional basic reproduction number for the high-susceptible group 2 must satisfy R 02 N 2 /N > 0.64. We assume β S 2 ≥ β S 1 and γ 2 ≤ γ 1 . The incubation period for measles is an average range of 10-12 days, according to the Centers for Disease Control [28]. Also, the virus can survive outside the host for up to two hours and an individual is infectious for about 4 days prior to exhibiting symptoms (a rash) [28]. Therefore, we take the smallest value, and let α k = 1/10 per day. Table 7 is a summary of the parameter values applied in Model 2 for measles.
The probabilities 1 − q 1 and 1 − q 2 of a major epidemic when initiated by one individual from the low-susceptible or high-susceptible group, respectively, are graphed in Figure 8 as a function of transmission rate β S 2 and recovery rate γ 2 . Below these surfaces, are the contours for fixed probabilities. The surfaces and contours are computed from the BP approximation. To compare the values from the BP approximation to the CTMC models, a value is selected on each contour where R 0 > 1. Ten thousand sample paths of the CTMC model are used to estimate the probability of a major epidemic. If the number of infected individuals I 1 + I 2 reaches a total of 15 or greater, it is considered a major epidemic and if not it is considered a minor epidemic. The proportion (out of 10,000), where  The BP approximation of a major epidemic (either 1 − q 1 or 1 − q 2 ) is compared to the CTMC simulations based on 10,000 sample paths. infected individuals reach 15 or greater is recorded as a dot on a contour and is listed in Table 8. The differences in the top two graphs in Figure 8 are only evident as a function of increasing recovery rate γ 2 (Theorem 4.2). The steep rise of 1 − q 1 or 1 − q 2 and sensitivity to the transmission rate β S 2 can be clearly seen in these graphs. The transmission rate β S 2 for group 2 increases the probability of a major epidemic when initiated with either an infected individual from a low-susceptible or a high-susceptible group (compare with the top left Figure 7.) We performed a sensitivity analysis of the basic reproduction number R 0 (Equation (4)) with respect to parameters β S i , i = 1,2 (similar to the analysis in Table 6). The basic reproduction number R 0 is much more sensitive to the parameters from group 2 (high-susceptible) than from group 1 (low-susceptible).
The values of R 0 in Table 8 are relatively small given the large values reported for measles in [17]. But the reproduction numbers for the high-susceptible group 2 are relatively large, i.e. R 02 = β S 2 /γ 2 ranges from 15.4 to 85.7.

Discussion
Two CTMC SIR epidemic models with two groups are studied, where transmission depends either on the infectious individuals (Model 1) or susceptible individuals (Model 2). Model 1 is applicable to emerging diseases, new diseases which have not previously circulated in the population, such as SARS or MERS. It is assumed that all individuals are susceptible to infection. However, in Model 1, one group of individuals, known as superspreaders, transmit the disease to a large number of other individuals, whereas the remaining individuals have a much lower transmission rate and are referred to as nonsuperspreaders. Model 2 is applicable to re-emerging diseases, such as measles or pertussis. In Model 2, one group of individuals has immunity due to vaccination or to a prior infection of the disease and the second group has little or no immunity as individuals either have not been vaccinated or the vaccine has waned over time. The two groups are distinguished as either low-or high-susceptible groups, respectively. A multitype BP approximation of the CTMC models is applied to the infected groups of individuals. It is shown analytically for Model 1, that the basic reproduction numbers within each group, β I j /γ j , distinguish between the probability of a major or a minor epidemic for the two groups. If the outbreak is triggered by an infected superspreader, then there is a greater probability of a major epidemic than if initiated by a non-superspreader (Theorem 4.1). However, for Model 2, it is the recovery rates of the two susceptible groups that distinguishes which group has a greater probability of a major epidemic (Theorem 4.2). These analytical results are extended to an SEIR model with multiple latent stages and the results are confirmed in numerical simulations of the nonlinear CTMC models and exhibited in two applications, SARS (Model 1) and measles (Model 2), as illustrated in Figures 5-8. A major conclusion in this analysis is that R 0 alone is insufficient to determine the probability of a major epidemic. Our analytical and numerical results demonstrate the importance of the heterogeneity in transmission and recovery rates in a large group of individuals and the transmission and recovery rates of the first individuals to become infected.
The two-group model can be extended to a multigroup model with n ≥ 2 groups. For example, the extension of the basic reproduction number in (2) to a multigroup model is simply for m = I,S and Theorems 4.1 and 4.2 can be extended to n groups.
There are several limitations to the BP approximation of the CTMC epidemic models and to the model applications. To apply the BP approximation, the population size must be sufficiently large and second, the value of R 0 must be greater than one and not too close to one. These restrictions ensure that it is possible to distinguish between a minor and major epidemic in the CTMC models. For many infectious diseases, transmission rates cannot be separated according to individual infectivity or susceptibility as in our two-group models. Populations are often highly heterogeneous and transmission rates may depend on both the infectiousness and the susceptibility of individuals within the population which in turn depend on behavioural, physiological, and environmental factors (e.g. [6,14,15,25,34]).

Appendix 5. Proof of Theorem 4.2
Suppose the fixed points of the pgfs are (q 1 , q 2 ). Consider the functions f 1 (u 1 , u 2 ) = u 1 and f 2 (u 1 , u 2 ) = u 2 given in Equations (7) and (8), respectively, Rewriting Equation (A13) and (A14): The functions h 1 and h 2 have the same properties as the functions g 1 and g 2 , similar to the proof of Theorem 4.1 in Appendix 4. Functions h 1 and h 2 are both concave down with respect to the axes u 1 and u 2 , respectively, the curve h 1 intersects the line u 2 = u 1 at (1, 1), and The curve h 2 intersects the line u 2 = u 1 at (1, 1) and Notice that the difference between the two points in (A17) and (A18) is the recovery rates, γ 1 and γ 2 . We assume γ 1 > γ 2 . The proof follows in a manner similar to Theorem 4.1 by noting the order of the points of intersections of the curves h 1 and h 2 with the line u 2 = u 1 . Similar to Theorem 4.1 in Appendix 4, when γ 1 > γ 2 we consider two different cases: (i) R 0 > 1 and (ii) R 0 ≤ 1.