Herd immunity and epidemic size in networks with vaccination homophily

We study how the herd immunity threshold and the expected epidemic size depend on homophily with respect to vaccine adoption. We ﬁnd that the presence of homophily considerably increases the critical vaccine coverage needed for herd immunity and that strong homophily can push the threshold entirely out of reach. The epidemic size monotonically increases as a function of homophily strength for a perfect vaccine, while it is maximized at a nontrivial level of homophily when the vaccine efﬁcacy is limited. Our results highlight the importance of vaccination homophily in epidemic modeling.

Introduction.In the paradigmatic Susceptible-Infectious-Recovered (SIR) model of infectious disease in a fully mixed population [1,2], so-called herd immunity is reached when the fraction π v of the population that is immune to the disease through vaccination or previous infection is larger than where R 0 denotes the basic reproduction number, i.e., the expected number of secondary cases produced by a typical infectious individual in a fully susceptible population.Here, herd immunity means that the disease cannot spread in the population because each infected individual can only transmit the infection to less than one other individual on average; that is, the effective reproduction number R eff = (1 − π v ) R 0 < 1.Consequently, not only those who are vaccinated but also the unvaccinated individuals are collectively protected from the disease.This model assumes homogeneous mixing where individuals interact with each other randomly and independently of their properties such as their vaccination status.However, this is a premise that may be too simplistic for modeling real-world populations, which often exhibit inhomogeneous mixing patterns that can lead to nontrivial epidemic outcomes [3][4][5][6].One of the inhomogeneities that would be particularly relevant to vaccine-induced herd immunity is the correlation between the vaccination status of interacting individuals [7][8][9][10][11][12].When this correlation exists, the vaccinated and unvaccinated individuals have different compositions of vaccinated and unvaccinated neighbors.Let us introduce the term vaccination homophily to represent mixing patterns that are assortative with respect to vaccination status, so that connections are more probable within the vaccinated and unvaccinated populations than between them.In this Letter, we investigate the effect of vaccination homophily on the herd immunity threshold and the expected epidemic size.
Model.To this end, we formulate a random network theory of epidemic spreading under homophily with respect to the adoption of an immunity-inducing vaccine.The links in the network represent transmissible contacts between individuals, i.e., a susceptible individual will get infected if connected to an infected individual.We refer to this network as the transmission network to avoid confusion with the contact network.Each link in the contact network will let the disease be transmitted through it with a certain probability; the links on which transmission actually takes place constitute the transmission network [13,14].Here, we do not explicitly consider this probabilistic transmission process, but rather take the transmission network as a given.
Within the population, a fraction π v of the population adopts the vaccine, while the remaining fraction π u = 1 − π v is not vaccinated.Vaccination homophily can be expressed in terms of the bias in the probabilities of connections within the two groups.Let us denote the conditional probability that a random neighbor of an individual is vaccinated given that the individual is vaccinated by π vv and, similarly, the conditional probability that a random neighbor of an unvaccinated individual is not vaccinated by π uu .Assuming that the average degrees (numbers of connections) of the vaccinated and unvaccinated populations are equal, the two probabilities are related as The problem of using the connection probabilities π vv and π uu as measures of homophily is that they are not "orthogonal" to π v so that even if we fix the value of π vv , the strength of homophily varies with different values of π v .Moreover, the two connection probabilities are coupled in a nonlinear manner, making it difficult to justify using either of them as a representative measure of the homophily of the entire network structure.To address these issues, we adopt the Coleman homophily index, originally proposed for social network analysis [15] and defined by This measure has desirable axiomatic properties: i) it is an increasing function of both π vv and π uu , ii) it is symmetric for the vaccinated and unvaccinated populations, and iii) it takes a value of zero when the mixing is homogeneous (no homophily) and a value of one when all links are inside the two groups, that is, π vv = π uu = 1.
A negative value implies that the network is heterophilic in terms of vaccination status.Note that the connection probabilities π vv = π v + π u h and π uu = π u + π v h must be positive and therefore the Coleman homophily index is bounded from below as h ≥ max (−π v /π u , −π u /π v ).We consider the transmission network structure where π v , h, and the degree distribution P (k) are specified but otherwise maximally randomized.Neglecting the rare cycles, we can identify the basic reproduction number as the mean excess degree of the network, i.e., the expected number of other neighbors that a randomly chosen neighbor of a randomly chosen node has, as R 0 = k 2 / k − 1 [16][17][18].We consider a class of epidemic models where infection induces complete and permanent immunity, whereas the immunity induced by vaccines is generally incomplete.There are two effects of vaccine protection that are of interest for modeling herd immunity [19,20].First, the vaccine can reduce the probability that the recipient becomes infected upon exposure.This reduction is referred to as the efficacy against susceptibility and denoted by f S [21].Second, individuals who are infected despite being vaccinated may have a lower probability of transmitting the infection to others.We represent this with the efficacy against infectiousness, f I , defined as the reduction in the secondary infection rate.
Under this setup, the herd immunity threshold and the expected final size of a large epidemic can be derived from the structure of the transmission network alone, without explicitly considering the epidemic dynamics.In the following, we leverage the theory of branching processes and percolation theory to investigate these quantities of interest.
Herd immunity threshold.For a heterogeneous population consisting of multiple subpopulations, we can use the next-generation matrix (NGM) method [22,23] to identify the vaccination threshold π c v above which the disease cannot spread.While the NGM method was originally developed for epidemic dynamics described by ordinary differential equations, it can be naturally interpreted as a description of the local structure of the transmission network by a multi-type branching process where the branching factor is the excess degree of the network.Let us denote by I (m) v and I (m) u the number of infections in the vaccinated and unvaccinated populations, respectively, at generation m from an index case (the first infected individual).Assuming a locally tree-like network, we can write the following recurrence equations under a mean-field approximation: where π uv = 1 − π uu and π vu = 1 − π vv are the conditional probabilities that a link from one group points to the other.By writing I (m+1) = AI (m) , where u ) and we see that the infection eventually dies out after a finite number of generations if all the eigenvalues of the nextgeneration matrix A have an absolute value of less than one.That is, at the critical point, the spectral radius By reparameterizing the connection probabilities with π v and h, the critical vaccine coverage needed for herd immunity is given by where we define = (1−f S )(1−f I ) and require ≤ 1/R 0 .
For > 1/R 0 , the vaccination threshold disappears and herd immunity becomes unattainable.For a perfect vaccine with f S = 1 and/or f I = 1, we have which reduces to the well-known threshold of Eq. ( 1) for homogeneous mixing with h = 0. Equation (6) indicates that if the homophily strength h increases, so does the vaccine coverage π c v required for herd immunity (see Fig. 1).In other words, the presence of homophily makes herd immunity harder to reach.Notably, the threshold occurs at implying that above this critical strength of homophily, one cannot attain herd immunity at all unless the entire population is vaccinated.That is, no matter how small the unvaccinated population is, there will always be a nonzero probability of a large epidemic within this population.
Finally, we note that the above discussion is independent of the specific shape of the degree distribution-the equations are valid for any degree distribution P (k) with mean excess degree R 0 .
Epidemic size.When the vaccine coverage is below the threshold, an outbreak can result in an epidemic that infects a substantial fraction of the population.The size of such an epidemic coincides with the size of the giant component of the transmission network because all the individuals in a connected component will be infected if the index case belongs to the same component [13,17].Let us denote the probability that a link pointing to a vaccinated node does not lead to the giant component by φ v and the equivalent probability for an unvaccinated node by φ u .These probabilities are subject to the following consistency equations: where k denotes the probability generating function of excess degree.Having solved the above consistency equations for φ v and φ u , we can compute the size of the vaccinated and unvaccinated populations contained in the giant component as respectively, where g 0 (x) = ∞ k=0 P (k)x k is the probability generating function of the degree distribution P (k).The total size of the giant component is the sum of these two fractions s = s u + s v .
As an illustration, let us solve the above equations for a random network with a Poisson degree distribution P (k) = k k e − k /k!.For this network, the excess degree distribution is identical to the degree distribution, and hence k = R 0 .Given this degree distribution, we get ].In the thermodynamic limit and in the absence of homophily (h = 0), this random network model with the Poisson degree distribution reduces to the Erdős-Rényi (ER) random graph ensemble, which is equivalent to homogeneous mixing.In other words, our model represents the simplest deviation from the ER model through the addition of homophily that biases the randomness of links.
First, let us consider the case of a perfect vaccine, for which φ v = 1.Eq. ( 9) now becomes which has an analytical solution Here, W (•) denotes the Lambert W -function.The giant component size is then calculated from Eq. ( 11) as where all infections are restricted to the unvaccinated population.
Figure 2(a-d) shows the solution of Eq. ( 14).The main observation is that the expected epidemic size always increases with homophily strength h.The difference in epidemic size under strong and weak homophily is especially significant when the vaccine coverage π v is not small.As an example, for a disease with R 0 = 1.5, the homogeneous mixing assumption leads to the prediction that the vaccination threshold is 33%.However, even if the vaccine coverage is well above this threshold, strong homophily can still let the disease spread in the unvaccinated population and infect up to 58% of it (see Fig. 2(b)).
In the case of imperfect vaccines, the coupled consistency equations are not analytically tractable.The solution of Eq. ( 9) is given by whereas for f S < 1 and f I < 1, Eq. ( 8) leads to We can numerically solve for φ v by equating the right hand sides of Eqs.(15) and (16).Plugging the results into Eqs.(10) and (11) yields the giant component size.
In what follows, we present the results for f I = 0 and only vary the efficacy against susceptibility, f S , for the sake of simplicity.Figure 2(e, f) shows the epidemic size under the coverage of an imperfect vaccine.As expected, a smaller efficacy leads to a larger epidemic and a higher vaccination threshold.Unexpectedly, contrary to the case of perfect immunization, the epidemic size first grows and then shrinks with increasing homophily.This can be attributed to the following competing mechanisms affected by increased levels of homophily: (1) Similarly to the case of a perfect vaccine, more unvaccinated individuals will be infected as they are connected to fewer immune individuals and more densely within themselves, making them less protected by the herd immunity effect.(2) An imperfect vaccine leaves a part of the vaccinated population susceptible to breakthrough infections.In the weak homophily regime, more vaccinated individuals may contract the disease due to the larger epidemic in the unvaccinated population.The risk of breakthrough infection decreases as they become less connected with the unvaccinated population in the strong homophily regime.Figure 3 f S = 0.75, the final epidemic size varies between 13% and 24%, reaching its peak around h = 0.62.As a consequence of the competition, the total number of infected individuals is maximized, in general, at a nontrivial level of homophily h * , which depends on f S and R 0 but not on the vaccine coverage π v .The smaller the R 0 and higher the value of f S , the higher the strength of homophily h * that leads to the worst overall outcome (see Fig. 3(b)).In other words, a highly infectious disease countered by a vaccine with low efficacy spreads maximally in a population with a medium level of vaccination homophily, while less infectious diseases generally benefit from higher levels of homophily, especially if the vaccine efficacy is high.The maximum impact of homophily on epidemic size is further discussed in the Supplemental Material.
In the above discussion, we presented the results for the case where the transmission network has a Poisson degree distribution and the efficacy against infectiousness f I = 0.These conditions can be altered.In the Supplemental Material, we calculate the epidemic size for transmission networks with more realistically heterogeneous excess degrees that follow the negative binomial distribution.We also discuss the case where both f S and f I are varied.In both cases, the epidemic outcomes are qualitatively similar to those obtained for Poisson networks and vaccines that purely affect susceptibility, except for the fact that the homophily level at which the epidemic size is maximized is no longer independent of vaccine coverage.
Conclusions and discussion.We have studied the effect of vaccination homophily, i.e., assortative mixing by vaccination status, on the herd immunity threshold and the expected epidemic size.In human society, vaccination homophily can emerge due to the presence of confounding factors, such as age [5], geography [24,25], socio-economic status [26], and personal and religious beliefs [27], that influence both the likelihood of interaction between individuals and the likelihood of them being in a common vaccination status.It can also occur as a consequence of behavioral contagion [28,29] or inequality in the access to the vaccine.Our analysis is built on a model that embodies a minimalistic departure from the traditional assumption of homogeneous mixing and shows that the vaccination threshold for herd immunity is higher for stronger vaccination homophily.This suggests that herd immunity is more difficult, if not impossible, to achieve in the presence of vaccination homophily.It also implies that the well-known formula of Eq. ( 1) underestimates the vaccination threshold by not taking homophily into account.
We also show that the behavior of epidemic size as a function of homophily varies depending on the vaccine efficacy against susceptibility; when the efficacy is high, homophily monotonically amplifies the epidemic, while the epidemic size peaks at a nontrivial level of homophily when the efficacy is low.This is due to the competition between the herd immunity effect by homogeneous mixing and the epidemic containment by segregation.We can identify the parameter values for which homophily has a large impact on the epidemic size, which will have direct implications for the design of intervention strategies.
Apart from vaccination homophily, another important type of inhomogeneity in networked epidemics is degree heterogeneity.Namely, real-world epidemics often exhibit a large variance in the number of secondary infections, whose distribution can be modeled by a negative binomial distribution [4,6].The herd immunity threshold given by Eq. ( 5) is not affected by the overdispersion of the distribution, but the epidemic size depends on the full shape of the distribution and therefore differs from the one for a Poisson network, as shown in the Supplemental Material.After the completion of this manuscript, we became aware of two other research works [30,31] that report results in line with what we have described here.They found qualitatively similar effects of homophily on epidemic size for scale-free networks [30] and empirical contact networks [31].This further corroborates the generalizability of our theoretical findings to networks with heterogeneous degree distributions.
As a final remark, we note that our approach has a broader scope.In this Letter, we focused on homophily by vaccination status; however, our framework is general enough to account for homophily by adherence to other epidemic interventions that reduce the susceptibility or infectiousness of individuals, such as the practice of social distancing [32], use of protective equipment [30], and adoption of digital contact tracing [33,34].It can also be applied to the analysis of herd immunity in the case where the past infection (and consequent disease-induced immunity) is localized to a subpopulation [35] and in the case where the mixing pattern is assortative by risk factors of the disease [36].

NETWORK SIMULATION
The network simulation to obtain the giant component sizes is implemented in the following way.First, we generate a network of size N with vaccine coverage π v , strength of vaccination homophily h, and degree distribution P (k).To do so, we begin by randomly assigning a vaccination status ω i ∈ {v, u} and a degree k i to every node i according to π v and P (k).Each of the k i stubs (half links) emanating from node i is classified as an in-group stub with probability π ωiωi = π ωi + (1 − π ωi )h and as an inter-group stub otherwise.Finally, we randomly make pairs of two in-group stubs within each group and pairs of an inter-group stub from each of the two groups, and connect the pairs to make a network.
From this network, we randomly remove a f S fraction of vaccinated nodes and compute the size of the giant component (defined as the largest connected component larger than 1% of the unvaccinated population).The epidemic size is calculated as the mean giant component size over 20 network realizations.

MAXIMUM EFFECT OF HOMOPHILY ON EPIDEMIC SIZE
The effect that vaccination homophily h exerts on epidemic size s varies for different parameter values.To characterize it, we compare the maximum size s max of a large epidemic for homophily strengths in the range of 0 ≤ h ≤ 1 with the baseline epidemic size s 0 under homogeneous mixing (h = 0) for given values of R 0 , π v , and f S .We fix f I = 0.
The left two columns in Fig. S1 show the ratio s 0 /s max for different parameter values.The ratio takes a value close to one when either the vaccine coverage π v or the efficacy f S is small, indicating that the effect of homophily is small.In such a case, the herd immunity effect is already weak even under homogeneous mixing; therefore, its decline by homophily does not bring about a substantial difference in the epidemic outcome.In contrast, the ratio drops to zero when a highly effective vaccine covers a large fraction of the population.In this parameter region, the herd immunity is achieved at h = 0 (i.e, s 0 = 0), but the presence of homophily brings the system out of the disease-free equilibrium.
The ratio cannot describe how many additional infections will be caused by homophily, which is especially problematic when s 0 = 0. To this end, the difference s max − s 0 is calculated and shown in the right two columns in Fig. S1.By comparing Fig. S1(a, e, i) and (c, g, k), we see that the difference between the maximum and baseline epidemic sizes is remarkably large around the herd immunity threshold at h = 0 and, interestingly, is the largest when the vaccine is perfect.

IMPERFECT VACCINE AGAINST SUSCEPTIBILITY AND INFECTIOUSNESS
In this section, we study the effect of vaccine efficacy against infectiousness.Figure S2 shows the comparison between the epidemic sizes under the coverage of a vaccine purely against susceptibility, a vaccine against both susceptibility and infectiousness, and a vaccine purely against infectiousness.The value of is equal ( = 0.25) in all three cases; therefore, the herd immunity thresholds occur along the same line.For imperfect vaccines that reduce infectiousness, the homophily strength h * that maximizes the epidemic size is not independent of the vaccine coverage π v .Moreover, the maximum can occur at h * ≤ 0; that is, homophily makes the epidemic smaller for low vaccine coverage.Figures S3  and S4 show the epidemic sizes for different combinations of the values of R 0 , π v , and h for imperfect vaccines that only reduce susceptibility and infectiousness, respectively.

NEGATIVE BINOMIAL RANDOM NETWORK MODEL
In the main text, we focused on the network structure with the Poisson degree distribution.While the model represents the simplest way to integrate homophily with the canonical Erdős-Rényi random graph, it may be insufficient to describe the overdispersed spreading patterns seen in real-world epidemics [S1].To add more realism, we consider the transmission networks in which the excess degree is negative-binomially distributed [S2] with mean R 0 and the  for r = 1 and for r = 1.By taking the k-th derivative of g 0 as g (k) 0 (x) = d k g 0 (x)/dx k , we recover the degree distribution: If r > 1, setting p 0 = [r/(R 0 + r)] r−1 makes the degree distribution reduce to a negative binomial distribution The mean and dispersion parameters of this distribution are equal to (r − 1)R 0 /k and r − 1, respectively.The fraction of isolated nodes p 0 effectively controls the system size because these nodes will not be part of the epidemic.As our motivation to introduce the negative binomial distribution is to see the effect of overdispersion in excess degree as compared to the Poisson distribution at the limit of r → ∞, we do not want the value of p 0 to affect the comparison.A natural choice would then be to set the value of p 0 to be equal to the fraction of isolated nodes for the corresponding Poisson degree distribution, that is, p 0 = e −R0 .
The epidemic size under the coverage of a perfect vaccine in negative binomial networks with R 0 = 3 and different values of the dispersion parameter r are shown in Fig. S5.When r = 10, the epidemic size is similar to that for a Poisson network.Decreasing the dispersion parameter (i.e., increasing the variance) does not change the herd immunity threshold but decreases the epidemic size when the system is in the epidemic state.For example, if the system is maximally homophilic, i.e., h = 1, so that the vaccinated and unvaccinated subpopulations are completely isolated from each other, 93% of the unvaccinated will get infected when the variance in excess degree is relatively small with r = 10; however, this fraction decreases to 75% for r = 1 and to 18% for r = 0.1.Moreover, one can see from the bottom row of Fig. S5 that the critical behavior as a function of h changes when the value of r is varied.
The overdispersed degrees of negative binomial networks affect the epidemic size under the coverage of imperfect vaccines too.A marked qualitative difference from the Poisson case is that the homophily strength h * maximizing the epidemic size is not independent of the vaccine coverage even for vaccines purely against susceptibility (Fig. S6(b)).Another observation is that, compared to the Poisson case (Fig. S2), the epidemic size is maximized at h close to one.In other words, the amplification of epidemic size by homophily is more robustly observed in overdispersed networks.

FIG. 1 .
FIG. 1. Critical coverage π cv of a perfect vaccine required for herd immunity as a function of homophily strength h for different values of basic reproduction number R0. Positive and negative values of h imply homophily and heterophily, respectively.The area shaded in gray represents the parameter region where the network is unrealizable.

FIG. 2 .FIG. 3 .
FIG. 2. Epidemic size inPoisson networks as a function of homophily strength h and vaccine coverage πv.Top row: Twodimensional heatmaps representing the epidemic size.The solid red line in each panel denotes the vaccination threshold.We represent contours of the epidemic size at 0.1 intervals by different colors and solid black lines.Bottom row: Epidemic size divided by the size of the unvaccinated population.Theoretical predictions (in lines) are compared with the giant component sizes obtained by simulating networks of size N = 10 5 (in symbols).The details of the network simulation can be found in the Supplemental Material.(a) and (b) show the results for R0 = 1.5 and a perfect vaccine, (c) and (d) are for R0 = 3 and a perfect vaccine, and (e) and (f) are for R0 = 3 and an imperfect vaccine with fS = 0.75.If the vaccine is perfect, only the unvaccinated individuals contract the disease; thus, the vertical axis in (b) and (d) corresponds to the fraction of the unvaccinated population that will be infected.The cross symbols in (f) indicate the maximum of each curve.Note that the homophily strength at which the epidemic size takes the maximum is independent of πv.
FIG. S5.Epidemic size as a function of homophily strength h and coverage πv in negative binomial networks under the coverage of a perfect vaccine.Only the dispersion parameters are varied: (a, b) r = 0.1; (c, d) r = 1; (e, f) r = 10.The basic reproduction number R0 = 3 in all panels.
FIG.S6.Epidemic size as a function of homophily strength h and coverage πv in negative binomial networks under the coverage of imperfect vaccines.The vaccine efficacies are the same as in Fig.S3, that is, (a, b) only against susceptibility (fS = 0.75, fI = 0), (c, d) against both susceptibility and infectiousness (fS = 0.5, fI = 0.5), (e, f) only against infectiousness (fS = 0, fI = 0.75).The basic reproduction number R0 = 3 and dispersion parameter r = 0.5 in all panels.