1 Introduction

Vaccination programs aim to attain population immunity above which introduction of infectious people does not result in outbreaks. Often, public health practitioners estimate population-immunity thresholds based on prescriptions that assume homogeneous mixing. However, this may not be an adequate approximation for effective estimation and could result in erroneous conclusions such as a recommendation to use more vaccine than needed or is feasibly available. Estimation based on the widely-known \(1-R_0^{-1}\) threshold, where \(R_0\) is the basic reproduction number, could be inaccurate if the calculation of \(R_0\) is premised on overly simplistic assumptions, such as those about mixing (Feng et al. 2015), or the infectious period distribution (Feng et al. 2007). In this context, heterogeneity in features affecting sub-populations can be highly relevant and precisely how these sub-populations come into contact, in other words mix with each other, can be vitally important.

Consideration of heterogeneity in transmission models goes back to the late 1980’s and early 1990’s when researchers sought to replicate known behaviors demonstrating that person-to-person contacts can vary, for example being greater in urban than rural areas, or in some instances greater among school-aged persons than other age groups. Dietz (1980), May and Anderson (1988), and Diekmann et al. (1990) looked at the effect of varying contact rates on the basic reproduction number \(R_0\), the average number of secondary infections caused by a newly infectious person introduced to a wholly susceptible population. Nold (1980) developed a more general framework in which the fraction of a person’s contacts reserved for members of their own group and its complement is distributed proportionately among population groups. Jacquez et al. (1988) allowed this fraction to vary among groups and Barbour (1978), Dye and Hasibeder (1986), Hasibeder and Dye (1988), and Adler (1992) showed that \(R_0\) is maximized when individuals with high average per capita contact rates mix exclusively with each other. Further work was done by Hethcote (1978), May and Anderson (1984), Anderson and May (1984), and Hethcote and Van Ark (1987) who considered heterogeneity in immunity. Their investigations showed that the immunity level above which the introduction of newly infectious persons would not cause outbreaks is greater in heterogeneous populations than would be inferred if homogeneity were incorrectly assumed. This threshold is often referred to as the naïve population-immunity threshold. Hethcote and Van Ark (1987) argued that this difference is modest if transmission rates are not too dissimilar. However, preferential mixing was not addressed by any of these authors. More recently, Brauer (2008) and Brauer and Watmough (2009) considered preferential mixing and demonstrated the influence heterogeneity in contacts can have on final size estimates.

In this article, we examine a simple meta-population model, allowing for a combination of preferential and proportionate mixing among subgroups. By proportionate mixing, we mean uniform, random mixing that is weighted by sub-population size. Whereas by preferential, we mean that groups tend to mix among themselves. We quantify this explicitly below. In combining both kinds of mixing, we model a “continuum” of change from one extreme to the other, with the extremes being purely preferential and purely proportionate mixing. By exploiting a simple decomposition of the mixing matrix and, by extension, the next-generation matrix, we re-derive an old result showing that, in the case of proportionate mixing, heterogeneity in contact rates increases \(R_0\) when the average contact rate remains constant. We then present some theoretical results on thresholds relating to vaccine allocation and levels of mixing. Specifically, we show how the expression for the gradient (the vector of partial derivatives) of \(R_\textsf{v}\) with respect to vaccine allocation is related to the familiar relationship \(R_0 = R_\textsf{v}/(1-p)\) for homogeneous mixing. We also provide a novel proof that for Jacquez-type mixing (Jacquez et al. 1988), \(R_\textsf{v}\) increases as preferential-mixing increases in any sub-population.

2 Simple meta-population model

We propose a simple framework to illustrate effects of heterogeneity in sub-population contact rates and immunities, together with preferential mixing, on the effective reproduction number \(R_{\textsf{v}}\). This comprises an SIR model with n sub-populations, \(S_i\) (susceptible), \(I_i\) (infectious), \(Z_i\) (removed),Footnote 1\(i=1,\ldots ,n\), with per capita force of infection rate \(\lambda _i\) (\(S \rightarrow I\)), removal rate \(\gamma \) (\(I \rightarrow Z\)), proportions \(p_i\) immunized at birth, and, for simplicity, an equal recruitment and exit rate \(\theta \) ensuring fixed sub-population sizes \(N_i=S_i + I_i + Z_i\). This model is represented by the system of ordinary differential equations (ODEs):

$$\begin{aligned} \frac{d S_i}{dt}&= (1-p_i) \theta N_i -(\lambda _i +\theta )S_i, \nonumber \\ \frac{d I_i}{dt}&= \lambda _i S_i - (\gamma +\theta ) I_i, \nonumber \\ \frac{d Z_i}{dt}&= p_i \theta N_i + \gamma I_i - \theta Z_i, \nonumber \\ N_i&= S_i+I_i+Z_i, \ i=1,\ldots , n. \end{aligned}$$
(1)

The force of infection terms are

$$\begin{aligned} \lambda _i = \beta a_i \sum _{j=1}^n c_{ij} I_j/N_j \end{aligned}$$
(2)

where \(\beta \) is the probability of infection upon contacting an infectious person, \(a_i\) is the average per capita contact rate (“activity”) in sub-population i, \(c_{ij}\) is the proportion of the ith sub-population’s contacts with members of the jth sub-population, and \(I_j/N_j\) is the probability that a randomly encountered member of sub-population j is infectious.

3 Mixing matrix

As has been well-documented, the \(n\times n\) matrix \(C=[c_{ij}]\) must satisfy the constraints:

  • \(c_{ij}\ge 0\);

  • \(\sum _{j=1}^n c_{ij}=1\) (all proportions of ith group contacts sum to 1);

  • \(a_i N_i c_{ij} = a_j N_j c_{ji}, \, \forall i, j\) (number of contacts per unit time of group i with group j equal to that of j with i).

The constraints on the mixing matrix C imply that its dominant eigenvalue is 1 and that the matrix VC is symmetric, where V is the diagonal matrix \(V = \textsf{diag} (a_1 N_1,\ldots , a_n N_n)\).

We now look at the formulation of the mixing matrix C introduced by Jacquez et al. (1988). This takes the form

$$\begin{aligned} c_{ij}&= \delta _{ij} \epsilon _i + (1-\epsilon _i) f_j \nonumber \\ \text {and } f_j&= \frac{(1-\epsilon _j) a_j N_j}{\sum _{k=1}^n (1-\epsilon _k) a_k N_k} \end{aligned}$$
(3)

where \(\delta _{ij}\) is Kronecker delta, \(\epsilon _i\) is the fraction of contacts members of the \(i^\text {th}\) sub-population reserve for others in the \(i\text {th}\) sub-population (preferential mixing), and \(f_j\) describes activity-weighted proportionate mixing. The complement \(1-\epsilon _i\) of preferential \(i\text {th}\) group mixing is distributed across all groups (including i) in proportion to \(f_j\) (proportionate mixing). The overall mixing for group i is thus a weighted combination of preferential and proportionate mixing.

Special cases arise as:

  1. 1.

    Proportionate mixing if \(\epsilon _i=0\) for all i (contacts proportional to f above);

  2. 2.

    Preferential mixing if for some i, \(\epsilon _i>0\) (fraction of contacts of i with i, with the rest distributed proportionally among all sub-populations);

  3. 3.

    The \(i\text {th}\) sub-population is isolated if \(\epsilon _i=1\) (members mix only with others in their own sub-population);

  4. 4.

    Preferential mixing is homogeneous if \(\epsilon _i = \epsilon \) for all i;

  5. 5.

    And heterogeneous preferential mixing if \(\epsilon _i \ne \epsilon _j\) for some \(i \ne j\).

4 Next generation matrix

The overall reproduction number is the dominant eigenvalue of the \(n\times n\) next generation matrix (NGM) \(K_{\textsf{v}}\) (van den Driessche and Watmough 2002; Diekmann et al. 2010). The (ij)-entry of the NGM is the expected number of new infections in group i by an infected individual originally introduced into group j in a fully susceptible population. For the system of ODEs in (1), the NGM is given by Feng et al. (2015):

$$\begin{aligned} K_{\textsf{v}}&= \begin{bmatrix} R_{\textsf{v}1} c_{11} &{} R_{\textsf{v}1} c_{12} &{} R_{\textsf{v}1}c_{13} &{} \dots &{} R_{\textsf{v}1}c_{1n} \\ R_{\textsf{v}2}c_{21} &{} R_{\textsf{v}2}c_{22} &{} R_{\textsf{v}2}c_{23} &{} \dots &{} R_{\textsf{v}2}c_{2n} \\ \vdots &{} \vdots &{} \vdots &{} \ddots &{} \vdots \\ R_{\textsf{v}n}c_{n1} &{} R_{\textsf{v}n}c_{n2} &{} R_{\textsf{v}n}c_{n3} &{} \dots &{} R_{\textsf{v}n}c_{nn} \end{bmatrix} \nonumber \\ R_{\textsf{v}i}&= (1-p_i) \frac{\beta a_i}{\gamma +\theta } = (1-p_i) R_{0i} \end{aligned}$$
(4)

The quantity \(R_{\textsf{v}i}\) is the effective reproduction number for group i and \(R_{0i}\) is the basic reproduction number.

The NGM admits a decomposition that offers insight into the dynamics of the model. Specifically,

$$\begin{aligned} K_{\textsf{v}}&= \textsf{diag}(\varvec{R}_{\textsf{v}}) C = \textsf{diag} (\varvec{1}-\varvec{p}) \textsf{diag}(\varvec{R}_0) C \nonumber \\&= \frac{\beta }{\gamma +\theta } \textsf{diag} (\varvec{1}-\varvec{p}) \textsf{diag}(\varvec{a}) C. \end{aligned}$$
(5)

Here, \(\varvec{R}_{\textsf{v}}\) (respectively, \(\varvec{R}_0\)) is the \(n \times 1\) vector of sub-population effective (respectively, basic) reproduction numbers \(R_{\textsf{v}1}, \ldots , R_{\textsf{v}n}\) (respectively, \(R_{01},\ldots ,R_{0n}\)). C is the \(n\times n\) mixing matrix described above. The \(n\times 1\) vector of vaccination fractions for each population subgroup is denoted by \(\varvec{p}\), \(\varvec{a}\) is the \(n\times 1\) vector of average contact rates, and \(\varvec{1}\) is the \(n\times 1\) vector of 1’s.

It follows from the previous section that the matrix C admits a further decomposition as

$$\begin{aligned} C= \textsf{diag}(\varvec{\epsilon }) + (\varvec{1}-\varvec{\epsilon }) \varvec{g}_{\varvec{\epsilon }}^{\mathsf T}/m _{\varvec{\epsilon }} \end{aligned}$$
(6)

in which \(\varvec{\epsilon }^{\mathsf T}= (\epsilon _1,\ldots ,\epsilon _n)\), \(\varvec{g}_{\varvec{\epsilon }}\) is the vector whose jth element is \((1-\epsilon _j) a_j\pi _j\) with \(\pi _j=N_j/N\), and

$$\begin{aligned} m _{\varvec{\epsilon }} = \sum _k (1-\epsilon _k) a_k\pi _k \end{aligned}$$
(7)

is the population weighted mean of the \((1-\epsilon _k) a_k\) terms.

Thus, C is the sum of a diagonal matrix (preferential mixing) and a rank 1 matrix (proportionate mixing). The formulation also invites a probabilistic interpretation where \(m _{\varvec{\epsilon }}\) can be thought of as the expectation of \((\varvec{1}-\varvec{\epsilon }) \circ \varvec{a}\) over the discrete distribution of population fractions \(\pi _i\), where \(\circ \) denotes pointwise multiplication of vectors. We denote the expectation operator by \(\textsf{E}_{\varvec{\pi }}\).

For proportionate mixing, in the absence of vaccination (\(\varvec{p}=\varvec{0}\)), we can derive a revealing expression for the basic reproduction number. Writing \(\varvec{R}_0 = \frac{\beta }{\gamma +\theta } \varvec{a}\), we have from above that

$$\begin{aligned} R_0&= \varvec{g}_{\varvec{0}}^{\mathsf T}\varvec{R}_0/m_{\varvec{0}} = \frac{\beta }{\gamma +\theta } \varvec{g}_{\varvec{0}}^{\mathsf T}\varvec{a}/m_{\varvec{0}} \nonumber \\&= \frac{\beta }{\gamma +\theta } \sum _j a_j^2 \pi _j /m_{\varvec{0}} \nonumber \\&= \frac{\beta }{\gamma +\theta } \frac{\textsf{E}_{\varvec{\pi }}[\varvec{a}^2]}{\textsf{E}_{\varvec{\pi }}[\varvec{a}]} \nonumber \\&= \frac{\beta }{\gamma +\theta }\frac{\textsf{Var}_{\varvec{\pi }}[\varvec{a}]+\textsf{E}^2_{\varvec{\pi }}[\varvec{a}]}{\textsf{E}_{\varvec{\pi }}[\varvec{a}]} \nonumber \\&= \frac{\beta }{\gamma +\theta } m_{\varvec{0}} \left\{ 1 + \frac{\sigma ^2_{\varvec{0}}}{m^2_{\varvec{0}}} \right\} \nonumber \\&= \overline{R_0} \left\{ 1 + \textsf{CV}^2 \right\} , \end{aligned}$$
(8)

where \(\sigma ^2_{\varvec{0}}\) is the variance of the activities with respect to the distribution \((\pi _1,\ldots ,\pi _n)\), \(m_{\varvec{0}}\) is the mean activity, and the quantity \(\overline{R_0} = \frac{\beta }{\gamma +\theta } m_{\varvec{0}}\) is the “mean” basic reproduction number. In this formulation, \(\textsf{CV}\) is the coefficient of variation, that is, the ratio of the standard deviation of the activities \(\{a_j\}\) with respect to the distribution \((\pi _1,\ldots ,\pi _n)\), to the mean activity, \(m_{\varvec{0}}\). The mean \(\overline{R_0}\) is the weighted average of the \(R_{0j}\) across subgroups, based on \(\textsf{E}_{\varvec{\pi }}[\varvec{a}]\). We see that given the same mean activity, the more the activities (contact rates) differ, the greater the coefficient of variation, hence the greater the \(R_0\). This is an instance of greater heterogeneity (in contact rates) increasing the basic reproduction number. This increase can become more pronounced when some contacts are preferential. This expression was originally derived by Dietz (1980), but here we couch it in terms of the coefficient of variation.

Previously mentioned special cases of mixing can now be revisited in light of the decomposition in (6). Isolated mixing occurs when \(\epsilon _i=1\) for all i, whence \(C=I\) and \(K_\textsf{v}\) is diagonal, \(K_\textsf{v}=\textsf{diag}(\varvec{R}_{\textsf{v}})=\textsf{diag} (\varvec{1}-\varvec{p}) \textsf{diag}(\varvec{R}_0)\). It follows that the effective reproduction number is \(R_\textsf{v} = \max \{R_{\textsf{v}1},\ldots ,R_{\textsf{v}n}\}\).

At the other extreme, proportionate mixing occurs when \(\epsilon _i =0\) for all i, in which case \(C = \varvec{1}\varvec{g}_{\varvec{0}}^{\mathsf T}/m_{\varvec{0}}\), where \(\varvec{g}_{\varvec{0}} = [a_j\pi _j]\) and \(m_{\varvec{0}} = \sum _k a_k \pi _k = \textsf{E}_{\varvec{\pi }} [\varvec{a}]\). Thus, C is rank 1 so that \(K_\textsf{v}\) is too. In fact, \(K_\textsf{v} = \textsf{diag}(\varvec{R}_{\textsf{v}}) C = \varvec{R}_{\textsf{v}} \varvec{g}_{\varvec{0}}^{\mathsf T}/m_{\varvec{0}}\). The dominant eigenvalue has multiplicity 1 and is given by

$$\begin{aligned} R_{\textsf{v}}&= \varvec{g}_{\varvec{0}}^{\mathsf T}\varvec{R}_\textsf{v} /m_{\varvec{0}} = \sum _{i=1}^n R_{vi} f_i, \\ \text {where } f_i&= g_i/m_{\varvec{0}} = \frac{a_i\pi _i}{\sum _k a_k \pi _k}. \end{aligned}$$

The overall effective reproduction number is now the activity-weighted sum of the group effective reproduction numbers.

The threshold surface \(R_\textsf{v} =1\) (or any other fixed value) admits a straightforward geometric interpretation in \(\varvec{p}\)-space as (i) the surface of an n-dimensional rectangle when mixing is isolated, and (ii) an n-dimensional hyperplane when mixing is proportionate.

In between these extremes is where the more interesting and potentially realistic mixing patterns lie. Those special cases are extremes in a very geometrical sense. When preferred contact fractions vary between 0 and 1, level hypersurfaces of \(R_\textsf{v}\) end up above the hyperplane corresponding to proportionate mixing and below the n-rectangle.

In all cases (all \(\varvec{\epsilon }\), all \(\varvec{a}\)), whenever vaccination is implemented, there are two closed form solutions for the allocation of vaccine to achieve \(R_\textsf{v}=1\).

  1. 1.

    The sub-population critical fractions \(p_{ci}=1-R_{0i}^{-1}\) when implemented together give an overall \(R_\textsf{v}=1\), although this allocation may not be the most efficient way of achieving it. In this case,

    $$\begin{aligned} K_\textsf{v} = \textsf{diag}(\varvec{1}-\varvec{p}) \textsf{diag}(\varvec{R}_0) C = C \end{aligned}$$

    so that its dominant eigenvalue is 1 (recall the properties of C from Sect. 3).

  2. 2.

    If the overall basic reproduction number \(R_0\) is correctly specified, then identical allocation \(p_i=1-1/R_0\) across all groups i will yield an overall \(R_\textsf{v}=1\) as

    $$\begin{aligned} K_\textsf{v} = \frac{1}{R_0} \textsf{diag}(\varvec{R}_0) C. \end{aligned}$$
Fig. 1
figure 1

Contour plot in \(\varvec{p}\)-plane of \(R_\textsf{v}=1\) for \(n=2\). Parameters in (1) are \(\beta =0.05\), \(\gamma =0.15\), \(\theta =1/(365\times 70)\), \(a_1=20\), \(a_2=10\), \(\pi _1=1/6\), \(\epsilon _1=\epsilon _2=0.4\)

Figure 1 shows a contour plot in the \(\varvec{p}\)-plane of \(R_\textsf{v}=1\) for \(n=2\) groups. In the dark gray region, \(R_\textsf{v}<1\) for all mixing structures (all \(\varvec{\epsilon }\)), whereas, in the light gray region, \(R_\textsf{v}>1\) for all mixing structures. The boundary defining \(R_\textsf{v}<1\) is the level surface corresponding to isolated mixing in all groups (purely preferential) whereas that defining \(R_\textsf{v}>1\) corresponds to proportionate mixing. Level curves for the threshold \(R_\textsf{v}=1\) all pass through the point given by \(p_{ci}=1-R_{0i}^{-1}\). We show in Sect. 5 that the partial derivatives of \(R_\textsf{v}\) with respect to \(p_i\) are negative while those with respect to \(\epsilon _i\) are positive. It follows that an increase in \(\epsilon \) will push the level curve towards the isolated mixing curve bounding the dark gray \(R_\textsf{v}<1\) region.

For comparison, Fig. 2 shows a contour plot in the \(\varvec{\epsilon }\)-plane. As p increases, \(R_\textsf{v}\) decreases and therefore, to increase \(R_\textsf{v}\) back to 1, we need more preferential mixing (\(\epsilon \) closer to 1). Accordingly, the curves move up towards the top right hand corner as p increases.

Fig. 2
figure 2

Contour plot in \(\varvec{\epsilon }\)-plane of \(R_\textsf{v}=1\) for \(n=2\). With the exception of \(\varvec{\epsilon }\), parameters are the same as in Fig. 1

5 Implicit equation for the threshold surface \(\varvec{R_\textsf{v} =1}\)

Perron-Frobenius theory (Rao and Rao 1998, Hunter 1983) states that, when at least one element of a nonnegative matrix decreases, the spectral radius decreases. It follows that an increase in any of the \(p_j\) will decrease \(R_\textsf{v}\) and therefore \(\partial R_\textsf{v}/\partial p_j <0\) whenever the derivative exists.

We can express \(R_\textsf{v}\) implicitly as a function of vaccination \(\varvec{p}\). To avoid notational confusion with the vector of subgroup effective reproduction numbers, \(\varvec{R}_\textsf{v}\), denote the overall \(R_\textsf{v}\) by \(\lambda \). If \(\varvec{w}\) is the dominant eigenvector, the eigenvalue equation \(K_\textsf{v} \varvec{w} = \lambda \varvec{w}\) may be decomposed. First, we recall from equations (5) and (6) the constituent parts of \(K_\textsf{v}\).

$$\begin{aligned} K_\textsf{v}&= D + \varvec{u}\varvec{v}^{\mathsf T}, \\ D&= \textsf{diag}(\varvec{1}-\varvec{p}) \textsf{diag}(\varvec{R}_0) \textsf{diag}(\varvec{\epsilon }), \\ \varvec{u}&= \textsf{diag}(\varvec{1}-\varvec{p}) \textsf{diag}(\varvec{R}_0) \textsf{diag}(\varvec{1}-\varvec{\epsilon }) \varvec{1} = \varvec{R}_\textsf{v} - D \varvec{1}, \\ \varvec{v}^{\mathsf T}&= \varvec{g}_{\varvec{\epsilon }}^{\mathsf T}/m _{\varvec{\epsilon }}. \end{aligned}$$

The eigenvalue equation becomes:

$$\begin{aligned} D \varvec{w} + \varvec{u}\varvec{v}^{\mathsf T}\varvec{w}&= \lambda \varvec{w} \\ (\varvec{v}^{\mathsf T}\varvec{w}) \varvec{u}&= (\lambda I - D)\varvec{w} \\ (\varvec{v}^{\mathsf T}\varvec{w}) (\lambda I - D)^{-1} \varvec{u}&= \varvec{w} \\ (\varvec{v}^{\mathsf T}\varvec{w}) \varvec{v}^{\mathsf T}(\lambda I - D)^{-1} \varvec{u}&= \varvec{v}^{\mathsf T}\varvec{w} \\ \varvec{v}^{\mathsf T}(\lambda I - D)^{-1} \varvec{u}&= 1. \end{aligned}$$

In the last line, we divide by the scalar \(\varvec{v}^{\mathsf T}\varvec{w}\), which is guaranteed to be non-zero, as we can take \(\varvec{w}\) with nonnegative entries, at least one positive, due to Perron–Frobenius theory. We now have an implicit equation relating the effective reproduction number \(\lambda \) and vaccination \(\varvec{p}\):

$$\begin{aligned} F(\lambda , \varvec{p}, \varvec{\epsilon })&= \varvec{v}^{\mathsf T}(\lambda I - D)^{-1} \varvec{u} = 1, \end{aligned}$$
(9)
$$\begin{aligned} F(\lambda , \varvec{p}, \varvec{\epsilon })&= \frac{1}{m_{\varvec{\epsilon }}} \sum _{j=1}^n \frac{(1-\epsilon _j)^2 R_{\textsf{v}j} a_j \pi _j}{\lambda - \epsilon _j R_{\textsf{v}j}} \end{aligned}$$
(10)
$$\begin{aligned}&= \frac{\beta }{(\gamma +\theta )m_{\varvec{\epsilon }}} \sum _{j=1}^n \frac{(1-p_j) (1-\epsilon _j)^2 a_j^2 \pi _j}{\lambda -(1-p_j) \epsilon _j a_j \beta /(\gamma +\theta )} = 1. \end{aligned}$$
(11)

We use implicit differentiation to calculate the partial derivatives

$$\begin{aligned} \frac{\partial \lambda }{\partial p_i} = -\frac{\partial F / \partial p_i}{\partial F / \partial \lambda }. \end{aligned}$$
(12)

The denominator of (12) is

$$\begin{aligned} \frac{\partial F}{\partial \lambda } = - \varvec{v}^{\mathsf T}(\lambda I - D)^{-2} \varvec{u} = -\frac{1}{m_{\varvec{\epsilon }}} \sum _{j=1}^n \frac{(1-\epsilon _j)^2 R_{\textsf{v}j} a_j \pi _j}{(\lambda - \epsilon _j R_{\textsf{v}j})^2} <0. \end{aligned}$$
(13)

It follows that the sign of \(\frac{\partial F}{\partial p_i}\) must be the same as that of \(\frac{\partial \lambda }{\partial p_i}\), which is necessarily negative via the Perron–Frobenius theorem (increasing \(p_i\) decreases entries in \(K_\textsf{v}\)).

Observing that \(\frac{\partial R_{\textsf{v}i}}{\partial p_j} = -R_{0i} \delta _{ij}\), the numerator of (12) is readily evaluated from the expression for F in (10).

$$\begin{aligned} \frac{\partial F}{\partial p_i}&= \frac{1}{m_{\varvec{\epsilon }}} (1-\epsilon _i)^2 a_i \pi _i \left\{ \frac{-R_{0i} (\lambda -\epsilon _i R_{\textsf{v}i}) - R_{\textsf{v}i} \epsilon _i R_{0i}}{(\lambda -\epsilon _i R_{\textsf{v}i})^2} \right\} \nonumber \\&= - \frac{1}{m_{\varvec{\epsilon }}} \frac{\lambda R_{0i} (1-\epsilon _i)^2 a_i \pi _i}{(\lambda -\epsilon _i R_{\textsf{v}i})^2} \end{aligned}$$
(14)

Combining eqs. (13) and (14), we obtain

$$\begin{aligned} \frac{\partial \lambda }{\partial p_i} = - \frac{\partial F / \partial p_i}{\partial F / \partial \lambda } = - \frac{\lambda }{1-p_i} \times \frac{x_i(\lambda , p_i)}{\sum _{k=1}^n x_k(\lambda , p_k)} \end{aligned}$$
(15)

where

$$\begin{aligned} x_i(\lambda , p_i) = \frac{(1-\epsilon _i)^2 R_{\textsf{v}i} a_i \pi _i}{(\lambda -\epsilon _i R_{\textsf{v}i})^2} = \frac{(1-\epsilon _i)^2 (1-p_i) R_{0i} a_i \pi _i}{(\lambda -\epsilon _i (1-p_i) R_{0i})^2}. \end{aligned}$$

We reiterate that the expression is negative, as expected. In particular, for the critical vaccination allocation \(\varvec{p_c}=\left( 1-R_{01}^{-1},\ldots ,1-R_{0n}^{-1}\right) \), which gives \(\lambda =1\) for all \(\varvec{\epsilon }\), the above expression reduces to

$$\begin{aligned} \left. \frac{\partial \lambda }{\partial p_i}\right| _{(\lambda ,\varvec{p_c},\varvec{\epsilon })= \left( 1,\varvec{1}-(R_{01}^{-1},\ldots ,R_{0n}^{-1}),\varvec{\epsilon }\right) } = - R_{0i} \times \frac{a_i\pi _i}{\sum _k a_k\pi _k} \propto a_i^2 \pi _i. \end{aligned}$$

This is the direction of the normal vector to the proportionate mixing hyperplane.

The derived expression also generalizes the familiar \(n=1\) homogeneous result. Each element \(\partial \lambda /\partial p_i\) of the gradient may be interpreted as the \(i^\text {th}\) term \(-\lambda /(1-p_i)\) weighted by \(x_i/\sum _k x_k\). This is analogous to

$$\begin{aligned} \frac{dR_{\textsf{v}}}{dp}= -R_0 = - \frac{R_{\textsf{v}}}{1-p} \end{aligned}$$

which results from the well-known relationship \(R_\textsf{v} = (1-p) R_0\) for the \(n=1\) homogeneous mixing case.

In a similar fashion, we can show that \(\frac{\partial \lambda }{\partial \epsilon _i} > 0\), establishing that the overall effective reproduction number increases whenever preferential mixing increases within any group. As before, the sign of \(\frac{\partial \lambda }{\partial \epsilon _i}\) is determined by that of \(\frac{\partial F}{\partial \epsilon _i}\). An implicit expression for the latter may be derived:

$$\begin{aligned} \frac{\partial F}{\partial \epsilon _i}&= \frac{a_i\pi _i}{m_{\varvec{\epsilon }}^2} \sum _j \frac{(1-\epsilon _j)^2 R_{\textsf{v}j} a_j\pi _j}{\lambda -\epsilon _j R_{\textsf{v}j}} \nonumber \\&\quad + \frac{1}{m_{\varvec{\epsilon }}} \frac{-2(1-\epsilon _i)R_{\textsf{v}i} a_i\pi _i (\lambda -\epsilon _i R_{\textsf{v}i})- (1-\epsilon _i)^2 R_{\textsf{v}i} a_i\pi _i(-R_{\textsf{v}i})}{(\lambda -\epsilon _i R_{\textsf{v}i})^2} \nonumber \\&= \frac{a_i\pi _i}{m_{\varvec{\epsilon }}} \left\{ F(\lambda , \varvec{p}, \varvec{\epsilon }) + R_{\textsf{v}i} (1-\epsilon _i) \frac{-2(\lambda -\epsilon _i R_{\textsf{v}i}) + R_{\textsf{v}i} (1-\epsilon _i) }{(\lambda -\epsilon _i R_{\textsf{v}i})^2} \right\} \nonumber \\&= \frac{a_i\pi _i}{m_{\varvec{\epsilon }}} \left\{ 1 + \frac{R_{\textsf{v}i} (1-\epsilon _i)}{(\lambda -\epsilon _i R_{\textsf{v}i})^2} \left( -2\lambda + R_{\textsf{v}i} +R_{\textsf{v}i}\epsilon _i\right) \right\} \nonumber \\&= \frac{a_i\pi _i}{m_{\varvec{\epsilon }}} \times \frac{\lambda ^2 - 2\lambda \epsilon _i R_{\textsf{v}i} + \epsilon _i^2 R_{\textsf{v}i}^2 -2\lambda R_{\textsf{v}i} (1-\epsilon _i) + R_{\textsf{v}i}^2 (1-\epsilon _i^2) }{(\lambda -\epsilon _i R_{\textsf{v}i})^2} \nonumber \\&= \frac{a_i\pi _i}{\sum _{k=1}^n (1-\epsilon _k) a_k \pi _k} \times \left( \frac{\lambda -R_{\textsf{v}i}}{\lambda -\epsilon _i R_{\textsf{v}i}} \right) ^2 >0. \end{aligned}$$
(16)

Finally, we have

$$\begin{aligned} \frac{\partial \lambda }{\partial \epsilon _i} = - \frac{\partial F / \partial \epsilon _i}{\partial F / \partial \lambda } = \frac{a_i\pi _i}{\sum _k \frac{(1-\epsilon _k)^2 R_{\textsf{v}k} a_k\pi _k}{(\lambda -\epsilon _k R_{\textsf{v}k})^2}} \times \left( \frac{\lambda -R_{\textsf{v}i}}{\lambda -\epsilon _i R_{\textsf{v}i}} \right) ^2. \end{aligned}$$
(17)

This generalizes the result proved for \(n=2\) by direct computation of determinants (Chow et al. 2011).

An immediate consequence of (16) is that

$$\begin{aligned} \left. \frac{\partial \lambda }{\partial \epsilon _i}\right| _{(\lambda ,\varvec{p_c},\varvec{\epsilon })= \left( 1,\varvec{1}-(R_{01}^{-1},\ldots ,R_{0n}^{-1}),\varvec{\epsilon }\right) } = 0, \end{aligned}$$

because \(\lambda -R_{\textsf{v}i} = 1 - (1-(1-R_{0i}^{-1})) R_{0i} = 0\). This result is also apparent from the fact that \(F\left( 1,\varvec{1}-(R_{01}^{-1},\ldots ,R_{0n}^{-1}),\varvec{\epsilon }\right) =0\) for all \(\varvec{\epsilon }\).

6 Discussion

In this article, we applied matrix decompositions to the contact structure of a simple SIR meta-population model (1) with vaccination. This allowed us to re-derive some well-known phenomena of mixing in heterogeneous populations, ranging from proportionate to completely isolated. By calculating implicit expressions for the gradients of \(R_\textsf{v}\) with respect to the vaccination allocations \(\varvec{p}\) and mixing fractions \(\varvec{\epsilon }\), we showed in (15) how the homogeneous relationship \(R_0=R_\textsf{v}/(1-p)\) generalizes when \(n>1\) and proved in (16) that the effective reproduction number increases as preferential mixing increases in any of the sub-populations.

Results here are specific to the SIR model presented, which is limited by its simplicity for mathematical tractability. For example, all sub-populations have the same removal and mortality rates and are of constant size, all individuals have the same susceptibility to infection on coming into contact with an infectious person, and the mixing is restricted to Jacquez-type (Jacquez et al. 1988). Nevertheless, phenomena shown here can be expected to be qualitatively preserved in more realistic models. Of note, Britton and colleagues (Britton et al. 2020) alerted the scientific community to the dangers of over-estimating population herd immunity for the COVID-19 pandemic by unwittingly basing estimates on assumptions of homogeneous mixing.

The gradient approach in sect. 5 has been used elsewhere by the current authors to obtain optimal vaccine allocations to reduce the effective reproduction number via steepest descent over the threshold surface (Feng et al. 2017). They also extended the contact structures explored here to multiple-level mixing. This article retrospectively examines the simpler mixing structures in more detail, thus gleaning more insight into the dynamics of epidemics in heterogeneously mixing populations.