Projecting social contact matrices to populations stratified by binary attributes with known homophily

Contact networks are heterogeneous. People with similar characteristics are more likely to interact, a phenomenon called assortative mixing or homophily. While age-assortativity is well-established and social contact matrices for populations stratified by age have been derived through extensive survey work, we lack empirical studies that describe contact patterns of a population stratified by other attributes such as gender, sexual orientation, ethnicity, etc. Accounting for heterogeneities with respect to these attributes can have a profound effect on the dynamics of epidemiological forecasting models. Here, we introduce a new methodology to expand a given e.g. age-based contact matrix to populations stratified by binary attributes with a known level of homophily. We describe a set of linear conditions any meaningful social contact matrix must satisfy and find the optimal matrix by solving a non-linear optimization problem. We show the effect homophily can have on disease dynamics and conclude by briefly describing more complicated extensions. The available Python source code enables any modeler to account for the presence of homophily with respect to binary attributes in contact patterns, ultimately yielding more accurate predictive models.


Introduction
Most social networks, e.g., physical interaction networks relevant to the spread of infectious diseases, exhibit assortative mixing, which describes the presence of more-than-expected interactions between network nodes with similar characteristics [1]. Assortative mixing is also known simply as assortativity, or as homophily in the case of social networks. The seminal POLYMOD study supplied empirical evidence for the presence of strong age-assortative mixing in eight European countries [2]. Its main results are country-specific contact matrices, which describe the average number of daily contacts an individual of a certain age has with individuals of different ages, where the entire population is stratified into 5-year age bins (e.g., 0-4, 5-9, ..., 80+). More recently, these contact matrices have been projected for 144 other countries including the United States, using surveys and demographic data [3].
Accurate infectious disease models must account for heterogeneous contact patterns in a population. During the COVID-19 pandemic, inclusion of realistic age-mixing patterns into infectious disease models has become increasingly popular, evident in the explosion of citations of [2,3]. While the available, empirically-grounded contact matrices account for homophily with respect to age, there are many other attributes (e.g., race/ethnicity, vaccine status, occupation, religion, education level or socio-economic status [4]) that are typically not included in epidemiological models. This is likely because there exist no empirical contact matrices that describe the assortative mixing with respect to attributes beyond age, partly because empirical studies that go beyond age present a great logistical challenge. For many of the aforementioned attributes, however, we possess rough estimates of their homophily in a population. For example, it is well-established that there exists a strong level of ethnic homophily in the U.S. population [4,5].
In this manuscript, we describe a novel procedure that uses linear algebra and non-linear optimization techniques to infer contact matrices for populations stratified by age and additional attributes, for which estimates of their homophily exist. In this initial work, we require these attributes to be binary, but extending to attributes with finitely many values or categories should be straightforward. In brief, we introduce a set of linear conditions a meaningful contact matrix should satisfy and show that there are typically infinitely many contact matrices that do so. To find the "optimal" contact matrix, we therefore define an objective function and pick the contact matrix that minimizes this function. User preference can inform the choice of objective function. Throughout the entire manuscript, we follow two simple examples and show that accounting for homophily can have a strong effect on model dynamics, i.e., how a pathogen spreads throughout a population.

Contact matrices
Each individual in a population can be categorized using a multitude of attributes (e.g., age, ethnicity, education level). We can distinguish attributes by their range: some take on continuous values (e.g., age), while others are categorical (e.g., education level), discrete-valued or even binary (e.g., vaccinated against COVID-19 or not). As a whole, we can stratify a population based on a selection of attributes. In what follows, we assume all attributes take on only finitely many different values. This does not limit the usability of the developed methods, as binning can turn any continuous-valued attribute such as age into one with finitely many choices, without losing substantial information if the number of bins is large. We begin with some basic definitions.
Definition 2.1 Given a population of size $N_{\text{total}}$ and $d$ attributes with finite value sets $A_1, \ldots, A_d$, a (joint) distribution (or stratification) of that population based on the $d$ attributes is a $d$-dimensional array $N \in [0, \infty)^{A}$, $A = A_1 \times \cdots \times A_d$, whose entries sum to $N_{\text{total}}$. For the methods developed in this manuscript, it does not matter if $N_{\text{total}}$ describes the absolute number of individuals in the population or if $N_{\text{total}} = 1$ and we consider proportions. To describe the rates of contacts between individuals with different attribute values, we introduce the concept of a contact matrix.

Definition 2.3 Given a population stratified across the combined attribute space $A = A_1 \times \cdots \times A_d$, a contact function (also called contact matrix for reasons that become clear in the following remark) is a function $C : A \times A \to [0, \infty)$ that describes the average number of daily contacts an individual with attributes $(i_1, \ldots, i_d) \in A$ has with individuals with attributes $(j_1, \ldots, j_d) \in A$.
Remark 2.5 One can think of a distribution $N$ of a population and a contact matrix $C$ as two summary statistics of an undirected graph (i.e., a social interaction network) where the nodes (i.e., individuals) are labeled by the attribute values. Since the entry $C_{ij}$ in the contact matrix describes the average number of interactions (i.e., edges) connecting an individual with attribute $i \in A$ with individuals with attribute $j \in A$, the total number of interactions between individuals with attributes $i$ and $j$ is given by $N_i C_{ij}$ whenever $i \neq j$ (and $N_i C_{ii}/2$ otherwise). Due to the undirected nature of physical, real-world interactions, it is however also given by $N_j C_{ji}$. A meaningful contact matrix representing physical interactions, relevant e.g. to describe the spread of an infectious disease, must therefore possess the following "symmetry" property.

Remark 2.4 Given a combined d-dimensional attribute space $A = A_1 \times \cdots \times A_d$,
Definition 2.6 (Reciprocity) Given a combined attribute space $A$ and a corresponding distribution $N$ of a population, a contact matrix $C \in [0, \infty)^{A \times A}$ is reciprocal if $N_i C_{ij} = N_j C_{ji}$ for all $i, j \in A$. This property is often also referred to as symmetry; we avoid this term due to its ambiguity in matrix theory.
Example 2.7 An age-based contact matrix has been inferred for the United States [3] for age groups 0-4, 5-9, ..., 75-79, 80+. Using U.S. census data [6], this contact matrix can be condensed to the four age groups from Example 2.2 (Table 1a). The contact matrix is not reciprocal. To see this, compare e.g. the total number of reported daily contacts between 15-64 and 65-74 year olds: with the entries of Table 1a, $N_2 C_{23} \neq N_3 C_{32}$.
Empirical contact matrices stratified solely by age, such as those reported in the seminal POLYMOD study [2], are typically not reciprocal. This is because elderly people are generally more likely to report a short interaction as a contact. Extending the 1-dimensional transformation described in [7], an empirical contact matrix can however be easily transformed into a reciprocal one as follows.
Theorem 2.8 Given a combined attribute space $A$, a corresponding distribution $N$ of a population with $N_i > 0$ for all $i \in A$, and a contact matrix $C$, the contact matrix $\tilde{C}$ defined by
$$\tilde{C}_{ij} = \frac{N_i C_{ij} + N_j C_{ji}}{2 N_i}$$
is reciprocal.
Proof By design of the transformation, the total number of interactions between individuals with attributes $i = (i_1, \ldots, i_d) \in A$ and $j = (j_1, \ldots, j_d) \in A$ is given by
$$N_i \tilde{C}_{ij} = \tfrac{1}{2}\left(N_i C_{ij} + N_j C_{ji}\right) = N_j \tilde{C}_{ji}.$$
Corollary 2.9 The total number of contacts in a population with reciprocal contact matrix $C$ and distribution $N$ over combined attribute space $A$ is given by
$$\frac{1}{2} \sum_{i \in A} \sum_{j \in A} N_i C_{ij}.$$
Example 2.10 Using Theorem 2.8, we can transform the non-reciprocal U.S. contact matrix (Table 1a) into a reciprocal contact matrix (Table 1b).
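The transformation of Theorem 2.8 can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `make_reciprocal` and the two-group numbers are hypothetical (they are not the values of Table 1a), and the sketch assumes every group has positive population.

```python
import numpy as np

def make_reciprocal(C, N):
    """Return C~ with C~[i, j] = (N[i]*C[i, j] + N[j]*C[j, i]) / (2*N[i])."""
    C = np.asarray(C, dtype=float)
    N = np.asarray(N, dtype=float)
    if np.any(N <= 0):
        raise ValueError("this sketch assumes N_i > 0 for every group")
    T = N[:, None] * C                      # T[i, j] = N_i * C_ij
    return (T + T.T) / (2.0 * N[:, None])   # average both reports, convert back to per-capita

# Hypothetical two-group example (NOT the values of Table 1a).
N_demo = np.array([100.0, 300.0])
C_demo = np.array([[9.0, 10.0],
                   [5.0, 16.0]])            # 100*10 != 300*5, so C_demo is not reciprocal
C_rec = make_reciprocal(C_demo, N_demo)
T_rec = N_demo[:, None] * C_rec             # symmetric exactly when C_rec is reciprocal
```

Averaging the two reported totals leaves the overall number of contacts unchanged, which is why the sum in Corollary 2.9 is the same before and after the transformation.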

Homophily
We are now in a position to (i) define the homophily of a contact matrix with respect to a binary attribute and (ii) describe the problems arising when trying to project e.g. age-specific contact matrices to populations stratified by additional binary attributes with homophily. We look first at how homophily with respect to a binary attribute can be defined for an undirected labeled graph (e.g., a physical interaction network) and use the correspondence described in Remark 2.5 to define the homophily with respect to a binary attribute for a reciprocal contact matrix.
Definition 3.1 Given an undirected graph with nodes labeled by a binary attribute with prevalence (i.e., proportion of nodes labeled true) $p \in (0, 1)$, we distinguish between two types of edges: those connecting nodes with same and with opposite attribute values. As in [8], let $\varphi \in [0, 1]$ denote the proportion of edges connecting nodes with same attribute values.
In the case of complete segregation, we have $\varphi = 1$ and the graph possesses two separate components. In the other extreme case of complete disassortativity, we have $\varphi = 0$ and the graph is bipartite. Finally, in the absence of assortative mixing (i.e., no homophily), we have $\varphi = E(\varphi)$ where
$$E(\varphi) = p^2 + (1 - p)^2. \tag{3}$$
Extending [8], we define the homophily $h$ with respect to the binary attribute by comparing $\varphi$ to its expected value,
$$h = \begin{cases} \dfrac{\varphi - E(\varphi)}{1 - E(\varphi)} & \text{if } \varphi \geq E(\varphi), \\[2mm] \dfrac{\varphi - E(\varphi)}{E(\varphi)} & \text{if } \varphi < E(\varphi). \end{cases} \tag{4}$$
Example 3.2 If $p = 2/3$ of individuals in a population are vaccinated against COVID-19, the expected proportion of interactions involving individuals with the same vaccination status is $E(\varphi) = (2/3)^2 + (1/3)^2 = 5/9$. Thus if $\varphi = 5/9$, there exists no homophily regarding vaccination status. If $\varphi = 7/9$, the population exhibits 50% homophily. In the extreme case of $\varphi = 1$, there exists 100% homophily (i.e., complete segregation), while the other extreme case of complete disassortativity with $\varphi = 0$ corresponds to a homophily of −100% (or 100% heterophily).
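The arithmetic of Example 3.2 can be checked with a short sketch of Equations 3 and 4. The function names `expected_phi` and `homophily` are illustrative choices, not names taken from the published source code.

```python
def expected_phi(p):
    """Expected proportion of same-value edges without homophily (Equation 3)."""
    return p ** 2 + (1 - p) ** 2

def homophily(phi, p):
    """Homophily h in [-1, 1] for prevalence p in (0, 1) (Equation 4)."""
    e = expected_phi(p)
    if phi >= e:
        return (phi - e) / (1 - e)   # assortative side, scaled to reach 1 at phi = 1
    return (phi - e) / e             # disassortative side, scaled to reach -1 at phi = 0

p_vacc = 2 / 3                       # prevalence from Example 3.2
h_half = homophily(7 / 9, p_vacc)    # the 50% homophily case
h_full = homophily(1.0, p_vacc)      # complete segregation
h_none = homophily(0.0, p_vacc)      # complete disassortativity
```

The two branches use different denominators so that both extremes map to exactly ±1, regardless of the prevalence.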

Definition 3.3 Given a combined attribute space $A = A_1 \times \cdots \times A_d$, a corresponding distribution $N$ of a population and a contact matrix $C \in [0, \infty)^{A \times A}$, we define the reduced attribute space without the kth attribute as
$$A_{-k} = A_1 \times \cdots \times A_{k-1} \times A_{k+1} \times \cdots \times A_d.$$
The reduced distribution without the kth attribute is a $(d-1)$-dimensional array $N_{-k} \in [0, \infty)^{A_{-k}}$ with entries $(N_{-k})_i = \sum_{v \in A_k} N_{(i,v)}$, where $(i, v)$ denotes the element of $A$ obtained by inserting $v$ at the kth position of $i \in A_{-k}$. Similarly, the reduced contact matrix without the kth attribute is a function $C_{-k} : A_{-k} \times A_{-k} \to [0, \infty)$ with
$$(C_{-k})_{ij} = \frac{1}{(N_{-k})_i} \sum_{v \in A_k} N_{(i,v)} \sum_{w \in A_k} C_{(i,v),(j,w)}.$$
If the kth attribute is binary, assume w.l.o.g. $A_k = \{0, 1\}$ and we define the attribute's prevalence in the population as the $(d-1)$-dimensional array $P \in [0, 1]^{A_{-k}}$ with $P_i = N_{(i,1)} / (N_{-k})_i$. If $d = 1$, this definition corresponds exactly to the common language definition of "prevalence", used e.g. in Definition 3.1.
Definition 3.4 Let $C \in [0, \infty)^{A \times A}$ be a reciprocal contact matrix with $A$ and $N$ as before, and let the kth attribute be binary. Then, the homophily of the contact matrix with respect to the binary attribute can be defined by comparing the proportion of interactions between individuals with same attribute values, which is given by
$$\varphi(C, N) = \frac{\sum_{i, j \in A,\; i_k = j_k} N_i C_{ij}}{\sum_{i, j \in A} N_i C_{ij}}, \tag{5}$$
to its expected value, $E(\varphi)$, which can be computed from the prevalence $P$ of the binary attribute, as follows. W.l.o.g. assume that $k = d$ and $A_d = \{0, 1\}$, i.e., the last attribute is binary, so that we can write $(i, v)$ to denote $(i_1, \ldots, i_{d-1}, v)$ for any $i \in A_{-d}$ and $v \in A_d$. In the absence of homophily, the binary attribute does not affect mixing patterns. For a given $C$, therefore, we can define a reciprocal contact matrix without homophily, denoted $C^0$, by distributing according to the prevalence $P$ the aggregated contacts in the reduced contact matrix. That is, for $i, j \in A_{-d}$ and $v \in A_d$, we have
$$C^0_{(i,v),(j,1)} = P_j \, (C_{-d})_{ij}, \qquad C^0_{(i,v),(j,0)} = (1 - P_j) \, (C_{-d})_{ij}. \tag{6}$$
By design, we now have $E(\varphi) = \varphi(C^0, N)$, and homophily is defined as in Equation 4,
$$h(C, N) = \begin{cases} \dfrac{\varphi(C, N) - E(\varphi)}{1 - E(\varphi)} & \text{if } \varphi(C, N) \geq E(\varphi), \\[2mm] \dfrac{\varphi(C, N) - E(\varphi)}{E(\varphi)} & \text{if } \varphi(C, N) < E(\varphi). \end{cases} \tag{7}$$
Remark 3.5 Similar to the definition of the non-homophilic contact matrix $C^0$, we can define a contact matrix $C^h$ that has homophily $h \in (0, 1]$ with respect to the dth binary attribute, by moving a proportion $h$ of all contacts between opposite-valued individuals to equal-valued individuals. With Equation 6, that is for $i, j \in A_{-d}$ and $v \in A_d$,
$$C^h_{(i,v),(j,v)} = \left(P_{j,v} + h P_{j,1-v}\right) (C_{-d})_{ij}, \qquad C^h_{(i,v),(j,1-v)} = (1 - h) P_{j,1-v} \, (C_{-d})_{ij}, \tag{8}$$
where
$$P_{j,1} = P_j, \qquad P_{j,0} = 1 - P_j. \tag{9}$$
While $C^h$ has homophily $h$ by definition, it is only reciprocal in special cases. To see this, consider e.g. $(i, 1), (j, 1) \in A$. Reciprocity implies
$$N_{(i,1)} C^h_{(i,1),(j,1)} = N_{(j,1)} C^h_{(j,1),(i,1)} \iff (N_{-d})_i P_i \left(P_j + h(1 - P_j)\right) (C_{-d})_{ij} = (N_{-d})_j P_j \left(P_i + h(1 - P_i)\right) (C_{-d})_{ji},$$
which due to the reciprocity of $C$ is, whenever $(C_{-d})_{ij} > 0$, equivalent to $h P_i (1 - P_j) = h P_j (1 - P_i)$, i.e., to $h (P_i - P_j) = 0$. Thus $C^h$ is reciprocal only if $h = 0$ or if the prevalence is constant.
Example 3.6 Consider a population of size 400, which is stratified into (i) two age groups (e.g., young and old) with age distribution (100, 300), and (ii) by an additional binary attribute with prevalence (0.5, 0.8) across the age groups. Further assume an age-specific contact matrix (i.e., the reduced contact matrix without the second attribute, Definition 3.3) as in Table 2a. We can easily check that this contact matrix is reciprocal. Next, using the prevalence of the binary attribute, as described in Definition 3.4, we obtain a contact matrix $C^0$ (Table 2b), which stratifies the population by both attributes and exhibits no homophily with respect to the binary attribute. This contact matrix is still reciprocal. As described in Remark 3.5, we can also define a contact matrix $C^h$ with any given level of homophily for the binary attribute. In the extreme case of complete segregation ($h = 1$), this would result in the contact matrix shown in Table 2c.
Note that in both extended contact matrices, $C^0$ and $C^h$, the number of contacts an individual of a certain age has with individuals from each age group, and therefore also the total number of contacts (row sum), agrees with the basic age-age contact matrix, which is desirable. $C^h$ is however not reciprocal ($50 \cdot 12 \neq 240 \cdot 4$) because $h \neq 0$ and the prevalence of the binary attribute is not constant across age groups (Remark 3.5). Since reciprocity is a necessary condition for any physical contact matrix, relevant e.g. to study the spread of an infectious disease, this example motivates the need for a more elaborate approach.
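The failure of reciprocity in Example 3.6 can be reproduced with a small sketch of Equations 6 and 8. Two caveats: the entries 9, 12 and 4 of the age-only matrix are the ones quoted in the text, whereas the old-old entry (set to 16 here) is a placeholder assumption because Table 2a is not reproduced in this text; and the helper names `extend` and `is_reciprocal` are illustrative only.

```python
import numpy as np

N_age = np.array([100.0, 300.0])       # young, old (Example 3.6)
P = np.array([0.5, 0.8])               # prevalence of the binary attribute per age group
C = np.array([[9.0, 12.0],
              [4.0, 16.0]])            # C[1, 1] = 16 is a placeholder assumption

Pv = np.stack([1 - P, P], axis=1)      # Pv[j, w] = P_{j,w} as in Equation 9
N_ext = (N_age[:, None] * Pv).ravel()  # order: (young,0), (young,1), (old,0), (old,1)

def extend(C, Pv, h):
    """C^h of Equation 8 (h = 0 gives C^0 of Equation 6); row index is 2*i + v."""
    n = C.shape[0]
    Ch = np.zeros((2 * n, 2 * n))
    for i in range(n):
        for v in range(2):
            for j in range(n):
                for w in range(2):
                    if v == w:
                        share = Pv[j, v] + h * Pv[j, 1 - v]
                    else:
                        share = (1 - h) * Pv[j, w]
                    Ch[2 * i + v, 2 * j + w] = share * C[i, j]
    return Ch

def is_reciprocal(Cmat, N):
    T = N[:, None] * Cmat              # T[a, b] = N_a * C_ab
    return bool(np.allclose(T, T.T))

C0 = extend(C, Pv, h=0.0)
C1 = extend(C, Pv, h=1.0)
lhs = N_ext[1] * C1[1, 3]              # N_(young,1) * C^1_(young,1),(old,1) = 50 * 12
rhs = N_ext[3] * C1[3, 1]              # N_(old,1)   * C^1_(old,1),(young,1) = 240 * 4
```

Both extended matrices keep the row sums of the age-only matrix; only the matrix without homophily passes the reciprocity check.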

Extensions of contact matrices
In this section, we will develop methods to stratify a given contact matrix by an additional binary attribute with known prevalence and homophily. We start by describing a set of necessary conditions a meaningful extended contact matrix must satisfy. Any extended contact matrix that fails to satisfy some of these conditions is not suitable to accurately describe physical contacts and can, for example, not be used to study the spread of an infectious disease.
An extended contact matrix $C' \in [0, \infty)^{A' \times A'}$ is meaningful if it satisfies all of the following linear properties:
(a) Reciprocity: For all $i, j \in A'$, $N'_i C'_{ij} = N'_j C'_{ji}$.
(c) Same contact patterns as in $C$: For all $i, j \in A$, $\sum_{v \in A_{d+1}} N'_{(i,v)} \sum_{w \in A_{d+1}} C'_{(i,v),(j,w)} = N_i C_{ij}$.
(d) Homophily: For the proportion $\varphi$ of contacts between individuals with same binary attribute values (Equation 5) we have $\varphi(C', N') = E(\varphi) + h \left(1 - E(\varphi)\right)$ if $h \geq 0$ and $\varphi(C', N') = (1 + h) E(\varphi)$ if $h < 0$, where $\varphi$ and $E(\varphi)$ are computed as in Definition 3.4.
(e) If $N'_i = 0$ for some $i \in A'$, we can further assume $C'_{ij} = C'_{ji} = 0$ for all $j \in A'$.
Remark 4.2 With $m := |A'|$, all conditions of Definition 4.1 can be written as a linear system of $q$ equations
$$Xc = y, \tag{10}$$
where $X$ is a $q \times m^2$ matrix, $c$ is a vector of the $m^2$ entries in the extended contact matrix $C'$ and $y$ is a vector of length $q$ containing the right-hand side of the linear equations.
[...] 2b. Thus, to find an extended contact matrix that satisfies all desirable properties from Definition 4.1, we solve an under-determined non-homogeneous system of 15 linear equations in 16 variables, $Xc = y$, where $\dim \operatorname{Nul} X = 4$. If $h = 1$, we have $C'_{(i,v),(j,1-v)} = 0$ for all $i, j \in \{y, o\}$ and $v \in \{0, 1\}$. Here, the properties (a)-(d) yield a system of 11 equations in 8 variables, where $\dim \operatorname{Nul} X = 1$.
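The linear system of Remark 4.2 can be assembled explicitly for the two-age-group setting of Example 3.6. The sketch below is one possible encoding of conditions (a)-(d), not the published implementation: the helper `build_system`, the row ordering, and the old-old entry 16 of the age-only matrix (a placeholder, since Table 2a is not reproduced in this text) are assumptions. The dimension of the null space does not depend on that placeholder, because it only enters the right-hand side.

```python
import itertools
import numpy as np

N_age = np.array([100.0, 300.0])                      # young, old
P = np.array([0.5, 0.8])                              # prevalence per age group
C = np.array([[9.0, 12.0],
              [4.0, 16.0]])                           # C[1, 1] = 16 is a placeholder assumption

groups = list(itertools.product(range(2), range(2)))  # (age i, attribute value v)
m = len(groups)
Pv = np.stack([1 - P, P], axis=1)                     # Pv[i, v] = P_{i,v}
N_ext = np.array([N_age[i] * Pv[i, v] for i, v in groups])
C0 = np.array([[Pv[j, w] * C[i, j] for j, w in groups] for i, v in groups])

def build_system(h):
    """Return (X, y) encoding conditions (a)-(d) for the vector c = C'.ravel()."""
    rows, rhs = [], []

    def add(entries, value):
        r = np.zeros(m * m)
        for a, b, coeff in entries:
            r[a * m + b] += coeff
        rows.append(r)
        rhs.append(value)

    # (a) reciprocity, one equation per unordered pair of distinct groups
    for a, b in itertools.combinations(range(m), 2):
        add([(a, b, N_ext[a]), (b, a, -N_ext[b])], 0.0)
    # (b) the row sum of C' for (i, v) equals the row sum of C for i
    for a, (i, _) in enumerate(groups):
        add([(a, b, 1.0) for b in range(m)], C[i].sum())
    # (c) aggregating C' over the binary attribute recovers N_i * C_ij
    for i, j in itertools.product(range(2), repeat=2):
        add([(a, b, N_ext[a]) for a in range(m) for b in range(m)
             if groups[a][0] == i and groups[b][0] == j], N_age[i] * C[i, j])
    # (d) same-value contacts make up the target proportion of all contacts
    total = (N_age[:, None] * C).sum()
    same = [(a, b) for a in range(m) for b in range(m) if groups[a][1] == groups[b][1]]
    e_phi = sum(N_ext[a] * C0[a, b] for a, b in same) / total
    phi = e_phi + h * (1 - e_phi) if h >= 0 else (1 + h) * e_phi
    add([(a, b, N_ext[a]) for a, b in same], phi * total)
    return np.array(rows), np.array(rhs)

X, y = build_system(h=0.5)
nullity = X.shape[1] - np.linalg.matrix_rank(X)       # dimension of Nul X
X0, y0 = build_system(h=0.0)                          # C0 should solve this system
```

With this encoding, three of the 15 equations are linear combinations of the others, which is consistent with the four-dimensional null space stated in the text.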
Lemma 4.4 Given $C, N, A, P$ as in Definition 4.1, let $C^0$ be defined as in Equation 6. That is, for $i, j \in A$ and $v \in \{0, 1\}$, $C^0_{(i,v),(j,1)} = P_j C_{ij}$ and $C^0_{(i,v),(j,0)} = (1 - P_j) C_{ij}$. Then, $C^0$ written as a vector $c^0 \in [0, \infty)^{m^2}$ is a non-negative solution of Equation 10 for homophily $h = 0$.
Proof We start by showing $C^0$ is reciprocal. Let $i, j \in A$ and $v \in A_{d+1} \cong \{0, 1\}$. Remember $P_j = N'_{(j,1)}/N_j$ and $1 - P_j = N'_{(j,0)}/N_j$. From the reciprocity of $C$, it follows that
$$N'_{(i,v)} C^0_{(i,v),(j,1)} = \frac{N'_{(i,v)} N'_{(j,1)}}{N_i N_j} N_i C_{ij} = \frac{N'_{(i,v)} N'_{(j,1)}}{N_i N_j} N_j C_{ji} = N'_{(j,1)} C^0_{(j,1),(i,v)}.$$
Equivalently, we can show $N'_{(i,v)} C^0_{(i,v),(j,0)} = N'_{(j,0)} C^0_{(j,0),(i,v)}$, which proves the reciprocity of $C^0$. Properties (b) and (c) from Definition 4.1 follow very similarly. Finally, by design (see Definition 3.4) $C^0$ exhibits no homophily (i.e., $\varphi = E(\varphi)$), which means $c^0$ in vector form solves Equation 10 for homophily $h = 0$. Non-negativity of $c^0$ follows directly from $C \geq 0$ and $P \in [0, 1]^A$.
It is not clear if there exists a non-negative solution of Equation 10 for every choice of homophily $h \in [-1, 1]$. It is however easy to see that the range of homophily values for which such a solution exists must be an interval.
Proof Let $c^0 \geq 0$ be the solution of Equation 10 for homophily 0 from Lemma 4.4. Further, assume $c^h$ is a non-negative solution to Equation 10 for homophily $h \in (0, 1]$. Let $\alpha \in [0, 1]$. We show that the convex combination $c^\alpha = \alpha c^h + (1 - \alpha) c^0$ is a non-negative solution of Equation 10 for homophily $\alpha h$. Let $C^0$, $C^h$, and $C^\alpha$ denote the contact matrices corresponding to $c^0$, $c^h$, and $c^\alpha$, respectively.
(a) Reciprocity of $C^\alpha$ follows directly from the reciprocity of $C^0$ and $C^h$: For all $i, j \in A'$, $N'_i C^\alpha_{ij} = \alpha N'_i C^h_{ij} + (1 - \alpha) N'_i C^0_{ij} = \alpha N'_j C^h_{ji} + (1 - \alpha) N'_j C^0_{ji} = N'_j C^\alpha_{ji}$.
(b)-(c) Using the convex definition of $C^\alpha$, it is similarly straightforward to show that $C^\alpha$ contains the same total contacts and same contact patterns as the original contact matrix $C$ (and as $C^0$ and $C^h$).
(d) Due to the linear definition of $\varphi$ (Equation 5), we have $\varphi(C^\alpha, N') = \alpha \varphi(C^h, N') + (1 - \alpha) \varphi(C^0, N')$ and therefore, with Equation 7 and $\varphi(C^0, N') = E(\varphi)$, the homophily of $C^\alpha$ is
$$\frac{\varphi(C^\alpha, N') - E(\varphi)}{1 - E(\varphi)} = \alpha \, \frac{\varphi(C^h, N') - E(\varphi)}{1 - E(\varphi)} = \alpha h.$$
Non-negativity of $C^\alpha$ follows directly from $C^h, C^0 \geq 0$. Equivalently, one can show that $c^\alpha$ is a non-negative solution of Equation 10 for homophily $\alpha h$ given we have a solution $c^h \geq 0$ for $h \in [-1, 0)$.
Example 4.6 Given $C$ and $N$ as in Table 2a, we have $h_{\max} = 1$ for most choices for the prevalence $P = (P_1, P_2)$ of the added binary attribute (Figure 1). Only when $P$ is very unbalanced do we have $h_{\max} < 1$.
$P$. We generated $10^5$ random prevalence vectors $P \sim U(0, 1)^4$ and recorded whether we were able to obtain a solution $c^1 \geq 0$. Binning by $P_i$, $i = 1, \ldots, 4$, confirms the observation made in the previous example that to ensure $h_{\max} = 1$, we primarily require a non-extreme prevalence for the age group with the largest population share, i.e., 15-64 year olds (with prevalence $P_2$) in this example (Figure 2).
10 (i.e., a contact matrix $C^h$, which satisfies all properties of Definition 4.1) for certain choices of homophily $h \in [-1, 1]$. $C^h \geq 0$ is however normally not a unique solution. To find the "best" contact matrix $C' \in [0, \infty)^{A' \times A'}$, we define an objective function $g : [0, \infty)^{A' \times A'} \to \mathbb{R}$ and solve the following optimization problem with linear equality constraints:
$$\min_{c} \; g(c) \quad \text{subject to} \quad Xc = y, \quad c \geq 0,$$
where $c \in [0, \infty)^{m^2}$ with $m = |A'|$ is, as before, the contact matrix $C'$ written as a column vector. Rather than solving this problem, we can solve a related, smaller problem over the null space of $X$ with basis $\{b_1, \ldots, b_r\}$, using the known solution $c^h$ to formulate linear inequality constraints:
$$\min_{z \in \mathbb{R}^r} \; g\Big(c^h + \sum_{k=1}^{r} z_k b_k\Big) \quad \text{subject to} \quad c^h + \sum_{k=1}^{r} z_k b_k \geq 0.$$
Example 4.9 For $C, N, P$ defined as in Example 3.6 and homophily $h = 1$, the null space of $X$ is one-dimensional. Therefore, there exist infinitely many contact matrices that satisfy all desirable properties from Definition 4.1. The two extreme cases are shown in Table 3a,b. While both contact matrices have the same age-age contact patterns as in $C$, they achieve this in the most unbalanced way. For instance, to obtain an average of $C_{11} = 9$ contacts for young people with young people, both matrices assign 18 contacts to one group of the young people (those with attribute value True or False, respectively) and none to the other group. Any convex combination of these two contact matrices will be more balanced.
This example suggests an objective function that minimizes the differences in age-age contact patterns between people with opposite attribute value. Using the Euclidean distance, we can define the following objective function.
$$g(C') = \sum_{i \in A} \sum_{j \in A} \left( C'_{(i,1),(j,1)} + C'_{(i,1),(j,0)} - C'_{(i,0),(j,1)} - C'_{(i,0),(j,0)} \right)^2. \tag{14}$$
Example 4.11 The contact matrix shown in Table 3c minimizes $g$ defined as in Equation 14. An alternative would be to define the least-squares solution (i.e., the contact matrix $C'$ that minimizes $\lVert Xc - y \rVert$ from Equation 10) as optimal. This would result in the contact matrix shown in Table 3d.
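The search over the null space can be sketched for the case $h = 1$ of Example 4.9. Several assumptions apply: the old-old entry of the age-only matrix (16) is a placeholder because Table 2a is not reproduced in this text, so the numbers will not match Table 3c; the zero pattern for $h = 1$ is imposed through extra equations instead of removing variables; condition (d) is omitted because it is implied by that zero pattern together with (c); and the non-negativity constraint is only checked after an unconstrained least-squares step, which happens to suffice here but is not a general solver.

```python
import itertools
import numpy as np

N_age = np.array([100.0, 300.0])
P = np.array([0.5, 0.8])
C = np.array([[9.0, 12.0], [4.0, 16.0]])               # C[1, 1] = 16 is a placeholder assumption
groups = list(itertools.product(range(2), range(2)))   # (age i, attribute value v)
m = len(groups)
N_ext = np.array([N_age[i] * (P[i] if v else 1 - P[i]) for i, v in groups])
age_pairs = list(itertools.product(range(2), repeat=2))

def row(entries):
    r = np.zeros(m * m)
    for a, b, coeff in entries:
        r[a * m + b] += coeff
    return r

def block(i, j):
    """Index pairs (a, b) of C' that belong to the age pair (i, j)."""
    return [(a, b) for a in range(m) for b in range(m)
            if groups[a][0] == i and groups[b][0] == j]

rows, rhs = [], []
for a, b in itertools.product(range(m), repeat=2):
    if groups[a][1] != groups[b][1]:                   # h = 1: no contacts across values
        rows.append(row([(a, b, 1.0)])); rhs.append(0.0)
    elif a < b:                                        # (a) reciprocity
        rows.append(row([(a, b, N_ext[a]), (b, a, -N_ext[b])])); rhs.append(0.0)
for a, (i, _) in enumerate(groups):                    # (b) same total contacts
    rows.append(row([(a, b, 1.0) for b in range(m)])); rhs.append(C[i].sum())
for i, j in age_pairs:                                 # (c) same age-age contacts
    rows.append(row([(a, b, N_ext[a]) for a, b in block(i, j)]))
    rhs.append(N_age[i] * C[i, j])
X, y = np.array(rows), np.array(rhs)

c_part = np.linalg.lstsq(X, y, rcond=None)[0]          # one particular solution of Xc = y
_, s, Vt = np.linalg.svd(X)
rank = int(np.sum(s > 1e-9 * s[0]))
B = Vt[rank:].T                                        # columns span Nul X

# Equation 14 as g(c) = ||G c||^2, with one row of G per age pair (i, j)
G = np.array([row([(a, b, 1.0 if groups[a][1] == 1 else -1.0) for a, b in block(i, j)])
              for i, j in age_pairs])
z = np.linalg.lstsq(G @ B, -G @ c_part, rcond=None)[0] # unconstrained minimizer over Nul X
C_opt = (c_part + B @ z).reshape(m, m)
g_opt = float(np.sum((G @ C_opt.ravel()) ** 2))
```

Because $g$ is a sum of squares of linear functions of $c$, minimizing it along the null space reduces to an ordinary least-squares problem in $z$. If the resulting matrix had negative entries, a bounded solver would be needed instead.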
When perfect segregation ($h = 1$) is not desired but instead we require $h \in (0, 1)$, another objective function may be more useful. Instead of only looking at the overall homophily of a contact matrix with respect to a binary attribute, we can also define the specific homophily values for each split interaction.
Definition 4.12 Let $A' = A \times A_{d+1}$ with $A_{d+1} = \{0, 1\}$ be an extended attribute space as in Definition 4.1. Given a contact matrix $C'$, we define the specific homophily of the contact patterns that an individual with attributes $(i, v) \in A'$ has with individuals with attributes $j \in A$ by comparing the proportion of contacts between opposite-valued individuals with the expected proportion of such contacts, given prevalence $P$. That is,
$$h_{(i,v),j} = 1 - \frac{C'_{(i,v),(j,1-v)}}{P_{j,1-v} \sum_{w \in A_{d+1}} C'_{(i,v),(j,w)}}, \tag{15}$$
with $P_{j,1-v}$ defined as in Equation 9.
Ideally, the specific homophily value for each split interaction should equal the overall homophily. As outlined in Remark 3.5 however, this is only possible if $h = 0$ or if the prevalence is constant. Thus, we suggest as another objective to minimize the sum of the squared differences between the specific homophily values and the desired homophily.
Definition 4.13 Let $A' = A \times A_{d+1}$ with $A_{d+1} = \{0, 1\}$ be an extended attribute space as in Definition 4.1 and let $h \in [0, 1]$ be the desired level of (overall) homophily with respect to the added $(d+1)$th binary attribute. We define an objective function
$$g_h(C') = \sum_{i \in A} \sum_{j \in A} \sum_{v \in A_{d+1}} \left( h_{(i,v),j} - h \right)^2, \tag{16}$$
where the specific homophily values $h_{(i,v),j}$ are computed as in Equation 15, with $P$ denoting the prevalence.
Example 4.14 For $C, N, P$ defined as in Example 3.6 and homophily $h = 0.5$, the contact matrix shown in Table 4a minimizes $g_h$, defined as in Equation 16. The specific homophily values, defined as in Equation 15, are shown in Table 4b. Due to the differences in the prevalence $P = (0.5, 0.8)$ of the binary attribute, a contact matrix where all specific homophily values equal $h = 0.5$ is not possible, as it fails the reciprocity condition.

Application
In this section, we will compare a simple infectious disease model that only accounts for age-based mixing patterns with a more complicated model that accounts for an additional binary attribute with known homophily and prevalence. We show that disease dynamics may look quite different, especially if the homophily is high and the infection begins in one of the two groups.
while for all other sub-populations $j$ we have $I_j(0) = 0$. A comparison of the disease dynamics for different levels of homophily reveals several phenomena (Figure 3).
1. Even though initially only young people are infected, age does not matter much in the dynamics, likely because the contact matrix (Table 2a) exhibits very low homophily with respect to age.
2. The higher the homophily, the later and the lower the peak incidence appears among the individuals with an attribute value opposite to the one where the disease started. The presence of homophily thus introduces a delay mechanism. In contrast, the higher the homophily, the earlier and higher the peak incidence appears among the sub-populations with the same attribute value as where the disease started. This is likely due to the requirement that the total number of contacts is constant, irrespective of the homophily. In the presence of high homophily, relatively more contacts happen between people with the same attribute value, leading to an increased spread of the disease within one group.
3. With increasing homophily (except for the extreme case of complete segregation, h = 1), the time until the incidence falls below a certain threshold increases.This is likely because it takes more time for the disease to manifest itself among the sub-populations with attribute values opposite to the one where the disease started.
This example clearly shows the impact high levels of homophily can have on disease dynamics and that it can be important to account for the presence of homophily.
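The stratified SIR model of Definition 5.1 can be simulated with a short, dependency-light sketch. It is not the code behind Figure 3. It assumes that $S_j$, $I_j$, $R_j$ are within-group proportions (so each group sums to 1), uses a hand-written fourth-order Runge-Kutta step with a hypothetical step size, and runs only the age-stratified baseline with a placeholder old-old entry (16) because Table 2a is not reproduced in this text. The values $\beta = 0.05$ and $r = 0.1$ are those of Example 5.2.

```python
import numpy as np

def sir_rhs(state, C, beta, r):
    """Right-hand side of the stratified SIR model; state = (S, I, R) stacked."""
    m = C.shape[0]
    S, I = state[:m], state[m:2 * m]
    lam = beta * (C @ I)                 # force of infection lambda_j = beta * sum_k C_jk I_k
    return np.concatenate([-lam * S, lam * S - r * I, r * I])

def simulate(C, I0, beta=0.05, r=0.1, t_end=200.0, dt=0.1):
    """Integrate with classical RK4; returns an array of shape (steps + 1, 3m)."""
    I0 = np.asarray(I0, dtype=float)
    state = np.concatenate([1.0 - I0, I0, np.zeros_like(I0)])   # no pre-existing immunity
    out = [state]
    for _ in range(int(round(t_end / dt))):
        k1 = sir_rhs(state, C, beta, r)
        k2 = sir_rhs(state + 0.5 * dt * k1, C, beta, r)
        k3 = sir_rhs(state + 0.5 * dt * k2, C, beta, r)
        k4 = sir_rhs(state + dt * k3, C, beta, r)
        state = state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        out.append(state)
    return np.array(out)

C_age = np.array([[9.0, 12.0],
                  [4.0, 16.0]])          # C_age[1, 1] = 16 is a placeholder assumption
traj = simulate(C_age, I0=[0.01, 0.0])   # 1% of the young group initially infected
m = C_age.shape[0]
S, I, R = traj[:, :m], traj[:, m:2 * m], traj[:, 2 * m:]
peak_time = I.argmax(axis=0) * 0.1       # time of peak prevalence per group (dt = 0.1)
```

Passing an extended contact matrix with four sub-populations instead of `C_age`, together with a four-entry `I0`, gives the kind of comparison across homophily levels that the section describes.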
As outlined in Remark 2.5, contact matrices can also be used to generate undirected graphs. A homophilic contact matrix thus provides a way to obtain a more realistic interaction network, on which the spread of a pathogen through a community can be studied. The details of this application are however beyond the scope of this study.

More complicated extensions of contact matrices
Thus far, we have described how to expand a given contact matrix by a binary attribute with known prevalence and homophily. In this section, we will describe one of several possible extensions where in addition to the homophilic binary attribute the population is also split into several sub-populations with differential connectivity. This is, for example, needed to accurately model different contact levels due to occupation; e.g., people with public-facing jobs on average have a lot more contacts than people with office jobs. We will now describe the linear conditions a meaningful contact matrix extended by both a homophilic binary attribute and a binary attribute with differential connectivity must satisfy.
(b2) Differential connectivity with respect to $X_{d+2}$ described by $K$. Fix $v \in A_{d+2}$. Individuals with attribute $v_2 \in A_{d+2}$ possess $K_{v_2}/K_v$ times as many contacts as individuals with attribute $v \in A_{d+2}$ (and otherwise same characteristics). That is, for all $i, j \in A$, all $v_1 \in A_{d+1}$ and all $v_2 \in A_{d+2}$,
$$\sum_{w_1 \in A_{d+1}} \sum_{w_2 \in A_{d+2}} C'_{(i,v_1,v_2),(j,w_1,w_2)} = \frac{K_{v_2}}{K_v} \sum_{w_1 \in A_{d+1}} \sum_{w_2 \in A_{d+2}} C'_{(i,v_1,v),(j,w_1,w_2)}.$$
[...]
(d) Homophily: For the proportion $\varphi$ of contacts between individuals with same value of the binary attribute $X_{d+1}$ (Equation 5) we have $\varphi(C', N') = E(\varphi) + h \left(1 - E(\varphi)\right)$ if $h \geq 0$ and $\varphi(C', N') = (1 + h) E(\varphi)$ if $h < 0$, where $\varphi$ and $E(\varphi)$ are computed as in Definition 3.4. If $h = 1$, we require $\varphi = 1$. That is, for any $i, j \in A$ and any $v_2, w_2 \in A_{d+2}$, $C'_{(i,0,v_2),(j,1,w_2)} = C'_{(i,1,v_2),(j,0,w_2)} = 0$.
(e) If $N'_i = 0$ for some $i \in A'$, we can further assume (as in Definition 4.1) $C'_{ij} = C'_{ji} = 0$ for all $j \in A'$.
Remark 6.2 To find the "best" meaningful extended contact matrix, one solves the same non-linear optimization problem as outlined in Section 4.
Remark 6.3 Definition 6.1 also includes all linear conditions needed to find a meaningful contact matrix extended by a binary homophilic attribute, where the two attribute values have differential connectivity. Simply set $A_{d+1} = A_{d+2} \cong \{0, 1\}$ and use an extended distribution $N' \in [0, \infty)^{A'}$ with $N'_{(i,v_1,v_2)} = 0$ whenever $v_1 \neq v_2$.

Conclusion
The current pandemic has revealed the importance of accurate epidemic forecasting models. One key element of these models is a contact matrix that describes the rates of mixing between different sub-populations. Age constitutes the primary attribute used to stratify populations. This is partly because age, especially for COVID-19 with its disproportionate burden on older individuals, is an important variable but also partly because most empirical work on contact matrices focuses primarily on age. This study introduces new methodology enabling modelers to expand contact matrices using binary attributes for which the prevalence and the level of homophily in a population are known or can be estimated. We showed that the disease dynamics can be very different when homophily is included in a model. A more elaborate, recent model uses this new methodology and shows that accounting for homophily with respect to ethnicity is important when designing optimal vaccine roll-out strategies [9]. This manuscript leaves several questions unanswered. First, we only consider binary attributes. This means any non-binary attributes need to be binarized, which may not always be feasible or desirable. An extension to non-binary attributes should be straightforward and ideas from [10] should prove useful in this effort. One possible difficulty, however, is that homophily can no longer be described by a single value, but requires $m^2$ variables where $m$ is the number of attribute values. Finally, we presented only a simple, simulation-based application to show the effect of homophily on disease dynamics. Related theoretical considerations, such as investigating the effect of homophily on the effective reproductive number using next-generation matrices as in [11], should provide fruitful avenues for future research.
we can create an equivalent 1-dimensional attribute space whose attribute values are exactly the $m := \prod_{i=1}^{d} |A_i|$ different combinations of attribute values in $A$. This motivates the use of the term "contact matrix" for the function $C$ defined in Definition 2.3, as it can be written as an $m \times m$ matrix.

Table 2
Contact matrices described in Example 3.6. (a) Reciprocal contact matrix for a population stratified only by age. (b-c) Contact matrices for the same population stratified by age and an additional binary attribute. Both contact matrices possess the same number of contacts between any pair of age groups as in (a). The contact matrix in (b) exhibits no homophily and is reciprocal, while the contact matrix in (c) exhibits 100% homophily but is not reciprocal (cells in red highlight where reciprocity fails). All contact matrices possess the same total number of contacts (row sum).

Definition 4.1
Given a combined attribute space $A = A_1 \times \cdots \times A_d$, a corresponding distribution $N$ of a population, a reciprocal contact matrix $C \in [0, \infty)^{A \times A}$, and a binary attribute $X_{d+1}$ with known homophily $h$ and prevalence $P$ in the population, define $N'$ as the natural extension of the distribution (a function of $N$ and $P$) over the extended attribute space $A' = A \times A_{d+1}$, $|A_{d+1}| = 2$ (to simplify notation, we frequently assume w.l.o.g. $A_{d+1} = \{0, 1\}$). (b) Same total contacts as in $C$: The total number of contacts of an individual should never depend on the value of the added binary attribute $X_{d+1}$. That is, for all $i \in A$, $\sum_{j \in A'} C'_{(i,0),j} = \sum_{j \in A'} C'_{(i,1),j} = \sum_{j \in A} C_{ij}$.

Theorem 4.5
Given $C, N, A, P$ as in Definition 4.1, there exist $h_{\min}, h_{\max}$ with $-1 \leq h_{\min} \leq 0 \leq h_{\max} \leq 1$ such that Equation 10 has a non-negative solution if and only if $h \in [h_{\min}, h_{\max}]$.

Fig. 1 For different combinations of the prevalence $P$ of the added binary attribute (axes), the maximal homophily value ($h_{\max}$) that yields a non-negative solution of Equation 10 is shown. $C$ and $N$ are defined as in Table 2a.

Remark 4.8
Given $C, N, A, P$ as in Definition 4.1, Theorem 4.5 proves the existence of a non-negative solution $c^h$ of Equation

Table 3
Selected contact matrices for $C, N, P$ defined as in Example 3.6 and homophily $h = 1$. (a,b) Extreme contact matrices described in Example 4.9 where one entry describing contact of young with young people is zero (highlighted in red). (c) Contact matrix that maximizes the balance in age-age contact patterns, i.e., that minimizes $g$ defined as in Equation 14. (d) Least-squares solution.

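Definition 4.10
Let $A' = A \times A_{d+1}$ with $A_{d+1} = \{0, 1\}$ be an extended attribute space as in Definition 4.1. We define an objective function $g : [0, \infty)^{A' \times A'} \to \mathbb{R}$ that measures the balance in age-age contact patterns by $g(C') = \sum_{i \in A} \sum_{j \in A} \left( C'_{(i,1),(j,1)} + C'_{(i,1),(j,0)} - C'_{(i,0),(j,1)} - C'_{(i,0),(j,0)} \right)^2$.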

Table 4
(a) Contact matrix that minimizes Equation 16, the squared differences between specific homophily values and desired homophily $h = 0.5$, for $C, N, P$ defined as in Example 3.6. (b) Specific homophily values (Equation 15) for the contact matrix shown in (a).

$$\frac{dS_j}{dt} = -\lambda_j(I) S_j, \qquad \frac{dI_j}{dt} = \lambda_j(I) S_j - r I_j, \qquad \frac{dR_j}{dt} = r I_j$$
for $j = 1, \ldots, m$. Here, $r$ is the rate of recovery, the force of infection takes the form $\lambda_j(I) = \lambda_j(I_1, \ldots, I_m) = \beta \sum_{k=1}^{m} C_{jk} I_k$, $\beta$ is the transmission rate and $C \in [0, \infty)^{m \times m}$ is the reciprocal contact matrix. As usual, we further assume that the population is closed with $\sum_{j=1}^{m} (S_j + I_j + R_j) = N_{\text{total}}$.
Example 5.2 To see the effect of homophily, consider $C, N, P$ as in Example 3.6 and set $\beta = 0.05$, $r = 0.1$. Assume there exists no pre-existing immunity in the population, $R_j(t = 0) = 0$. Further, assume the disease starts in one sub-population, e.g., $I_{\text{young \& True}}(t = 0) = 0.01 \, N_{\text{young \& True}}$,

Fig. 3 Effect of homophily on disease dynamics. For different levels of homophily (sub panels), the proportion of currently infected people is shown for all four sub-populations considered in Example 5.2. Color distinguishes the value of the added binary attribute, while line style differentiates age.
(a) Reciprocity (as in Definition 4.1): For all $i, j \in A'$, $N'_i C'_{ij} = N'_j C'_{ji}$.
(b) Same total contacts as in $C$:
(b1) The total number of contacts of an individual should never depend on the value of the added homophilic binary attribute $X_{d+1}$. That is, for all $i \in A$ and all $v_2 \in A_{d+2}$, $\sum_{j \in A'} C'_{(i,0,v_2),j} = \sum_{j \in A'} C'_{(i,1,v_2),j}$.
Example 2.2 The age distribution, $N = (N_1, \ldots, N_m)$, of a population split into $m$ age groups is a non-negative (1-dimensional) vector of length $m$, which describes the number (or proportion) of individuals in each age group. For the U.S. population and the four age groups defined as A = {0 − 14, 15 −

Table 1
Age-based contact matrix for the United States, specifying the average number of daily contacts an individual of a given age group (row) has with individuals of different age groups (columns). Total population count ($N$) in millions. (a) Inferred contact matrix from [3], which is not reciprocal. (b) Reciprocal contact matrix following transformation of the inferred contact matrix by Theorem 2.8.
1. Moreover, $P_2$ is more important than $P_1$ in determining if there exists a non-negative solution of Equation 10 for homophily $h = 1$. This is likely due to $N_2 > N_1$.
Example 4.7 If we stratify the U.S. population with age groups 0-14, 15-64, 65-74, 75+ (and age distribution $N$ and reciprocal contact matrix $C$ as in Table 1b) by an additional binary attribute with prevalence $P = (P_1, P_2, P_3, P_4)$, we can find a non-negative solution of Equation 10 for homophily $h = 1$ for almost all choices of
Definition 5.1 A simple SIR (susceptible, infected, recovered) infectious disease model, in which the population is stratified into $m$ groups (e.g., age groups or attribute-age groups) and which accounts for assortative mixing, is given by
Definition 6.1 Let $A = A_1 \times \cdots \times A_d$ be a combined attribute space, $N$ a corresponding distribution of a population, $C \in [0, \infty)^{A \times A}$ a reciprocal contact matrix, and $X_{d+1}$ a binary attribute in $A_{d+1} \cong \{0, 1\}$ with known homophily $h$ as in Definition 4.1. Let $X_{d+2}$ be a finitely-valued attribute with values $A_{d+2}$ and known differential connectivity $K \in (0, \infty)^{|A_{d+2}|}$. That is, individuals with attribute value $v_2 \in A_{d+2}$ have $K_{v_2}/K_{w_2}$ times the number of contacts compared to individuals with attribute value $w_2 \in A_{d+2}$, assuming all other attributes in $A \times A_{d+1}$ are the same. Further, let $N'$ be an extended distribution of the population corresponding to the extended attribute space $A' = A \times A_{d+1} \times A_{d+2}$. An extended contact matrix $C' \in [0, \infty)^{A' \times A'}$ is meaningful if it satisfies all of the following linear properties: