Adaptive data-driven age and patch mixing in contact networks with recurrent mobility

Graphical abstract


Introduction
Contact patterns are key determinants of epidemic dynamics because they define who can be infected, by whom, and how quickly [ 2 ]. Arenas et al. [ 1 ] develop a patch-based model of SARS-CoV-2 transmission applied to Spain, in which the modelled population is stratified by geographic patches and three age groups. Following foundational work by [3][4][5], the model incorporates data on short, recurrent mobility patterns to determine contact rates between individuals in different patches and age groups. We build upon this contact model to incorporate improved age mixing patterns, which are stratified by different contact types and are responsive to the age distributions of mixing populations, as proposed by [ 6 ]. We also explore some practical considerations in parameterizing such models.

Method
Consider a population stratified by N g patches and N a age groups. 1 Let P ga be the number of people in patch g and age group a . Let y denote N y different types of contacts (e.g. household, workplace, etc.). Let B gg be the proportion of population P g who travel to g each day, or the "mobility matrix". 2

Original Approach
Arenas et al. [1] model the force of infection (incidence per susceptible) experienced by population P ga as: λ ga (t) = (1 − ρ a ) ga (t) + ρ a g * B gg * a g * a (t) (1) where: ga (t) is the probability of acquiring infection while in patch g; and ρ a ∈ [0 , 1] is an agespecific overall mobility factor. Thus, λ ga (t) is the sum of infection probabilities from the residence patch g, and from visited patches g * = g. The probability g * a (t) is modelled using the chained binomial for multiple exposures [7] : f g * C a θ aa g * g i a (2) where: β i is the per-contact transmission probability associated with infectious state i ; f g * is a density factor associated with patch g * ; C a is the expected number of contacts made per person per day in age group a ; θ aa is the age distribution of those contacts, derived from [8] for Spain, such that a θ aa = 1 ; and g * g i a is the proportion of individuals present in patch g * who reside in patch g and who are in infectious state i , for each age group a . This proportion g * g i a is defined as: g * g a i = P g i a M g g * a g i P g i a M g g * a (3) where M gg a is a convenience simplification of the mobility matrix: This model of infection captures important mixing patterns related to recurrent mobility that are relevant to epidemic modelling on relatively small spatial and time scales. However, the model could be improved by separating different contact types throughout the force of infection equation, and by allowing age mixing patterns to respond to local demographic and intervention conditions. Three specific issues with the original approach are as follows: 1. Contact balancing: The contact balancing principle states that the total number of contacts formed by group a with group a should equal the number formed by group a with group a [6] : For a model with non-random age mixing and random (proportionate) mixing by patches, Eq. (5) could be satisfied by a single fixed age mixing matrix θ aa , i.e. for the population overall.
However, in the context of patch-based mixing reflecting recurrent mobility, Eq. (5) should be satisfied in each mixing context (patch). Specifically, if different patches have different age distributions, or different rates of per-person contact formation due to household size, employment, etc., then it would not be possible to satisfy Eq. (5) with a single fixed age mixing matrix θ aa . The implications of violating Eq. (5) depend on relative differences in demographics and/or contact rates by patch and/or age group. For example, if a given patch skews younger than average in age, and most contacts are formed with other members of the same patch, then fixed average θ aa would underestimate the number of younger contacts among residents of this patch, and overestimate the number of older contacts. 2. Age mixing by contact type: A related issue is that the expected contact rates by age group C a reflect the summation of different types of contacts, and so the fixed age mixing matrix θ aa is applied to all contact types. As such, changes to the numbers of each type of contact are not paired with changes the overall mixing patterns. As illustrated by the polymod study [2] , age mixing patterns vary by contact type, such as highly age-assortative mixing in schools. Thus, differential reductions in each contact type would affect overall age mixing patterns. For example, if reductions in school-related contacts due to school closures were not reflected in θ aa , then the relative contribution of children to overall transmission could be overestimated during the period of school closures.

Modelling contact & mobility reductions:
The term (1 − ρ a ) ga (t) in Eq. (1) represents transmission to non-mobile individuals in patch g. The associated definitions in Eqs. (2)(3)(4) consider transmission to these non-mobile individuals from visitors to patch g. Such definitions therefore imply that non-mobile individuals still form contacts with visitors to their residence patch. However, it may be useful to model some or all non-mobile individuals as only forming contacts with other individuals from their residence patch. That is, scenarios may exist wherein a fraction of the population only has household contacts, as could be the case with public health measures such as lockdowns. As illustrated in Figure A.1, the original approach may overestimate inter-patch connectivity during periods of reduced mobility (via lockdowns) versus an approach in which some or all non-mobile populations are limited to contacts with others from their residence patch and not with visitors. Thus, the original approach [1] could underestimate the impact of confinement lockdown strategies on inter-patch transmission reduction.
We therefore develop a refinement of the original approach, with the aim of addressing the above three issues.

Proposed Approach
In the proposed approach, the contributions of different contact types to the force of infection are added to the binomial function for multiple exposures: C gag a y g a i (6) where: C gag a y is the expected number of type y contacts formed per person per day among individuals in population P ga with those in population P g a ; and g a i is the proportion of individuals in residing in patch g and age group a who are in infectious state i : For each type of contact, C gag a y is defined to reflect both age-related and mobility-related mixing factors, as described in the following subsections. To support these descriptions, we will refer to Figures from an example application, although the details of the application and the Figures are given in § 3 . Collecting the full network of contacts in the matrix C gag a y provides a representation that is easy to interpret, and allows us to compute various properties, like the margins in a, a or g, g , and whether contact balancing is satisfied per Eq. (5) . Additionally, separating contact types allows the incorporation of different probabilities of transmission per contact type β i y , if desired.

Age Mixing
Prem et al. [9] project contact patterns by 5-year age groups from the polymod study [2] onto 177 countries, considering various demographic data. These contact matrices represent C aa y : the expected number of type y contacts formed per day among individuals in age group a with those in age group a . Four types of contact are considered: "home", "work", "school", and "others" 3 ( Figure 4 a). We aim to incorporate these contact numbers and patterns into C gag a y .
The first challenge is that the contact matrices C aa y are inherently weighted by the underlying population age distribution-the proportion of expected contacts with age group a is proportional to the size of age group a . To overcome this challenge and apply these patterns to new population age structures, Arregui et al. [6] suggest to divide by the population age distribution to obtain an "unweighted" matrix C u aa y ( Figure 4 b): 4 C u aa y = C aa y P P a (8) where P is the mean of P a . The next challenge is that C u aa y may not satisfy the contact balancing principle, Eq. (5) , due to sampling and/or reporting error in the polymod survey. To ensure that the overall mixing matrix C gag a y will satisfy the balancing principle, the input age mixing matrix C u aa y must satisfy the principle. A simple solution is to average C u aa y with its transpose to obtain the "balanced" matrix C ub aa y ( Figure 4 b): This operation may change the margin C ay , representing the total type y contacts formed by individuals in age group a . However, such changes are reasonable if understood as a correction for sampling bias. 5 3 The "others" contact type in [9] is itself derived from the combination of "leisure", "transport", and "other" contact types from [2] , while the "home", "work", and "school" types are the same between the studies. 4 The matrix C u aa y could also be interpreted as the expected contact matrix for a population with a rectangular demographic pyramid [6] . 5 A perfect survey in a closed population would produce a contact matrix C u aa y that is already balanced.
A final challenge in applying the contact matrices from [9] is that the 5-year age groups may not align with the age groups of interest. Overcoming this challenge is not theoretically required to obtain C gag a y , but we describe a solution here in case it is useful for modelling applications. We begin by upsampling the contact matrix from 5-year age groups a 5 to 1-year age groups a 1 using bilinear interpolation, based on the midpoints of each age group, and scaled by a factor of 1 / 5 . To avoid edge effects associated with many interpolation implementations, we first pad the matrix by replicating the edges diagonally. If the desired age groups extend beyond the maximum age group of 80 available in [9] , diagonal padding can also be used to approximate the trends in the additional age groups. Then, given the age groups of interest a * (which may have irregular widths), we aggregate C ub a 1 a 1 y to obtain C ub a * a * y using matrix multiplication with indicator matrix A : The right-hand A T term sums the total number of contacts formed with the 1-year "other" age groups a 1 corresponding to a * . The left-hand A term averages the total number of contacts formed from the 1-year "self" age groups a 1 corresponding to a * . The average weights each 1-year age group a 1 equally, although other weights could be incorporated through the nonzero values of A . Another interpretation of the normalization sum is the widths of the age groups a * .
The resulting matrix C ub a * a * y represents the expected contacts among age groups a * when mixing with a population having equal proportion in all age groups a * (regardless of their width). Thus, C ub a * a * y can later be multiplied by the population age distribution of interest -reversing Eq. (8) -to obtain the expected number of contacts when mixing with that population. This approach then addresses issues 1 and 2 described in § 2.1 .

Mobility-Related Mixing
In conceptualizing mobility-related mixing, we define two types of contexts in which contacts can be formed, similar to "residences" and "common" sites in [10] : • Home pools: where contacts are formed exclusively with other residents of the same patch (e.g. for household contacts) • Travel pools: where contacts are formed with individuals from any patch who are present in the pool (e.g. for work contacts) We model one home pool and one travel pool associated with each patch, as illustrated in Figure 1 .
In this conceptualization, only contacts associated with travel pools are influenced by the population mobility matrix B gg , representing the expected proportions of individuals from patch g who visit patch g per day. For contacts associated with home pools, this matrix is functionally replaced with an identity matrix δ gg . It is not necessary to assume that all contacts of any particular type are formed in only one type of pool. Rather, we introduce a parameter h y ∈ [0 , 1] representing the proportion of type y contacts that are formed in the home pool, and the remainder ( 1 − h y ) are formed with travel pools. For example, we could have h y = 1 for household contacts, h y = 0 for work contacts, and h y = 0 . 5 for school contacts. Thus, the expected contacts formed by individuals in patch g are distributed across three situations: The idea of "home pools" is new versus [1] , and allows us to address issue 3 by introducing situation 3. Thus in [1] , all mixing was implicitly modelled using "travel pools", and individuals described as "non-mobile" reflected situation 2. In the context of reduced mobility, we do not assume that rows of B gg sum to 1. The "missing" proportion 1 − g B gg is then taken to represent non-mobile individuals, who do not form any mobility-related contacts (situations 1 and 2) that day. In § A.3 we discuss some details about generating a mobility matrix B gg with these properties, based on mobile phone data.
To calculate C gag a y using these assumptions, we begin by considering the travel pool in patch g * . The effective number of individuals from population P ga who are present in the pool is given by: 6 P g * gay = (1 − h y ) B gg * P ga (11) There is no distinction between situations 1 and 2 in Eq. (11) , as both are already reflected in the off-diagonal and diagonal elements of B gg , respectively. If we assume that mixing by residence patch g within the pool is random, we need only consider age mixing within the pool. Under completely random mixing and with 1 contact per person, the total number of contacts formed between P g * gay and P g * g a y is given by the outer product: X g * r gag a y = P g * gay P g * g a y g a P g * g a y (12) where the first term represents the absolute population size of "self", and the second term represents proportions of their contacts among "other" strata. Then, the numbers and patterns of contacts by age can be applied via multiplication:  6 If residents of different patches might have relatively different numbers of contacts, a scaling factor could be applied here.
since X g * r gag a y is proportional to the population age distribution of "others", and will therefore act to reverse Eq. (8) as planned. The term a 1 A a a 1 is from Eq. (10) , representing the widths of the age groups a . It is necessary to divide by the widths of age groups a since both X g * r gag a y and C ub aa y are proportional to these widths, but the proportionality should only be singular overall. We could have applied this normalization to C ub aa y in Eq. (10) in the same way as for a , but this would make C ub aa y harder to interpret, as it would no longer represent the expected numbers of contacts for each age group.
Mixing within home pools (situation 3) can be modelled similar to mixing within travel pools, with one small modification: replacing (1 − h y ) B gg with h y δ gg . Following through Eqs. (11)(12) , we obtain X h gag a y , representing the total contacts formed within home pools. Then, the total type y contacts formed between populations P ga and P g a across all relevant mixing pools is given by the sum: It may be tempting to simplify the model for home pool contacts by updating the mobility matrix B gg similar to Eq. (4) from [1] , with h y = (1 − ρ a ) . However, such an approach does not produce the same result as Eq. (14) , and indeed underpins issue 3 described in § 2.1 regarding mixing of nonmobile individuals with mobile visitors to their patch. On the other hand, if the interpretation of "non-mobile" is intended to allow mixing with visitors, then B gg can still be adjusted per Eq. (4) to simulate this behaviour. Another implication of our approach is that non-mobile individuals will not form mobility-related contacts. Thus, if g B gg is reduced, the total contacts formed by residents of patch g would be reduced proportionately, and changes to mixing patterns by age and patch reflected automatically. Finally, the number of type y contacts formed per person in population P ga with population P g a can be obtained by dividing X gag a y by the population size: C gag a y = X gag a y P ga

Example
We applied the proposed methodology for generating a mixing matrix C gag a y , which reflects patterns of age mixing, recurrent mobility between patches, and different contact types, to the population (14 million) of Ontario, Canada, in the context of covid-19 transmission modelling. Ten patches were defined based on groupings of the 513 forward sortation areas (FSAs) 7 in Ontario. The FSA groupings reflect deciles of cumulative covid-19 cases, excluding cases among residents of longterm are homes, between 15 January 2020 and 28 March 2021 [11] . Thus each patch represents approximately 10% of the Ontario population (37-68 FSAs), but not contiguous geographic regions. Such definitions were used to support allocation and prioritization of covid-19 vaccines to "hot spot" neighbourhoods in Ontario [12,13] . Figure 2 illustrates the locations of the FSAs and their decile rank, which is synonymous with their patch index. Figure    groups as needed. We obtained the final output contact matrices C aa y for Canada from [9] , for each of the "home", "work", "school", and "others" contact types, as well as the population size of each 5-year age group used in [9] . 9 We assumed that residence patch did not influence the numbers of contacts formed per person, only with whom those contacts are formed, although such a belief could be incorporated in the model, perhaps in Eq. (13) .
The mobility matrix B gg between patches was derived using private data on geolocation service usage among a sample of approximately 2% of mobile devices in Ontario [14] during January-December 2020. Appendix A.3 details the specific methods and assumptions used; to summarize: Each devices was assigned an approximate home location (152.9 m × 152.4 m) based on the most common location during overnight hours for each calendar month. This location was then used to determine the home FSA ( n ). The proportion of time spent outside the home location each day, stratified by inside vs outside the home FSA, was also used to estimate the relative proportions of intra-vs inter-FSA mobility. Finally, the total numbers of visits to other FSAs ( n ) by all devices were used to estimate Fig. 4. Intermediate results in obtaining unweighted and balanced age contact matrices C ub aa y (expected number of type y contacts per person per day in each age group a , with those other age groups a ) from population-weighted matrices C aa y from [9] which may not satisfy contact balancing. Contact matrices for Canada, derived from [9] ; colour scales are square-root transformed to improve perception of smaller values. the conditional probability of travelling from FSA n to FSA n , given that an individual will travel outside the home FSA n .
The contribution of each FSA to overall mobility of the patch/decile (group of FSAs) was then aggregated as: where S g is the set of FSAs ( n ) corresponding to patch/decile g. Mobility matrices were estimated for each month in the available dataset (Jan-Dec 2020). A reference period reflecting pre-pandemic conditions was defined as Jan-Feb 2020; unless otherwise specified, all subsequent results use the average mobility patterns during that period ( Figure 3 ). We did not model any differences in mobility  Figure 4 . Contact matrices for Canada, derived from [9] ; colour scales are square-root transformed to improve perception of smaller values. by age group, although such differences could be included in the model by adding a relative rate in Eq. (11) .
Finally, we specified the proportions of each contact type assumed to be formed with the home pool: h y = { home : 1 , work : 0 , school : 0 , others : 0 } The parameters P ga , C aa y , B gg , and h y represent the necessary inputs to our approach for calculating C gag a y . The following § 3.2 walks step-wise through the approach and presents all major intermediate results. Fig. 6. Intermediate results in obtaining age-restratified contact matrices C ub a * a * y (expected number of type y contacts per person per day in each age group a * , with those from age groups a * ) from matrices C ub a 5 a 5 y with 5-year age stratifications a 5 . Contact matrices for Canada, derived from [9] ; colour scales are square-root transformed to improve perception of smaller values; the horizontal streaks in (c) corresponding to age groups 0-11 and 16-39 are expected, as more contacts will be formed with larger age groups. Figure 4 illustrates the contact matrices C aa y from Prem et al. [9] , before and after the steps of unweighting by population age distributions, Eq. (8) , and ensuring contact balancing, Eq. (9) . Figure 5 illustrates the differences in contact matrices between each step. These differences can be explained as follows. The Canadian age distribution used by Prem et al. [9] (Figure A.11 black dashed line), is below the mean for the youngest and oldest age groups; thus inverting the weighting by this age distribution increases the contacts expected with these age groups ( Figure 5 a). By contrast, Figure 5 b is purely symmetric (and opposite about the central diagonal), reflecting differences from the symmetric mean matrix. Fig. 7. Total contacts per person per day C ay = a C aa y for each intermediate step in obtaining C ub a * a * y , stratified by contact type. Contact matrices for Canada, derived from [9] ; modelled contacts for each age group are plotted at the midpoint of the age group; the cut points for the original age groups a 5 from [9] and the target age groups a * in our application are indicated on the bottom and top x-axes, respectively. Figure 6 illustrates the unweighted and balanced contact matrices C ub aa y before and after bilinear interpolation and aggregation to the target age groups of interest, Eq. (10) . The final matrices C ub a * a * y include dominant horizontal streaks corresponding to larger age groups. These streaks are expected, as more contacts are expected to form with larger "other" age groups. Vertical streaks do not appear, as each column represents the expected contacts for each person in the "self" age group, not the total contacts formed by that age group. 10 Figure 7 plots the total expected contacts per person per day, C ay = a C aa y , before and after each of the above steps, from before Eq. (8) through after Eq. (10) . Overall, patterns remained roughly consistent across transformations, although some details among the large 16-39 age group are lost due to substantial averaging.

Results
Finally, Figure 8 illustrates the margins (sum over "other" strata and population-weighted average over "self" strata) of the complete mixing matrices C gag a y , in terms of age groups a & a ( Figure 8 a), and patches/deciles g & g ( Figure 8 b). Such margins are computed as follows: The equivalent matrices for total number of contacts per person of all types ( C aa and C gg ) are also given in Figures 8 c and 8 d, respectively. The marginal matrices C aa y are identical to the input age mixing matrices from Eq. (10) , which could be used as an implementation check. Since h y = 1 for "home" contacts, C gg y is an identity matrix. The equivalent matrices for "work", "school", and "others" contact types also feature a strong diagonal, due to a strong diagonal in the source mobility matrix B gg (individuals who are mobile within their home FSA). However, the off-diagonal elements are less clustered towards the central diagonal versus the mobility matrix B gg ( Figure 3 ). Mobile individuals from patches g and g may form contacts not only when either travels to the others' patch, but also when they both travel to a third patch. Thus, the degree of mixing between patches using this approach is greater than the mobility matrix alone would suggest, though less than if completely random mixing was assumed. Fig. 8. Expected contacts per person per day, stratified by age, decile (patch), and contact type, computed as the margins of the overall contact matrices C gag a y . Contact matrices for Canada, derived from [9] ; colour scales are square-root transformed to improve perception of smaller values.

Conclusion
Arenas et al. [1] develop an approach to modelling contact patterns associated with recurrent mobility, which is relevant to dynamic models of infectious disease transmission, such as SARS-CoV-2. The original approach simulates contact patterns between age groups and geographic patches connected by recurrently mobile individuals, and considers changes to mixing between patches due to reduced mobility among some individuals. In this paper, we proposed approaches to improve upon the approach to: ensure contact balancing between age groups; model changes to age contact patterns in response to reduced mobility; and allow complete isolation of non-mobile individuals from mobilityrelated contacts.
The first key change in the proposed approach draws on [6] to combine preferential patterns of age mixing with the age distribution of the mixing population, such that the actual number of contacts formed reflects both elements. This change is incorporated into each separate mixing "pool" where contacts are formed, and ensures that the number of contacts simulated from age group a to age group a will equal those from age group a to age group a . This revised approach can also ensure contacts balance if population sizes change over time, such as in the case of large differential mortality by age group or patch; or if the numbers of contacts formed per-person differ by patch, such as if individuals in some patches have higher numbers of work contacts.
The second key change in the proposed approach is to maintain separate mixing patterns for each type of contact, only aggregating the contribution of different contact types to overall transmission within the force of infection equation. With this change, the age mixing patterns associated with any contact type are not influenced by changes to the numbers or mixing patterns of any other contact type. This change also supports differential probability of transmission by contact type.
The final key change in the proposed approach is to introduce two separate mixing pools where contacts can form. Within "home" pools, contacts can only be formed with other residents of the same patch. Within "travel" pools, contacts cab be formed with other residents of the same patch who are mobile within their residence patch, or with any mobile visitors to the patch. Home pools therefore allow true isolation of some individuals from mobility-related contacts, with implications for overall network connectivity.
In developing and applying the proposed approach to an example context, we present the methodological details and results of each intermediate step, so that they may be reproduced or built upon in future work.

Contributions
JK conceived of and developed the approach, reviewed the literature, conducted the mixing and mobility analyses, and wrote the code and manuscript. AG conducted and led data analyses and developed the mobility metrics. HM provided critical input into the assumptions and bias adjustments with the mobility data. MH, KB, and SM contributed critical insights to the assumpitons, data, approach, and interpretation. All authors were involved in intepretation and manuscript revisions.