The effects of local homogeneity assumptions in metapopulation models of infectious disease

Computational models of infectious disease can be broadly categorized into two types: individual-based (agent-based) or compartmental models. While there are clear conceptual distinctions between these methodologies, a fair comparison of the approaches is difficult to achieve. Here, we carry out such a comparison by building a set of compartmental metapopulation models from an agent-based representation of a real population. By adjusting the compartmental model to approximately match the dynamics of the agent-based model, we identify two key qualitative properties of the individual-based dynamics which are lost upon aggregation into metapopulations. These are (i) the local depletion of susceptibility to infection and (ii) decoupling of different regional groups due to correlation between commuting behaviours and contact rates. The first of these effects is a general consequence of aggregating small, closely connected groups (i.e. families) into larger homogeneous metapopulations. The second can be interpreted as a consequence of aggregating two distinct types of individuals: school children, who travel short distances but have many potentially infectious contacts, and adults, who travel further but tend to have fewer contacts capable of transmitting infection. Our results could be generalized to other types of correlations between the characteristics of individuals and the behaviours that distinguish them.


Introduction
Computational models of infectious disease dynamics come in two general forms: those that explicitly model each member of the population, and those that treat the population as a statistical ensemble of indistinguishable entities. The most well-established techniques are of the latter type, and compute the dynamics of susceptible and infected populations using compartmental models that specify timedependent infection rates based on the current numbers of susceptible and infected individuals in the population. A well-known model of this type is the SIR model of Kermack & McKendrick [1]. Such models are advantageous due to their simple formulation and small parameter space, which facilitate the explicit evaluation of the assumptions implicit in the rate equations and make fitting the model to data straightforward for cases in which those assumptions are approximately valid.
There are two primary critiques of these compartmental approaches. The first relates to the assumptions made about the type of dynamics, based on average rate equations, which can be interpreted explicitly in stochastic implementations of compartmental models that effectively express deterministic systems of ordinary differential equations as discrete counting processes. Stochastic implementations offer the capacity for increased realism when the number of infected individuals is small, as it is during the initial growth and final extinction phases of epidemics. These formulations assume exponentially distributed dwell times within compartments (i.e. recovery occurs at a constant average rate). In the SIR model, this assumption produces exponentially distributed generation intervals (with mean T g ), which defines the relationship between the basic reproductive number R 0 (the average number of secondary infections produced by a primary case in a completely susceptible population), and the initial exponential growth rate (r) of the infected population [2]. The critique is that there is a universal lack of empirical evidence for these types of dynamics in documented processes of disease transmission, with real generation interval distributions differing markedly from the exponential form. That is, the individual-level disease natural history implied by the SIR model is not justified based on any empirical evidence. More generally, observed dwell times in various disease compartments (i.e. incubation period distributions, or periods of symptom expression) are not exponentially distributed in epidemics of real-world pathogens [3][4][5]. To address these issues, extensions of the SIR model can provide more realistic representations of generation interval distributions by splitting the infected state into multiple compartments, each with exponentially distributed dwell times (see, e.g. [6,7]). While such extended compartmental models offer the capacity for increased realism, they also contain more parameters that must be calibrated and constrained.
A second common critique is that compartmental models are limited in their capacity to account for heterogeneity in population properties and structure. The compartmental approach assumes each individual in a compartment has identical disease susceptibility, infectiousness and contact frequency with others. Furthermore, in its simplest form, the approach assumes that each individual has a uniform probability of encountering any other individual in the population, an assumption which neglects social behaviours that are known to be important in the spread of infectious disease (i.e. recurrent mobility patterns and social clustering in households or schools). This critique regarding heterogeneity is less fundamental because it can be addressed with implementations that introduce added layers of complexity (additional compartments), and modulate the interaction rates between them to approximate population heterogeneity on a chosen scale.
The most direct approach to addressing both of the above critiques is to simulate the disease natural history and population heterogeneity explicitly with agent-based models (ABMs, also referred to frequently as individual-based models) in which each individual of the population is treated as a discrete, unique entity. While ABMs can, in principle, account for all known heterogeneity in population structure and behaviour, in practice only a small subset of known factors can be included in any given framework. This limitation is due both to the high computational burden of large-scale agent-based simulations, and also to the combinatoric explosion of parameter combinations that occurs as model complexity increases. The choice of which factors to include is typically constrained by the available data, required for model specification. Due to data constraints, it is difficult or impossible to faithfully incorporate all aspects of population heterogeneity that could be important for disease dynamics. On the other hand, the more factors that are included in such models, the more difficult it is to establish which of them are responsible for any observed departures from the dynamics of compartmental models.
Over the last several decades, there has been a sustained effort in the infectious disease modelling community to understand the role of population heterogeneity. This has led to hybrid models that royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 9: 211919 capture some aspects of population structure explicitly while treating others with homogeneous approximations. For example, the popular metapopulation approach treats discrete subpopulations as homogeneous entities that interact with one-another through coupling equations of various types [8][9][10][11][12]. Another approach, the degree-based mean-field approximation, attempts to incorporate heterogeneous contact patterns into continuum models by making assumptions about the relative timescales of social network dynamics and infectious disease spread [13]. Other work has focused on incorporating nonlinear terms into compartmental frameworks to directly account for the global consequences of heterogeneity on the infectious disease dynamics [14][15][16][17].
While many studies have addressed these questions of heterogeneity, it is rare to find explicit comparisons between different modelling approaches (but see e.g. [15,18]). In this work, we compare the results of a complex ABM with a metapopulation model (MPM) derived from the same population data. By aggregating the individuals into subpopulations, we explicitly remove demographic heterogeneity, the nuances of disease natural history progression, and the effects of social clustering. However, the MPM still accounts for medium-scale recurrent mobility patterns, and the basic transmission characteristics of the modelled pathogen (i.e. the basic reproductive ratio R 0 , and mean generation interval). After comparing the results of the ABM with both stochastic and deterministic implementations of the MPM, we explore some generic approaches of correcting for observed discrepancies and discuss the results in the context of which factors, included in the ABM, are responsible for the qualitative differences.
Our results indicate that incorporation of nonlinear terms into transmission equations is helpful, but not sufficient to account for discrepancies related to coupling strength (contact rates) between different subpopulations. However, when we systematically decouple the subpopulations of the MPM by decreasing the rate of contact between them, we are able to recover important qualitative aspects of the ABM dynamics. In this case, our results indicate that behavioural heterogeneity manifests both as a weakening of transmission strength with depletion of the susceptible population, as well as a reduction in coupling strength between communities. This is likely due to the different travel behaviours of children and adults. Children, who tend to have higher susceptibility, infectiousness and contact rates, contribute substantially less to long-range population fluxes associated with commuting behaviour. For cases in which individual properties correlate with different types of behavioural tendencies, heterogeneity will likely alter both the dynamics within local regions and the coupling between regions.

Agent-based model
In general, ABMs attempt to explicitly capture the salient features of individual behaviour in order to simulate non-equilibrium dynamics on the level of heterogeneous collectives.
The ABM of influenza transmission in Australia used in this work has been described in detail in several previous publications [19,20], so only a brief overview will be given here. The aspects of the modelling framework that are most relevant to the present study are the mid-to long-range mobility network [21] and the heterogeneity between students and adults which constitutes the most significant classification in terms of the individual behaviour captured by the model. Notably, the ABM individually simulates approximately 23 M agents, and effective use of it requires high-performance computing facilities.

Population model
The population model is built from the Australian Census of 2016 (Australian Bureau of Statistics). It incorporates local heterogeneity and connectivity at the level of households, household clusters, neighbourhoods and communities. Workplaces and schools provide settings for mixing outside of the local community context, and give rise to the large-scale connectivity patterns captured by the MPM derived here.
Individuals are assigned to households based on the local distribution of household sizes and compositions. The ages of individuals are assigned based on the composition of their households, which effectively reproduces the age distribution computed independently from the census [19]. This procedure produces a calibrated, parsimonious model of the Australian residential population.
royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 9: 211919 Agents between the ages of 18 and 65 years are assigned to workplace groups of 10 individuals based on commuting data provided in the Census [21]. Children between the ages of 5 and 18 years are assigned to schools based on proximity and capacity (they are randomly assigned to the nearest school with available capacity). School locations and capacities were determined from a dataset provided by the Australian Curriculum Assessment and Reporting Agency as described previously [20].
The dynamics of the ABM are computed in time steps of 12 h, which correspond to the 'nighttime' and 'daytime' sets of contact groups in which disease transmission may occur. During the 'daytime' phase, disease transmission events from interactions within workplaces and schools are computed. Individuals who are not assigned a workplace or school mixing environment (approx. 45% of the total population) do not participate in disease transmission during the 'daytime' simulation phase. During the 'nighttime' phase, interactions within local households, household clusters, neighbourhoods and communities are computed, with interactions confined to the scale of statistical area level two (SA2, the community scale, each containing between 1000 and 10 000 individuals [22]). All individuals are assigned residence locations and participate in local disease transmission during the 'nighttime' phase. The timestep of 12 h was selected due to the difference in behavioural features during 'daytime' and 'nighttime' intervals, and is sufficiently short to produce plausible epidemic dynamics. While a shorter timestep would change the disease dynamics produced by the model for a given set of inputs, the comparison we perform in this work is based on the parameters derived from the observed ABM outputs, not from the ABM inputs governing disease transmission. Therefore, we do not expect the choice of ABM timestep to substantially affect the comparison we present in this work.

Disease model
The disease model used in the ABM is designed to simulate influenza infection within an individual and consists of the following agent states: susceptible, latent (exposed, non-infectious), infected (infectious) and recovered. Additionally, 33% of agents are asymptomatic if infected, with infectiousness reduced by a factor of two relative to symptomatic agents. Symptomatic individuals become symptomatic in the days following the end of their latent period, and this occurs in fixed fractions with 30% becoming symptomatic on day 2 of their infection (the day immediately following the latent period), 50% expressing symptoms on day 3 and the remaining 20% becoming symptomatic on day 4 of their infection. The trajectory of asymptomatic or presymptomatic infectiousness follows a bi-linear model that increases for 12 h following the end of the latent period and subsequently declines until reaching zero after 5.5 days. At the moment of symptom onset, infectiousness increases by a factor of two while following the same bi-linear trajectory. This semi-deterministic progression of contagiousness and symptom onset produces a non-trivial generation interval distribution that cannot be captured exactly in standard compartmental frameworks that assume time-independent transition probabilities. In the ABM, no correlations exist between the descriptive properties of an individual and their tendency to become symptomatic (which is assigned homogeneously at random), so we do not explicitly include symptom expression within the compartmental formulation of disease progression used in our MPM.
In the ABM, disease transmission rates are dependent on both age and mixing context, with contacts between children in the home and classroom environments producing the highest chance of transmission between infected and susceptible individuals. The model of disease progression and the age dependence of transmission strength is described in detail elsewhere (see Cliff et al. [19]).

Scenario
The ABM simulates pandemic influenza in Australia, and initiates the Australian epidemic with importations (exogenous introduction of index cases) within 50 km of international airports. Relative importation rates are determined by the international arrival numbers at each airport, which produces the initial (seeding) conditions of each epidemic simulation which then proceeds through stochastic transmission of disease using binomially distributed random variables that determine the infection status of each agent, at each 12 h step of the simulation. Each scenario terminates when the infected population reaches 0, or the pre-determined time limit is exceeded.

Metapopulation model
The goal in creating our MPM is to preserve the large-scale mobility patterns of the population, while averaging over all local heterogeneity. The basic process is illustrated in figure 1. We do this by systematically building up a transport matrix on the scale of SA2. Each of these n partitions contains approximately 1 × 10 4 , to 3 × 10 5 individuals. In the ABM, individuals either stay within their home SA2 during the day or they commute to school or work in another area. The MPM consists of a matrix representing the mixing patterns between these locations. The first step in creating such a model is to build a bipartite adjacency matrix ϕ describing the proportion of individuals resident in each area i who make regular trips to destination regions j where OD(SA2 i → SA2 j ) is the number of travellers residing in area i who commute between their home in area i and their destination in area j, on a typical day (here, N i is the population of individuals living in area i). This bipartite structure is illustrated on the left-hand side of figure 2. Building the commuter matrix (OD) is a simple process of iterating through each individual in the ABM population, and incrementally increasing the entry of the OD matrix corresponding to that individual's homedestination pair.
To convert the resulting bipartite mobility network into a coupling matrix, we apply standard methods involving simple mass-action expressions. For a given pair of locations, the coupling between them is computed as is the population present in location j during work hours. The mixing matrix F is a weighted, symmetric, unipartite adjacency matrix representing mixing probabilities between individuals residing in each respective location, so that multiplication of F ij by subpopulation numbers from areas i and j gives the number of contacts between those subpopulations per unit time. Several implicit assumptions about human interaction behaviour are encoded into this mass action derived representation of contact patterns. These can be expressed as the assumption of spatial homogeneity on the scale of SA2, and a time-scale separation assumption that assumes averaging of temporal fluctuations in travel patterns.  To convert this model of coupled metapopulations into a full infectious disease model, we first need to define our disease dynamics. For simplicity, generality and approximate correspondence to the ABM, we will use the canonical SIR dynamics which, for a single homogeneously mixing population, are described by in which S is the proportion of susceptible individuals, I is the proportion of infected individuals and R is the proportion of recovered individuals, while β and γ are the transmission and recovery rates, respectively. Strictly speaking, β incorporates also the rate at which contacts capable of transmitting infection occur (in some descriptions, the contact rate is included as a separate parameter). The homogeneous transmission dynamics described by equation (2.4) can be applied to a network of coupled subpopulations by using F to describe the contact rates between them. Computing transmission between communities requires multiplication by infected and susceptible populations in each region, a transmission rate per contact, and a contact rate per unit time (here, these two terms are absorbed into the transmission rate β). The rate of transmission from a single infected individual residing in area j to the susceptible population of area i is computed as in which S i is the proportion of susceptible individuals in subregion i, and N i is the total population of subregion i. This formulation allows us to build several different implementations of MPMs, the results of which can then be compared to those of the ABM. Here, we discuss a deterministic discrete-time implementation, a stochastic discrete-time implementation and a stochastic continuous-time implementation.

Deterministic, discrete time
To compute the dynamics of the deterministic version of this MPM, all that is required is the combination of the transmission and recovery terms, allowing numerical integration of the coupled rate equations. The vector of infected populations I at time t + Δt is computed as where I is the identity matrix of size n (note that we have used I to represent the fraction of infected individuals in equation (2.4), while I is a vector of infected populations in each subregion). The infection numbers in each location depend on the incoming transmissions from all other areas, local transmissions and local recovery of individuals. Since these are continuous dynamics, non-integer SA2 (place of work) [5] SA2 (place of work) [3] SA2 (place of work) [4] bipartite commuter matrix (directed) SA2 (residence) [1] SA2 (residence) [2] contact matrix (undirected) SA2 (residence) [1] SA2 (residence) [2] transmission matrix (bi-directional) royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 9: 211919 values of infected populations are possible. While this condition may be acceptable when the infected and susceptible populations are both large, it makes interpretation difficult in the initial stages of the epidemic, and leads to substantial discrepancies in the results of numerical simulations, as we will discuss in the results sections.

Stochastic, discrete time
In the discrete-time stochastic description of the structured SIR dynamics, the force of infection on the subset of individuals living in region i is computed from the mixing terms given in equation (2.2), aggregated over each location's neighbours j, multiplied by the number of ill individuals in each neighbouring region, the transmission rate β and the chosen time interval Δt. This gives the individual-level probability of transitioning from the susceptible to infected state, computed independently for each SA2 subregion i while recovery only depends on the rate γ and the time interval

Stochastic, continuous time
To ensure that the dynamics computed by the discrete-time implementation were sufficiently free of artefacts due to time discretization, we implemented a continuous stochastic version of the model using the Gillespie algorithm. To do so, at each time t the probability of each possible next event was computed, and the next event chosen by inverse-transform sampling of the corresponding cumulative distribution function. The delay δt between each subsequent event was computed in the standard way by sampling from the exponentially distributed inter-event times δt ∼ ρ e −ρ where ρ is the sum over all individual event rates. In the system described here, there are 2n possible events, consisting of either recovery or infection events, in each of the n subregions. By comparing the results of the continuoustime model to those of the discrete-time stochastic model, we determined that a discrete-time step of Δt ≤ 0.1 d was sufficient for convergence of the discrete dynamics. Because we did not identify any significant differences between the results of the continuous-time simulation and those of the stochastic discrete-time model, we do not report the results of the continuous-time model, but have included the source code in the online Zenodo database (DOI:10.5281/zenodo.6486795).

Extensions to the SIR framework
After discussing the differences between the results produced by the ABM and those produced by a comparable MPM, we introduce two extensions to the standard SIR modelling framework. These extensions modify two aspects of the dynamics that are not captured in the MPM framework described above.

The saturation parameter λ
The first of these aspects is the tendency for local depletion of the susceptible population [16]. In tightly connected clusters of individuals such as households or workplaces, disease spread is enhanced by the higher-than-average contact rates between the individuals in the cluster. However, such clustering also produces lower-than-average transmission rates between such groups, and this leads to depletion of the local susceptible population during disease transmission. Mathematically, we attempt to capture such effects in a phenomenological way by introducing a saturation parameter λ > 0 that decelerates transmission as the susceptible population decreases royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 9: 211919 which produces a system in which the dynamics are initially identical to those of the base model but transmission strength decreases more quickly as the susceptible population is depleted.

The decoupling parameter σ
In addition to this saturation parameter, we explore the effects of altering the properties of the mixing matrix to enhance local mixing and inhibit inter-regional transmission. By doing so, transmission within regions will accelerate and transmission between regions will slow down. Our rationale for introducing such an effect is that we would like to understand the phenomena produced by correlation between decreased travel behaviour and increased social mixing behaviour of school children, without explicitly simulating the underlying heterogeneity. In the ABM, school children have larger mixing groups (classroom groups, grade groups and school groups), but typically remain local to their home region during the day. To modify our MPM in order to adjust for these effects, we alter the commuter flows described in the OD matrix ϕ, by uniformly moving commuters from offdiagonal components into the diagonal component (those who remain in their local areas). To manipulate the degree to which this occurs, we introduce the parameter σ, which modulates the original OD matrix as follows:f ð2:11Þ so that a fraction σ ∈ [0, 1] of travellers remain within their local regions instead of transiting outside of them. These traveller flows are used to compute the mixing matrix F, which determines the transmission strength between regions. The decoupling parameter σ attenuates the strength of disease transmission between regions by reducing the rate of contact between individuals residing in different locations. While we use a static representation of population mobility, and adjust the traveller volumes directly before computing mixing rates, the extension we propose is directly analogous to dynamic representations that model stochastic travel patterns based on average departure and return rates. For example, in the formulation proposed by Sattenspiel et al. [23], if spatially homogeneous departure and return rates are assumed, then our correction factor σ can be mapped to a global attenuation of departure rates from home locations (s) as where s Ã is the attenuated departure rate from home locations, ρ is the spatially homogeneous return rate from destination regions, and z = (1 − σ) −1 . Here, z alters average home and destination populations by changing the relative rates of departures and arrivals, rather than by adjusting a static snapshot of the travel topology. While it is useful for generality to connect our decoupling parameter σ to the alternate formulation above, we note here that periodic daily travel patterns in which approximately half of the working population's waking hours are simultaneously spent in a destination region and half in a home region are assumed in the ABM. These types of mobility patterns cannot be captured under the time-scale separation assumption used in our MPM or in more explicit steady-state formulations such as the one developed by Sattenspiel et al. [23].
In order to develop a first approximation of the decoupling due to time-use assumptions of the ABM, we first note that the 'nighttime' phase of the ABM corresponds to a travel topology with σ night = 1.0 (interactions occur only within local regions), while the 'daytime' phase corresponds to σ day = 0 (interactions occur in destination regions). Therefore, it is plausible that a correction of σ = 0.5 could account for this model of time use, as a time-average of the periodically varying transmission dynamics of the ABM. However, the ABM additionally assumes that individuals who are not assigned working groups (45% of the total population) do not contribute to disease transmission during the daytime phase. Therefore, if h tot = h night + h day is the total number of person-hours accounted for in the ABM per day, where h night = 12N and h day = 0.55 × 12N, then the effective decoupling due to the time-use assumptions of the ABM is which gives a value of σ ABM = 0.645, a substantial departure from the coupling strength assumed by the royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 9: 211919 MPM formulation. This first approximation does not account for correlations between mobility range and disease transmission rate, to which we attribute additional decoupling, up to σ ≈ 0.9 (see results). With these two degrees of freedom (σ and λ) added to the base MPM configuration, we can tune the relatively simple MPM model to closely match the dynamics produced by the complex ABM.

Adding a latent period (SEIR)
In addition to the extensions described above, we perform an analysis of model uncertainty with respect to the SIR compartmental formulation. As the ABM includes a latent period of one day between infection and the onset of infectiousness, we explore an alternative formulation that includes an exposed (E) compartment into which newly infected individuals in the MPM reside before stochastically transitioning to the infectious (I) compartment with an average dwell time of 1 day (exponentially distributed). Similar to SIR dynamics, in the canonical SEIR model individuals progress between the compartments S → E → I → R, with the compartment populations following: where the new parameter κ determines the rate at which individuals transition from the latent infected (E) state, to the infectious (I) state. While residing in the exposed compartment, infected individuals cannot transmit the contagion to others, neither can they become infected. Here, we employ a stochastic, discrete-time implementation of the SEIR model in order to assess the robustness of our results to the choice of SIR compartmental dyanamics in our MPM formulation.

Matching the models
There are several fundamental differences between the ABM and the MPM. These relate to the details of disease natural history (i.e. time-dependent recovery probabilities in the ABM), heterogeneity of individual behaviours which determine structured mixing patterns within SA2 regions (which are considered as homogeneous populations in the metapopulation framework), and social clustering across spatial regions due to habitual contact patterns between travellers (i.e. the interactions of coworkers).
In order to approach the question of how these fundamental factors affect the results of simulations, we first calibrate the two frameworks to match important quantities that must be conserved across models to facilitate a fair comparison. Despite the different levels of model complexity, we establish fairness by matching the initial conditions, basic reproductive ratios R 0 and global exponential growth rate of incidence trajectories. Importantly, this choice of compartmental framework and method of matching the models produces differently shaped generation interval distributions, but only slightly different mean generation intervals.

The basic reproductive number R 0 , generation interval and initial growth rate r (SIR)
An essential quantity describing any model of contagion spread is the basic reproductive number R 0 , the average number of secondary cases produced by a typical primary infection (index case), in a completely susceptible population. For the ABM, this quantity is difficult to compute analytically due to the complex nature of the population structure and disease natural history model. However, it is straightforward to estimate R 0 from simulations. This allows calibration with respect to the single independent parameter κ, the transmission multiplier, in order to approximate a set value of R 0 . We used this technique to calibrate the ABM so that R 0 = 2.23, that is, for typical index cases in an entirely susceptible population, there occur 2.23 secondary infections on average. The calibration procedure is described in detail in our previous paper, and takes into account the age-structured distribution of primary infections [24]. In the ABM, the duration of illness is 5.5 days, with a constant one-day latency period applied to all infections. The resulting generation interval, T g , the average time for the infection to propagate between infectious and susceptible individuals, was computed as T g ≈ 3.4 days (the ensemble of index cases and transmission events used to generate our estimate of R 0 was also used to estimate generation interval). In SIR dynamics, T g is related to the reproductive ratio and the initial exponential growth rate r as royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 9: 211919 An average growth rate r ¼ 0:32 day À1 was estimated by calculating the mean of the incidence from the first 30 simulation days of 30 independent realizations of the ABM, and fitting to an exponential growth function. Using this growth rate, and the estimated value R 0 = 2.23, equation (2.15) gives T g ¼ 3:84 days, which is close to the generation interval estimated from the ABM results. In the SIR dynamics, the inverse generation interval 1/T g is equivalent to the recovery rate γ. In the SIR model, R 0 = β/γ, so initializing the SIR MPM with g ¼ 0:26 d À1 and b ¼ 0:58 d À1 should give an initial dynamics approximately equivalent to those of the ABM (note that because the matrix Φ is row-stochastic, the R 0 expression for the MPM is equivalent to that of the homogeneous SIR model). These parameters indeed produce almost identical initial growth dynamics to the average over ABM trajectories, in both deterministic and stochastic MPM implementations, as shown in figure 3.

The basic reproductive number R 0 , generation interval and initial growth rate r (SEIR)
In the case of the SEIR model extension, the process for calibrating the initial conditions is the same as for the SIR model, except that the analytic formulation of the generation interval differs due to the inclusion of the additional exposed compartment (see, e.g. [25]). For the same R 0 = 2.23, initial growth rate r = 0.32 and an average latent (exposed) period of 1=k ¼ 1 day, the SEIR formulation produces a transition rate from the infectious (I) to recovered (R) compartments of for an infectious period of 1=g ¼ 2:15 days and an average generation interval of the sum of the average latent and infectious periods (recalling that κ, the rate of transition from exposed to infectious, is set to 1 d −1 to match the latency period of the ABM). We note that the SIR and SEIR formulations both produce generation intervals that are near that of the ABM, with the SIR and SEIR models over-and underestimating the generation interval of the ABM (respectively). While either model could provide a fair comparison for the purposes of this study, we apply the SIR formulation to the main analysis, including the SEIR extension as a test of the robustness of our comparison to the choice of compartmental framework.

. Initial conditions
The results of infectious disease models in heterogeneous populations can be sensitive to the distribution of index cases. We took the following steps to ensure a fair comparison between models. In the ABM, the international pandemic scenario corresponds to a steady influx of index cases near international airports. Specifically, each international arrival produces a 0.004% chance of a new index case occurring within each member of the susceptible population residing in the SA2 regions within 50 km of the corresponding airport. This incoming force of infection is continuously applied throughout the simulation. Mimicking these conditions in the metapopulation framework is straightforward, because the seeding procedure is carried out on the same scale (SA2) in the ABM. In the discrete-time stochastic implementation, the procedure is identical to that of the ABM, while in the deterministic implementation the incoming force of infection is interpreted as a local rate of infection, applied in addition to the transmission terms described by equation (2.9). In the continuous-time stochastic implementation, the seeding rates are added to the transmission and recovery rates when computing the global event rate, and appended to the infection rate terms for each of the affected SA2 regions when computing the next-event CDF. After matching the initial conditions and growth dynamics, the results of the ABM can be compared to those of the MPM.

Censoring of case counts
To facilitate a fair comparison of incidence trajectories, we note that the one-day latency period and stochastic symptom expression in the ABM produces right-censoring of case counts (counted at symptom onset) with an observed incidence of I obs ðtÞ ¼ 0:2Iðt À 4Þ þ 0:5Iðt À 3Þ þ 0:3Iðt À 2Þ, ð2:18Þ which we apply as a sliding observation kernel to the raw infection incidence counts produced by the MPM in order to compare the trajectories produced by the models. We additionally apply a global attenuation of 0.67 (the symptomatic fraction) to all MPM infection incidence counts.

Initial comparison of metapopulation and agent-based models
Without correcting for population heterogeneity, the MPM calibrated to the same initial growth rate and initial conditions as the ABM overestimates peak prevalence and underestimates epidemic duration. This general finding is true for both the deterministic and stochastic implementations of the MPM. Of the two implementations, the stochastic MPM is qualitatively more consistent with the ABM (figure 4).

Comparison of extended metapopulation models and agent-based model
To account for the effects of heterogeneity, equations (2.10) and equation (2.11) introduce the correction factors λ and σ, respectively. Varying the transmission attenuation parameter λ decreases peak incidence without altering the initial growth rate, as shown in figure 5a. On the other hand, altering the inter-region mixing parameter σ marginally reduces peak incidence and slightly increases peak timing, while substantially broadening the incidence curve and, for large enough values (σ ≈ 0.9), produces observable bi-modality, a feature of the ABM produced by weak connections between urban and regional areas, and discussed at length in our previous work (figure 5b) [20]. By tuning the attenuation parameter λ and the inter-region decoupling parameter σ, the mean incidence trajectories of the MPM can be made to closely match those of the ABM ( figure 6). For attenuation λ = 0.6 and inter-region mixing reduction of σ = 0.9, the growth rate, peak incidence (and its run-to-run standard deviation), and the bimodal character of the ABM are reproduced in the MPM implementation. The inter-region mixing parameter σ produces a delay in the peak incidence. This is likely due to the limitation of our homogeneous decoupling procedure that applies uniform relative alteration in diagonal versus off-diagonal travel matrix elements, over all regions (equation (2.11)). That is, the decoupling procedure used in this work applies the same function between tightly coupled urban regions as it does to weakly coupled rural-rural, urban-rural and inter-city connections. In the ABM, bimodality of incidence trajectories results from heterogeneous decoupling in which intra-urban locations remain tightly coupled. As these regions are responsible for the first royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 9: 211919 incidence peak (see [20]), decoupling them in the MPM shifts the global peak to later time points. This global peak shift limits the degree of decoupling that can be applied while still producing a satisfactory match to the ABM incidence trajectories.
Finally, to investigate the qualitative correspondence between the dynamics produced by the ABM and adjusted MPM, we visualized the spatio-temporal distribution of illness prevalence as computed by both models. Examples are shown in figure 7. In both models, outbreaks begin in densely populated urban areas, where international arrivals are concentrated and epidemic seeding is more likely. This leads to an initial peak concentrated in the larger urban centres, followed by a later second peak corresponding to urban regions and smaller urban areas with lower levels of international air traffic. As expected based on the correspondence shown in figure 6, the dynamics of the ABM are born out in the adjusted MPM.  Figure 6. Log-scaled (a) and linear-scaled (b) incidence curves comparing the ABM with the extended stochastic MPM after finetuning the saturation parameter λ and inter-community decoupling parameter σ to qualitatively match the ABM results. Initial growth rates and peak incidence are very closely matched between the models. Peak timing is slightly over-estimated by the adjusted MPM.  (c,d ). The selected snapshots from days 50 and 80 of each simulation show the average distribution of illness during the first prevalence peak (a,c), and the second peak (b,d ), respectively. The MPM results in (c) and (d ) were generated using the tuned parameters σ = 0.9, λ = 0.6. Colours correspond to natural log prevalence ( proportion currently ill at time t) scaled relative to the largest prevalence value recorded over all SA2 regions. In (c,d ), the raw prevalence values from simulations shown in the plots of prevalence versus time were adjusted by 0.67 (the symptomatic proportion in the ABM) for comparison to the ABM prevalence plots shown in (a,b), however, no observation kernel was applied to prevalence counts. royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 9: 211919 identical initial incidence dynamics to those of the SIR formulation (figure 8a), and very similar peak incidence, with a slightly later peak and slightly faster incidence decay during the epidemic tail (figure 8b). Overall, this analysis demonstrates that our results regarding the divergence of the compartmental MPM and ABM dynamics is likely to be robust unless the heterogeneity factors we correct for in our extended formulations are taken into account directly.

Discussion
A comparison between an agent-based and a structured metapopulation stochastic model, using a pandemic event in Italy simulated within the GLobal Epidemic and Mobility (GLEaM) framework [18], pointed out that the MPM consistently yielded a larger incidence than the ABM. This bias was explained by the assumption of homogeneity used in the MPM, as well as different structures in the intra-population contact patterns captured by the respective approaches. For example, the largest discrepancy between the two models was reported for the 60 þ age class-a 'class with the most marked difference in household structure and workplace habits that cannot be taken into consideration in the metapopulation level' [18]. Our comparative analysis also indicated that aggregation of heterogeneity in household composition and social mixing behaviour produced a similar result. These discrepancies were reported to increase with higher values of the basic reproductive number R 0 [18],  Figure 8. Comparison of SIR and SEIR dynamics, using an average latency period of one day. Note that for direct comparison between the SIR and SEIR outputs, no observation kernel has been applied to the incidence numbers in this figure. Here, the incidence numbers for SIR and SEIR formulations correspond to the number of individuals entering the infectious (I) compartment.
royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 9: 211919 and hence, this tendency forms an elevated concern in modelling more infectious diseases, such as COVID-19. Indeed, pandemic spread of SARS-CoV-2, and its more infectious Delta (B.1.617.2) variant, was often driven by structural factors, especially in areas characterized by socioeconomic disadvantage profiles, with high-density housing, multi-generational and shared households, higher concentrations of essential workers [26], increasing the risks of exposure within overcrowded living spaces [27], and disproportionately affecting rural and urban areas [28]. ABMs manifest both behavioural and mechanical adequacy of the disease transmission mechanism [29,30]. While 'behavioural adequacy' ensures that simulated epidemic patterns agree with the observations, 'mechanical adequacy' guarantees a concordant natural history of the disease within an individual, representing stages of the pathogenesis from exposure and incubation to the infectivity peak and then to recovery or death. In a large-scale ABM, each infected agent follows a profile which is stochastically generated from an explicitly defined distribution, thus intrinsically representing diversity of possible histories. When the natural history of disease exhibits non-trivial dependencies (e.g. with respect to pre-symptomatic or asymptomatic infectivity), an aggregation of individual agent profiles within an MPM is likely to average out nuances of the time-dependent infectiousness profile. Aggregation of these individual-level factors impedes the design of intricate targeted interventions based on nuanced timing considerations (e.g. the isolation of infectious contacts before they can infect others).
In general, one may need to consider a balance between (i) increased realism of ABMs, achieved at the cost of a more laborious calibration of internal parameters [30,31] and a detailed reconstruction of mobility patterns [21], and (ii) coarse-grained compartmental approximations of the overall epidemic dynamics, attained with substantially reduced design, calibration and computational costs. To reiterate, there is much promise for hybrid models which enrich homogeneous approximations with some aspects of population heterogeneity [32,33], or that use individual-based models to define sensitive initial conditions [34].
However, modelling targeted interventions remains a challenge even for these hybrid approaches, since the intensity of interactions within mixing groups dynamically changes in response to various spatio-temporal constraints (social distancing, vaccinations, school closures etc.). These fine-grained changes may generate local and transient effects that can be lost during an aggregation, complicating modelling of intervention efforts focused on super-spreading events [35], epidemic risk assessment for vulnerable communities [36], and estimation of age-dependent hospitalization and fatality rates [37,38]. This work demonstrated that by introducing two global correction factors, for a total of four free parameters (β, γ, σ, λ), a simple MPM can be tuned to closely approximate the dynamics observed from a complex ABM tailored to simulate pandemic influenza H1N1 in the Australian population. The base MPM was derived from a detailed accounting of mobility between regions, to match the mixing patterns captured by the ABM. To match the ABM results, the base MPM was corrected to globally attenuate inter-region transmission, and account for local depletion of susceptible populations. The final, adjusted MPM combines a detailed accounting of mobility with a phenomenological model of disease transmission. This approach balances a key advantage of the ABM (detailed mobility patterns), with the capacity for efficient calibration and low computational cost.
In the ABM, the demographic properties of each area influence the transmission strength between pairs of regions, which suggests that a more accurate correction procedure should take into account the proportions of different sub-populations in each region (i.e. children and adults). If these demographic distributions were taken into account explicitly, the phenomenological correction factor σ could be replaced element-wise with a data-derived n × n correction matrix, modifying transmission strength between each pair of regions based on their local demographics. On the other hand, the MPM developed here requires only mobility information and global observations of disease incidence for its formulation, and could be a valuable approach when demographic information (and its relationship to disease spread) is uncertain or unavailable. The MPM formulation here uses 'perfect' mobility data, for a one-to-one match with the ground truth as specified by the ABM. In real outbreaks, mobility patterns are uncertain, but can be inferred from various sources on scales of aggregation appropriate for maintaining the privacy of individuals. Future applications of the model developed here could import near real-time mobility and case data, to account for sensitive initial conditions and population mixing patterns that change dynamically during an epidemic [39,40].
A key limitation to application of the MPM approach demonstrated here to forecasting of epidemics is the late divergence of the trajectories, a known limitation of sub-exponential growth models [41]. The calibrated base MPM closely reproduced the early dynamics of the ABM, but dramatically overestimated peak epidemic prevalence (which translates to peak public health burden). The trajectories and standard deviation bands shown in figure 4c,d suggest that the initial MPM trajectory deviates substantially from the ABM by approximately day 40. While still preceding the peak, this divergence is late relative to the timescale of epidemic growth, with day 40 corresponding to incidence in excess of 10 5 cases per day. Additionally, figure 5 suggests that the initial divergence from exponential growth could be accounted for only with the saturation parameter λ, but that doing so would produce an underestimate of the epidemic duration and attack rate without also adjusting the decoupling parameter σ. Unfortunately, σ alters the global incidence trajectory only after the first epidemic peak and cannot be estimated based on initial growth behaviour. Therefore, it remains unclear how to prospectively calibrate the correction factors σ and λ early in an epidemic, during the exponential growth phase when intervention decisions need to be made. To do so, a clearer understanding of the behavioural phenomena underlying these corrections is required, aided by detailed observations of local epidemic trajectories and synchronization between regions. Other possible future extensions of this work include application of the comparison to different disease models (varying R 0 , generation interval distribution, incubation periods etc.), and to more detailed compartmental behavioural models including aspects such as age structure.
Unlike previous explorations that have introduced similar corrections to traditional models [14,16,17,42], this study used a detailed ABM as ground-truth. This simulated ground-truth offers the advantage that aspects of heterogeneity accounted for by the correction factors are known. The disadvantage of using simulated ground-truth is that other potentially important factors, such as riskdriven behavioural feedback, were not incorporated into the ABM. It is not clear whether the correction factors introduced here would be sufficient to capture the effects of such phenomena. What our work demonstrates clearly is that salient heterogeneity can be accounted for through simple, phenomenological corrections to traditional modelling approaches. As part of a hybrid, multi-scale modelling framework, these types of methods may prove useful in forecasting epidemic dynamics and efficiently providing evidence for or against intervention policies during future epidemics.
Data accessibility. All simulated data, simulation code and processing scripts necessary to reproduce the results reported here (including the MPM source code and inputs produced by the ABM population generator) are freely available and may be found on the associated Zenodo database (doi:10.5281/zenodo.6486795) [43]. The source code and input data used for the agent-based model population generator and epidemic simulations (ACEMod) is available in a separate database (doi:10.5281/zenodo.5773908) [44].