Understanding the impact of mobility on COVID-19 spread: A hybrid gravity-metapopulation model of COVID-19

The outbreak of the severe acute respiratory syndrome coronavirus 2 started in Wuhan, China, towards the end of 2019 and spread worldwide. The rapid spread of the disease can be attributed to many factors including its high infectiousness and the high rate of human mobility around the world. Although travel/movement restrictions and other non-pharmaceutical interventions aimed at controlling the disease spread were put in place during the early stages of the pandemic, these interventions did not stop COVID-19 spread. To better understand the impact of human mobility on the spread of COVID-19 between regions, we propose a hybrid gravity-metapopulation model of COVID-19. Our modeling framework has the flexibility of determining mobility between regions based on the distances between the regions or using data from mobile devices. In addition, our model explicitly incorporates time-dependent human mobility into the disease transmission rate, and has the potential to incorporate other factors that affect disease transmission such as facemasks, physical distancing, contact rates, etc. An important feature of this modeling framework is its ability to independently assess the contribution of each factor to disease transmission. Using a Bayesian hierarchical modeling framework, we calibrate our model to the weekly reported cases of COVID-19 in thirteen local health areas in Metro Vancouver, British Columbia (BC), Canada, from July 2020 to January 2021. We consider two main scenarios in our model calibration: using a fixed distance matrix and time-dependent weekly mobility matrices. We found that the distance matrix provides a better fit to the data, whilst the mobility matrices have the ability to explain the variance in transmission between regions. This result shows that the mobility data provides more information in terms of disease transmission than the distances between the regions.

The outbreak of the severe acute respiratory syndrome coronavirus 2 started in Wuhan, China, towards the end of 2019 and spread worldwide. The rapid spread of the disease can be attributed to many factors including its high infectiousness and the high rate of human mobility around the world. Although travel/movement restrictions and other non-pharmaceutical interventions aimed at controlling the disease spread were put in place during the early stages of the pandemic, these interventions did not stop COVID-19 spread. To better understand the impact of human mobility on the spread of COVID-19 between regions, we propose a hybrid gravity-metapopulation model of COVID-19. Our modeling framework has the flexibility of determining mobility between regions based on the distances between the regions or using data from mobile devices. In addition, our model explicitly incorporates time-dependent human mobility into the disease transmission rate, and has the potential to incorporate other factors that affect disease transmission such as facemasks, physical distancing, contact rates, etc. An important feature of this modeling framework is its ability to independently assess the contribution of each factor to disease transmission. Using a Bayesian hierarchical modeling framework, we calibrate our model to the weekly reported cases of COVID-19 in thirteen local health areas in Metro Vancouver, British Columbia (BC), Canada, from July 2020 to January 2021. We consider two main scenarios in our model calibration: using a fixed distance matrix and time-dependent weekly mobility matrices. We found that the distance matrix provides a better fit to the data, whilst the mobility matrices have the ability to explain the variance in transmission between regions. This result shows that the mobility data provides more information in terms of disease transmission than the distances between the regions.

Introduction
The pandemic of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which started in the city of Wuhan, Hubei province, China [1] has since spread all over the world with over 585 million reported cases and 6.4 million reported deaths, as of August 2022 [2]. In human populations, the virus can be transmitted through the inhalation of infectious droplets in aerosols, exposure to infectious respiratory fluids, coughing, sneezing, and having physical contact with an infected individual. It can also be transmitted indirectly when a susceptible individual comes in contact with a contaminated surface, such as door handles or other commonly shared surfaces or objects [3][4][5]. SARS-CoV-2 is the casual agent for the coronavirus disease 2019 (COVID- 19), and is estimated to be more infectious compared to other coronaviruses such as the severe acute respiratory syndrome (SARS) and the Middle East respiratory syndrome coronavirus (MERS) [6,7]. The COVID-19 disease was declared a public health emergency by the World Health Organization (WHO) on January 20, 2020 [8] and a pandemic on March 11, 2020 [9]. Due to the fast spread of COVID-19, during the early stages of the pandemic, governments around the implemented non-pharmaceutical interventions (NPIs) such as movement/travel restrictions, wearing of facemasks, closure of schools and businesses, physical distancing, etc. [10][11][12][13][14], to limit the spread of the disease. Although, the implementation of these NPIs helped in slowing down the spread of COVID-19, the disease still continues to spread under these restrictions. In addition, these NPIs have significant social and economic effects around the world [15][16][17], and could not be put in place for too long. The development of safe and effective COVID-19 vaccines brought some relief and were introduced to replace stringent NPIs [18,19]. The first set of COVID-19 vaccines became available towards the end of 2020 [20]. These vaccines provide significant protection against the earlier strains of SARS-CoV-2 virus [21][22][23]. However, the emergence of highly infectious mutant strains such as Omicron variant led to the continuous spread of the disease.
The first case of COVID-19 was reported in Wuhan, China, in December 2019 [24]. On the 12 th of January 2020, the first case of the disease outside of China was confirmed in Thailand [25]. By January 30, 2020, COVID-19 has spread to 18 countries outside of China with a total of 7,818 confirmed cases worldwide [25]. The first confirmed case of COVID-19 in Africa was reported on February 14, 2020 [26,27], in North America, January 21, 2020 [28], and in Europe, January 24, 2020 [29]. COVID-19 has spread more rapidly and widely around the world than previous outbreaks of coronaviruses. This spread can be attributed to globalization, settlement and population characteristics, and high human mobility [30]. Several studies have looked at the effect of human mobility on the spread of COVID-19 [31][32][33]. In Kraemer, Moritz UG, et al [31], real-time human mobility data was used to investigate the role of case importation in the spread of COVID-19 across cities in China. The impact of human mobility network on the onset of COVID-19 in 203 countries was studied in [32]. They used exponential random graph models to analyze country-to-country spread of the disease. Their study suggested that migration and tourism inflow contributed to COVID-19 case importation, and that a mixture of human mobility and geographical factors contribute to the global transmission of COVID-19 from one country to another. Human mobility data collected via mobile devices such as cell phones, smartwatches, e-readers, tablets etc., have also been used to study the spread of COVID-19 [34][35][36][37]. In [34], county-level cell phone mobility data collected over a period of 1 year in the US was used to study the spatio-temporal variation in the relationship between COVID-19 infection and mobility. They found that in the spring 2020, sharp drop in mobility often coincide with decrease in COVID-19 cases in many of the populous counties.
Mathematical models have been used to study the relationship between the spatio-temporal spread of COVID-19 and human mobility [37][38][39][40][41][42][43][44][45]. A city-based epidemic and mobility model together with multi-agent network technology and big data on population migration were used to simulate the spatio-temporal spread of COVID-19 in China [45]. In [46], a stochastic, data-driven metapopulation model was used to study the initial wave of COVID-19 in Belgium, and also to study different re-opening strategies. Their model incorporates the mixing and mobility of different age groups in Belgium. Another stochastic metapopulation model was used to study the spread of COVID-19 in Brazil [43]. This model assumes that epidemics start in highly populated central regions and propagate to the countrysides. For many states, they found strong correlations between the delay in epidemic outbreaks in the countrysides and their distance from central cities. In [47], an SEIR country-wide metapopulation model was used to study the spread of COVID-19 in England and Wales. The model was used to predict the COVID-19 epidemic peak in England and Wales, and also to study the effect of different non-pharmaceutical intervention strategies on the predicted epidemic peaks. Similarly, in [48] a stochastic SIR model was applied to describe the spatio-temporal spread of COVID-19 across 33 provincial regions in China and to also evaluate the effectiveness of various local and national intervention strategies. Their model incorporates an outflow mobility index for all the regions and the proportion of travelers between regions. More discussions on human mobility and COVID-19 transmission can be found in the systematic review article [49]. The relative contribution of mobility data to the observed variance in the COVID-19 transmission rates between regions still remains an unexplored problem.
We develop a hybrid gravity-metapopulation modeling framework for studying the spread of COVID-19 within and between different regions. An important feature of our framework is the ability to determine human mobility based on the distances between the regions or through empirical data such as those collected through mobile devices. In addition, our framework allows for the explicit incorporation of factors that affect disease transmission, such as facemasks, physical distancing, contact rates, etc., into a time-dependent disease transmission rate and the assessment of the contribution of each of these factors to actual disease transmission. As an illustration, we use a Bayesian hierarchical modeling framework to calibrate our model to the weekly reported cases of COVID-19 in the thirteen local health areas (LHAs) of Fraser health authority (Fraser Health), British Columbia (BC), Canada, from July 2020 to January 2021. The study area comprises 1.9 million population in the eastern sections of the Greater Vancouver area. We estimate region-specific scaling parameters for computing baseline disease transmission rates for each region, and a parameter for quantifying the contribution of mobility to disease transmission. In addition, we estimate a time-dependent piece-wise constant scaling parameter to account for the cumulative effect of the remaining factor that affect disease dynamics, which are not explicitly included in our model. We consider two main model structures in our example, which are determined by the mobility matrices used: one with a distance matrix (computed using the distances between the regions, based on the population weighted centroid) and another with time-dependent mobility matrices computed from mobile device data. The results from these two scenarios are used to test the hypothesis of whether the time-dependent mobility matrices, computed from mobile device counts, provide more information about human mobility, with respect to disease transmission between the regions than the distances between the region.

Mathematical model
We develop a hybrid gravity-metapopulation model to study the dynamics of COVID-19, within and between regions. The model stratifies the population of each region into six compartments: susceptible (S), exposed (E), pre-symptomatic infectious (P), symptomatic infectious (I 1 and I 2 ), and recovered (R). Individuals in the pre-symptomatic infectious compartment are infectious (can transmit the disease) but do not show symptoms yet. Similar to [50,51], we divided the infectious compartment into two classes so that the recovery time follows a Gamma distribution rather than an exponential distribution. This way, a symptomatic infectious individual spends the first half of their infectious period in I 1 and the other half in I 2 . We assume that there is no re-infection in our model due to relatively low infections across the study period as the size of the susceptible population is far greater than the size of the recovered population. In addition, we assume that all the individuals infected during our study period will not lose their COVID-19 immunity and be reinfected within this period [52]. Furthermore, asymptomatic cases were not considered as testing guidelines during the study period were symptom-based.
A schematic diagram of the model illustrated for four (4) regions is shown in Fig 1, where the gray circles on the left represent the regions, while the black arrows show the interactions and movements of individuals between the regions. On the right, we have an illustration of the population dynamics in each of the regions, where the subscript j represents the j th region. The black arrows here show the transition of individuals through the different stages of COVID-19 at the rates indicated beside the arrows. The red dashed arrows indicate disease transmission. Observe that there is a red dashed arrow extending from each of the remaining three regions into region j, these arrows account for the contributions of infectious individuals in the three regions to disease transmission in the j th region. The ordinary differential equations (ODEs) for the model are given by (see Fig 1 for where β j � β j (t) is the time-dependent disease transmission rate for region j. We aim to define the transmission rate (β j ) as a function of the different factors that affect disease transmission. This way, we would be able to evaluate the contribution of each of these factors to the overall disease transmission. Therefore, we define β j (t) as b j ðtÞ ¼ exp ðc 0j þ c 1 m j ðtÞ þ gðtÞÞ: ð2Þ . Model compartments are defined as follows: Susceptible (S j ); exposed (E j ); pre-symptomatic infectious (P j ); symptomatic infectious (I 1j and I 2j ); and recovered (R j ) for region j. Our model assumes that there are no re-infections. The black arrows show the movement of individuals from one region to another (left) and the transition of individuals through the different stages of COVID-19 at the rates indicated beside the arrows (right). The red dashed arrows indicate disease transmission (see (1) for more details). https://doi.org/10.1371/journal.pcbi.1011123.g001

PLOS COMPUTATIONAL BIOLOGY
Here, c 0j is the scaling parameter for the baseline disease transmission rate for region j, c 1 is the scaling parameter used to remove biases from the time-series mobility data, and g(t) is a time-dependent piece-wise parameter used to account for other factors that affect disease transmission other than human mobility (e.g. facemask, social distancing, contact rates, etc.), which are not explicitly incorporated into the model. Movement within the j th region is captured by a time-series mobility data represented by m j (t). This data is used as a proxy for the time-dependent contact rate in the region. We have defined our disease transmission rate, β j (t), as an exponential function to ensure that its value remains positive due to the way the time-series mobility data and the function g(t) are to be incorporated into β j (t). This definition will ensure that the estimated model parameters are identifiable (see Bayesian inference section for more details). Based on the definition of β j (t) in (2), e c 0j is the baseline disease transmission rate for region j, while e c 1 m j ðtÞ incorporates the effect of human mobility within the region into the transmission rate. Lastly, e g(t) accounts for the effect of other factors that affect disease transmission, which are not explicitly incorporated into the model, on the disease spread. Although, the formulation in (2) explicitly incorporates only human mobility into the disease transmission rate, this formulation can be extended to include other factors that affect disease transmission such as facemaks, physical distancing, etc. See more details in the Discussion section.
The parameter C j � C j (t) in (1) is used to incorporate infectious interactions within the j th region, and their contribution to disease transmission in the region. In terms of a homogeneous single population model, this parameter would represent the probability of making an infectious contact in the population. Here, C j is defined as where 0 � θ � 1 is a parameter used to measure the effective contribution of human mobility to disease transmission in all the regions. Here, θ = 0 implies that there are no infectious contacts due to mobility as defined by the intra-regional mobility matrix (π), and the regions are essentially uncoupled from each other. On the other hand, θ = 1 means that all the infectious contacts in the system are due to human mobility. The parameterN j is the adjusted population size for region j, which incorporates the changes in the population size of the region due to movements in and out of the region. We defineN j bŷ where M is the total number of regions under consideration and N j is the baseline population size of the j th region. The first term in (3) given by ð1 À yÞðP j þ I 1j þ I 2j Þ=N j accounts for all the infectious contacts made by the residents of region j who are not moving within the region, while the second term, ðy=N j Þ P M i¼1 p ji ðP i þ I 1i þ I 2i Þ, accounts for all the infectious contacts made in region j by the residents of the region who are moving within the region and the visitors from other regions. In (3) and (4), π ji is the probability that an individual who migrated into region j, originated from region i, given that he/she is from one of the other regions under consideration. We compute this probability using two different approaches. The first approach uses the distances between the regions. In this case, π ji is given by

PLOS COMPUTATIONAL BIOLOGY
Where d ij � d ji is the distance from region i to region j, k 2 R þ and M is the total number of regions considered. The second approach used to compute the probability π ji involves using mobile device data. The model parameters, their descriptions, and values are provided in Table 1. The estimated parameters are presented in the Results section.
As an illustration of concept, we consider the thirteen (13) local health areas (LHA) of Fraser health authority, British Columbia (BC), Canada. These regions include the communities of Abbotsford, Agassiz/Harrison, Burnaby, Chilliwack, Delta, Hope, Langley, Mission, Maple Ridge/Pitt Meadows, New Westminster, South Surrey/White Rock, Surrey and Tri-Cities. Fraser health authority (Fraser Health) is the largest of the five regional health areas in BC, with 12 acute care hospitals and providing health care to over 1.9 million people [65]. It has a width of 150 km.  Table C in S1 Text. We use a Bayesian hierarchical modeling framework to calibrate our model to the weekly reported cases of COVID-19 in these 13 LHAs, from July 2020 to January 2021. From the model calibration, we estimate the parameters c 0j , c 1 and g(t), which are used to construct and study the time-dependent disease transmission rate for each region, and to study the dynamics of the time-dependent piece-wise parameter g(t). We also estimate the parameter θ, used to quantify the effect of mobility, both within and between the regions, on disease transmission in the regions.

Data
Human population move between regions for many reasons including work, leisure, family visits, health reasons, e.t.c. The main goal of this work is to develop a mathematical modeling framework for studying and understanding the effect of human mobility on the spread of COVID-19 within and between regions. We consider the period from July 1, 2020 to January 27, 2021, inclusive. Although, movement restriction was imposed in Fraser Health during PLOS COMPUTATIONAL BIOLOGY some part of this period, we used the mobility data collected through mobile device counts as a proxy for quantifying movements between the regions and the contact rate within each region. We quantify mobility between the regions using two approaches. The first approach uses the physical distances between the regions, based on population weighted centroid (left panel of Fig 3) and the formula in (5) to calculate the probability that an individual moving in region j, who came from one of the 13 regions, originated from region i (π ji ). The premise of using physical distance between regions is based on the concept of geographic distance decay, where spatial and social interactions decrease as the distance between regions increases [59]. The distances between our regions of interest are given in the left panel of Fig 3, while probabilities π ji computed from these distances are presented in the right panel.
The diagonal entries of the probability matrix (π) represents the probability that an individual moving in a region is a resident of that region. It is important to note that the probability matrix is not symmetric, even though the distance matrix (left panel of Fig 3) is symmetric. In addition, each row of the probability matrix sums to 1. The second approach used to construct

PLOS COMPUTATIONAL BIOLOGY
the probability matrix (π) is based on mobile device counts and uses Telus mobility (TELUS) data.
TELUS is a Canadian national telecommunications company that has network coverage in 99% of the populated areas of Canada. TELUS Insights provides anonymized geo-intelligence data, which reflects population location and mass movement patterns based on information about locations and population movement of TELUS mobile device users [66]. These data have helped answer a range of questions around location and public mobility patterns within Canada, including in infrastructure planning, health services, roads, and transit routes.
As TELUS subscribers use their mobile devices, they connect to various cellular towers for telecommunication services. These connections are used to determine the users' locations based on the nearest tower that relays signals to their devices. Every Telus user with their mobile network active would be included in the TELUS data, except the subscriber opt-out [67]. This network data provides insights into movement patterns and trends across Canada. To provide a layer of privacy, all the mobility data provided by TELUS are de-identified, aggregated into large data pools, rounded-up to the nearest 10 counts and all results are extrapolated to represent the entire population of a given region. This ensures that the data cannot be traced back to individual TELUS subscribers. The results of the TELUS application programming interface (API) implementation, which provide the numbers of mobile devices moving within and between geographical locations of interest and the neighbourhood that a mobile device resides in, depend on cellular tower locations at the time of the analysis.
We generate the mobility data for each region using a one-day bucket size and 120-minute minimum dwell time. We filtered for "non-residents", "moving residents" and "residents", which represent, respectively, the daily number of mobile devices residing in an LHA and spending at least two hours in another LHA (movement between regions), the daily number of mobile devices residing in an LHA and spending over two hours outside their census track within the LHA (movement within a region), and the total number of mobile devices residing in an LHA. To construct the weekly mobility matrices, we consider the "non-residents" and "moving residents" data. For each region and for a specified time interval (weekly), we PLOS COMPUTATIONAL BIOLOGY compute the number of mobile devices from the other 12 regions that visited the region and stayed there for at least 2 hours during the visit. This gives us the mobile device count for movement into the region (off-diagonal entries). For movement within the regions (diagonal entries), we used the "moving residents" data, from which we computed the number of mobile devices registered to a region and moving within the region. These information are used to construct a mobility matrix of device counts within the specified time interval for each region. We normalize each row of the matrix with the total number of devices in the row. This way, the i th element of the j th row represents the fraction of mobile devices that came into the j th region (from the 13 regions) that originated from the i th region. These fractions can also be interpreted as the probability that an individual moving within the j th region (whom originated from one of the 13 regions) is from the i th region (π ji ). Using this approach we compute the probability/mobility matrices for each week from July 1, 2020 to January 27, 2021. The computed matrices for week 1 (July 1-7, 2020) and 30 (January 21-27, 2021) are shown in Fig 4, while the matrices for the remaining weeks are presented in Figs (B-F) in S1 Text. The distance matrix (right panel of Fig 3) and the constructed mobility matrices are used to describe the interaction between individuals from different regions. We considered two main scenarios in our Bayesian inference based on the distance and mobility matrices and investigated whether the mobility data is more informative, in terms of disease transmission than the distances between the regions.
We also used the Telus mobility data to compute the weekly mobility rate for the movements within each region. To compute these rates, we sum the daily device count in each region for "non-residents" and "moving resident", and divide it by the sum of the "residents" and "non-residents" device count for our entire study period. This gives us the proportion of mobile devices moving in each region with respect to the total number of devices in the region during our entire study period. For each week in our study period, we sum the computed proportion of mobile devices and divide by 7 to get the weekly average proportion of mobile devices moving in each of the regions, as shown in Fig 5. These mobility rates are used as

PLOS COMPUTATIONAL BIOLOGY
proxy for the contact rates in the regions and are represented by m j (t) in the disease transmission rate (β j (t)) defined in (2). We observe from Fig 5 that there is a sharp decline in mobility rate around the first week of September 2020 in most of the regions. Similarly, there is another decline in mobility rate around the first week on November. This decline is associated with the implementation of public health measures in BC. We calibrate our model to the weekly reported cases of COVID-19 in the thirteen local health areas of Fraser Health, BC, obtained from the British Columbia Centre for Disease Control (BCCDC). We extracted these data from a line list generated by BCCDC Public Health Reporting Data Warehouse (PHRDW), based on symptom onset date or reported date where symptoms onset date is not available. The collected case data spans the period from July 2020 to January 2021, inclusive, and was incorporated into the model likelihood based on the computed disease incidence as shown in (6). The collected weekly reported cases of COVID-19 for the 13 regions are shown in Fig A in S1 Text. Similar to [50], our model incidence is computed as the number of individuals in the pre-symptomatic population (P), transitioning to the symptomatic infectious compartment (I 1 ).

Bayesian inference
Our hybrid gravity-metapopulation model (1) is fitted to the COVID-19 cases in all the thirteen regions using a Bayesian hierarchical modeling framework. Bayesian inference is a statistical technique for data analysis and parameter estimation, which is based on the Bayes' theorem. It has been applied to problem in many fields ranging from biology, physics, sport, epidemiology, ecology, and engineering, among others [68][69][70]. A Bayesian hierarchical modeling framework is one where the prior distribution of some of the model parameters depend on other parameters to be estimated. It allows the incorporation and estimation of model parameters at individual and population levels (see [71,72] for more information on Bayesian hierarchical models). We implement our Bayesian inference model with the RStan package in R version 3.6.3 [73]. Stan is a free and open-source probabilistic programming language for statistical inference implemented in C++. It performs Bayesian inference on arbitrary user-defined models through Markov Chain Monte Carlo (MCMC), and can be invoked through other programming languages such as Python, Matlab, Julia and R. [74]. RStan is the R interface to Stan, which provides full Bayesian inference via the No-U-Turn sampler (NUTS), a variant of Hamiltonian Monte Carlo (HMC), approximate Bayesian inference via automatic differentiation variational inference (ADVI), and penalized maximum likelihood estimation via L-BFGS optimization [73].
In our Bayesian inference model, we construct the likelihood for the j th region as cases j ðtÞ � NegBinðincidence j ðtÞÞ; cÞ; ð6Þ where NegBin(�) is the negative binomial distribution, cases j (t) and incidence j (t) are the weekly reported cases of COVID-19 and the incidence computed from the model (1), respectively, for region j. Here, ψ is the over-dispersion parameter. Using Bayesian inference framework implemented in Rstan gives us the flexibility to incorporate our prior knowledge into the model parameters and the ability to evaluate probabilistic statements of the data based on the model. In addition, this framework allows us to incorporate hierarchical structure into the model parameters, with the benefit of understanding variations in the parameters at individual and population levels. It enables us to construct the posterior distribution for the population mean and variance of the model parameters and those of the individual parameters for each region, which are conditioned on the population mean and variance. We have used a negativebinomial distribution to model the weekly reported cases of COVID-19 because of its effectiveness and convenience in modeling nonnegative over-dispersed data. Uninformative priors were implemented in the Bayesian inference framework. We incorporate the time-series mobility data (Fig 5) into our modeling framework using an exponential scaling approach for the disease transmission rate. The disease transmission rate is given by (2), where c 0j � N þ ð0; 1Þ is the scaling parameter for the baseline transmission rate for the j th region (e c 0j is the baseline transmission rate) and c 1 � N þ ð0; 1Þ is the scaling parameter used to remove biases from the time-series mobility data (m j (t)). Here, e c 1 m j ðtÞ models the time-varying effect of m j (t) on the disease transmission rate (β j ) for region j. The time-dependent piece-wise constant parameter g(t) is used to account for other factors that affect disease transmission, which are not explicitly accounted for in the model. This parameter is estimated every four weeks (except for the last interval which has 2 weeks). We also estimated the total prevalence of COVID-19 in all the 13 regions at the beginning of our study period. Similar to [50,54], when building our Bayesian inference modeling framework, we simulated the incidence for our model (1) using known parameters values and then tested the ability of our framework to recover the values. We inspect the resulting posterior distribution for biases and their coverage of the true parameters.
Throughout this article, we used the Variational Bayes (VB) method with the meanfield algorithm implemented in RStan [75,76] for our inferences, from which we estimate the total initial prevalence in all the 13 region and a parameter, θ, used to quantify the effective contribution of mobility to disease transmission in the regions (see the formulations in (3) and (4)). We estimate a fixed value of the parameter g(t) for every four weeks, starting from the beginning of our study period, and for the last two weeks. Thereby making it a time-dependent and piece-wise parameter. To ensure that the estimated parameters are identifiable and that the estimated values of g(t) from the second interval onward are relative to that of the first interval, we set g(t) = 0 for the first four weeks (first sub-interval). In addition, we rescaled the timeseries mobility data using the first week's mobility rate as a reference for the remaining rates. This was done by subtracting the mobility rate for the first week from those of the subsequent weeks. This way, the rescaled mobility rate for the first week is 0, while those for the remaining weeks are centered around 0. We used a Bayesian hierarchical modeling framework to estimate the parameters c 0j and g(t). We construct their population posterior distributions, which are used as priors for estimating the region specific c 0j for j = 1, . . ., 13, and the interval specific g(t), respectively. The remaining parameters of the model are fixed and are as presented in Table 1.
We consider two main scenarios in our model calibration: one with a fixed distance matrix (computed from the distances between the regions, see Fig 3) and another with weekly mobility matrices (computed from Telus mobility data, see Fig 4 and Figs (B-F) in S1 Text). These two matrices are used to quantify mobility between the 13 regions. Performing inference based on these two scenarios enabled us to understand the effect of mobility on the posterior predictive distributions of the model and to determine which of the two mobility quantifiers best recreates the observed case data. It would also help us to identify, which of the two approaches provides more information on human mobility in terms of disease transmission. The two scenarios were ranked by comparing their leave-one-out predictions and standard errors, computed using the leave-one-out cross-validation (LOO) method [77][78][79], and using the widely applicable information criterion (WAIC) method [80,81]. We compared the Variational Bayes (VB) method to the adaptive Hamiltonian Monte Carlo method No-U-Turn sampling. The results from both methods are found to produce comparable estimates of the posterior distribution with significant reduction in total computation time when VB is used [82]. For the case of a fixed distance matrix, the mean and/or median ELBO usually converges in 5, 000-6, 000 iterations of the stochastic gradient ascent algorithm, while it converge in 11, 000-12, 000 iteration for the weekly mobility matrices case (see [76,83,84] for more information about ELBO in the variational Bayes method).

Results
We considered two main scenarios when fitting our model to the weekly reported cases of COVID-19 (see Methods). Results for the two scenarios, for selected regions (Agassiz/Harrison, New Westminster, Maple Ridge/Pitt Meadows and Surrey), are presented in Fig 6. We selected these regions based on their population sizes and geographical locations, to show the diversity in reported cases and population sizes in the regions considered, and the model's ability in predicting cases irrespective of these factors. The results for the remaining regions are presented in Figs G and H in S1 Text.
For each model scenario, we present the posterior predictions of the weekly cases of COVID-19 in each region (columns 1 & 3 of Fig 6). We compute the time-dependent disease transmission rate, β j (t), using the estimated parameters and the formula in (2). These rates are presented in blue for the fixed distance matrix scenario (column 2 of Fig 6) and in gold for the weekly mobility matrices scenario (column 4 of Fig 6), together with the contribution of mobility to the transmission rate, e c 1 m j ðtÞ (green) with 50% credible interval (CrI) (darker bands) and 90% CrI (lighter bands).
We observe from these results that our model is able to capture the trends and reported cases of COVID-19 in each of the regions with a high degree of accuracy for both model scenarios. In addition, we see that there are significant changes in the computed disease transmission rate over time, which has a similar trend for all the regions. Even though there are no much changes in the time-series mobility data, its effect on the disease transmission rate for each region is still noticeable.
The mean estimate for the initial total prevalence in the 13 regions is 47.61 (90% CrI: 44.82 -50.31) for the distance matrix scenario and 50.19 (90% CrI: 47.37-53.04) for the weekly mobility matrix scenario. The mean estimate of the parameter used to quantify the effect of mobility on disease transmission in the regions (θ) for the distance matrix scenario is 0.53 (90% CrI: 0.44-0.60) and 0.90 (90% CrI: 0.72-0.98) for the scenario with weekly mobility matrices. This implies that movement between the regions contribute to a mean fraction of 0.53 and 0.90 of the total reported cases of COVID-19 in the regions for the distance and mobility matrix scenarios, respectively. The scaling parameter used to remove biases in the time-series mobility data (c 1 ) was estimated as 1.51 (90% CrI: 0.90-2.10) for the distance matrix and 2.11 (90% CrI: 1.52-2.69) for the mobility matrix scenario.
We estimated the scaling parameters for the baseline disease transmission rate, c 0j for j = 1, . . ., 13, using Bayesian hierarchical modeling framework. These parameters are used to compute the baseline disease transmission rate for each region defined by e c 0j for j = 1, . . ., 13. The mean estimate for the population mean and variance are 0.45 (90% CrI: 0.35-0.54) and 0.18 (90% CrI: 0.12-0.26), respectively, for the distance matrix and 0.21 (90% CrI: 0.05-0.36) and 0.35 (90% CrI: 0.24-0.47), respectively, for the mobility matrix scenario. The mean estimate for c 0j , for j = 1, . . ., 13 with 90% credible interval (CrI) are presented in Tables A (distance  matrix) and B (mobility matrix) in S1 Text. The estimated distribution for the baseline disease transmission rates for the regions are presented in Fig 7. We observe from the results in this figure that the predicted distributions for the larger and more urbanized regions with dense populations are similar for the distance and mobility matrix scenarios. These regions include Abbotsford, Burnaby, New Westminster, Surrey, and Tri-Cities. On the other hand, the predictions for the less densely populated smaller regions are relatively different for the two scenarios. In addition, the variances in the distributions for the smaller regions are larger than those of the bigger regions with larger populations.
The time-dependent piece-wise parameter, g(t), was also estimated using a Bayesian hierarchical modeling framework with population mean and variance estimate with 90% credible interval given by -0.33 (-0.52, -0.14) and 0.30 (0.16, 0.47), respectively, for the distance matrix scenario, and -0.28 (-0.45, -0.10) and 0.32 (0.19, 0.50), respectively, for the weekly mobility matrix scenario. The mean estimates with 90% credible interval for the interval-specific parameters (g 2 -g 8 ) are given in Tables A (distance matrix scenario) and B (mobility matrix scenario) in S1 Text. It is important to emphasize that we have set g 1 = 0 (week 1-4) to ensure that the model parameters are identifiable, and to estimate g 2 , . . ., g 8 relative to g 1 . The distributions for the time-dependent effect of other factors that affect disease transmission, other than mobility, on the disease transmission rate, are given in Fig 8. We observe that the constructed distributions for the two scenarios agree well.
Lastly, we compare the estimated expected leave-one-out predictions and their standard errors, for the two model scenarios, computed using the leave-one-out cross-validation (LOO) method [77][78][79] and the widely applicable information criterion (WAIC) method [80,81].
The comparison is summarized in Table 2, where the distance matrix scenario is ranked better than the mobility matrix scenario, in terms of their ability to capture the case data. Even  Tables A and B in S1 Text for the estimates of c 0j ). Scenarios: fixed distance matrix (blue) and weekly mobility matrices (gold).
https://doi.org/10.1371/journal.pcbi.1011123.g007 PLOS COMPUTATIONAL BIOLOGY though the distance matrix scenario captures the case data better than the weeekly mobility matrix scenario, the difference in the fits for the two approaches is not much.

Discussion
An important feature of our modeling framework includes the ability to explicitly incorporate factors that affect disease transmission into the transmission rate. This formulation allows us to effectively access the contributions of these factors to disease transmission in our model. In the example presented in this article, due to lack of adequate data, only time-series mobility data was incorporated explicitly into the disease transmission rate. The effect of other factors  (e g(t) ) to the transmission rate (β(t)), computed every four weeks and for the last two weeks: g 1 = 0 (weeks 1-4), g 2 (weeks 5-8), g 3 (weeks 9-12), g 4 (weeks 13-16), g 5 (weeks [17][18][19][20], g 6 (weeks 21-24), g 7 (weeks 25-28), g 8 (weeks 29 & 30). Scenarios: fixed distance matrix (blue) and weekly mobility matrices (gold). The estimated means with 90% credible interval are presented in Tables A and B in S1 Text.
https://doi.org/10.1371/journal.pcbi.1011123.g008 PLOS COMPUTATIONAL BIOLOGY that affect disease transmission was accounted for using a time-dependent piece-wise parameter. We attempted to incorporate the effect of facemasks into the model but could not get adequate data for facemask usage in each of the regions we considered. In this case, the disease transmission rate was formulated as follows where c 0j for j = 1, . . ., 13 are region-specific scaling parameters used to compute the baseline disease transmission rate for each region (e c 0j is the baseline transmission rate for region j).
The parameters c 1 and c 2 are covariates for the mobility and facemask usage rates, respectively, and g(t) is a time-dependent piece-wise parameter that is used to incorporate the effect of other factors that affect disease transmission other than mobility and facemask. This formulation can always be extended to explicitly account for other factors that affect disease transmission based on data availability. Our model captures the trends and reported COVID-19 cases in each region (see Fig 6, and Figs G and H in S1 Text). In addition, the results of the two model scenarios agree well, although, there is a slight different in the estimated time-dependent disease transmission rates and the contribution of mobility to disease transmission (columns 2 & 4) for some regions. There are significant changes in the computed disease transmission rate over time, which may be attributed to the intervention strategies implemented by the government during this period. Even though there are no much changes in the time-series mobility data, its effect on the disease transmission rate is still apparent for each region. The estimated total initial prevalence of COVID-19 in all the regions for the two scenarios agree well, as well as the estimates for the time-dependent piece-wise parameter (g(t)), used to incorporate the effect of other factors that affect disease transmission into the transmission rate (see Fig 8). However, the estimated effect of mobility on disease transmission is significantly different for the two scenarios. The mean estimate of this parameter was 0.53 (90% CrI: 0.44-0.60) for the distance matrix scenario and 0.90 (90% CrI: 0.72-0.98) for the weekly mobility matrix scenario. This can be interpreted as mobility contributing to 53% and 90% of the cases in the regions for the distance and mobility matrices scenarios, respectively. These results show that the weekly mobility data provides more information, in terms of disease transmission, than the distances between the regions. Note that the mobility referred to here is for both within and between the regions. To confirm that indeed the weekly mobility data provides more information, we considered a third scenario, where we used a fixed mobility matrix computed using the mobility data for the entire study from July 2020 to January 2021. For this scenario, we estimated the effect of mobility on disease transmission as 0.60 (90% CrI: 0.52-0.70) (see Fig (I-K) and Table D in S1 Text). As expected, the fixed mobility matrix does not provide more information about disease transmission than the weekly mobility matrices, even though it does better than the distance matrix.

Table 2. Model comparison using leave-one-out cross-validation (LOO) and the widely applicable or Watanabe-Akaike information criterion (WAIC).
Model ranking (in descending order) is shown in the first column. The difference between the expected log pointwise predictive density (ELPD) for each scenario and that of the best scenario with standard errors are shown in the second column. In the third column, we have the Bayesian LOO estimate of the expected log pointwise predictive density (ELPD LOO) and its standard error. The LOO information criteria (LOOIC) and its standard error are given in the fourth column. Lastly, the computed Watanabe-Akaike information criterion (WAIC) for each model is shown in the fourth column.

PLOS COMPUTATIONAL BIOLOGY
The constructed distributions for the baseline disease transmission rate for the two model scenarios are similar for some of the regions and significantly different for other regions. These distributions are similar for the larger and more urbanized regions with dense population (Abbotsford, Burnaby, New Westminster, Surrey and Tri-Cities) and significantly different for the less densely populated smaller regions (see Fig 7). The difference in the predicted distributions for these two groups of regions may be attributed to their population size and mobility in the regions. Lastly, we compared the results obtained from the two scenarios using the leave-one-out cross-validation (LOO) and the widely applicable information criterion (WAIC) methods. This comparison ranks the distance matrix results better than those of the weekly mobility matrices, although, the computed LOO and WAIC for the two scenarios are very similar (see Table 2). We considered these two model scenarios in order to test the hypothesis of whether the time-dependent mobility matrices, computed from the mobile device data, provide more information about human mobility between the regions in terms of disease transmission than the distances between the regions. Based on our results, we conclude that even though the distance matrix provides a better fit to the data, the weekly mobility matrices have the ability to explain the variance in transmission between regions over time.
The model for when the distance matrix is used is considered a gravity model, while the scenario where the weekly mobility matrices are used is referred to as a metapopulation model. Hence, our hybrid gravity-metapopulation model.
Unlike in other models used to study the effect of human mobility on disease spread, where mobility is described based on either the distances between regions or using mobile devices data or other forms of mobility data only [37,45,47], our hybrid gravity-metapopulation modeling framework provides the flexibility of switching between the two data types. In addition, our framework provides an approach for studying the spread of diseases between all the regions of interest, rather than from an epicenter or a large city to its neighboring smaller cities [43,45], with the ability to quantify the effective contribution of mobility to disease spread between the regions. The models used to study disease spread from an epicenter to neighboring regions/cities are only suitable for studying disease spread at the early stages of the disease outbreak since there is a much higher probability of disease transmission from a person living in the epicenter to those living in the neighboring regions as shown in [45]. Also these models do not account for disease transmission between the smaller neighboring regions. Our modeling framework is suitable for studying disease dynamics at any stage of the epidemic and accounts for disease spread from each of the regions to all the remaining regions, irrespective of the number of reported cases in each region.
Overall, our modeling framework provides the ability to explicitly incorporate real data on factors the affect disease transmission into the disease transmission rate, and also allows independent assessment of the contribution of these factors to disease transmission in an epidemic. Furthermore, this framework allows us to quantify the effect of mobility on disease transmission in the regions. However, this work is not without limitations. We quantified the effect of mobility on disease transmission in the 13 LHAs of Fraser Health, BC, based on movements between these thirteen regions only. However, there are movement in and out of these regions to other parts of BC. Another limitation of this work is that some regions in Fraser Health are closer to regions in other regional health areas in BC, than they are to other regions in Fraser Health. For example, Burnaby is closer to Vancouver than it is to many of the LHAs in Fraser Health. As a result of this, the spread of COVID-19 in Burnaby may be influenced more by the number of cases in Vancouver than in other regions in Fraser Health, e.g. Hope, Chilliwack and Agassize/Harrision. In the example presented here, we explicitly incorporated only the time-series mobility data into the disease transmission rate and accounted for other factors that affect disease transmission through a piece-wise parameter. An interesting extensions of this work would be to incorporate the data for other factors that affect disease transmission explicitly into the model. This way, the effect of each factor on disease spread can easily be assessed. Another extension of this model is to include vaccination and the variants of concern of COVID-19. Since mobility rate varies by age, an exciting extension of this work would be to stratify the population of each region by age. This way, in addition to assessing the impact of mobility on disease spread, it would also be possible to assess the contribution of each age group to disease spread.
Supporting information S1 Text. Supplementary methods and results. This document contains more details of the methods and results. (PDF)