Trending on the use of Google mobility data in COVID-19 mathematical models

Google mobility data has been widely used in COVID-19 mathematical modeling to understand disease transmission dynamics. This review examines the extensive literature on this use, focusing on over a dozen influential studies that employ compartmental and metapopulation models. Google mobility data provides valuable insights into mobility changes and interventions. However, challenges persist in fully elucidating transmission dynamics over time, modeling longer time series, and accounting for individual-level correlations in mobility patterns, which argues for incorporating diverse datasets into modeling in the post-COVID-19 landscape.


Introduction
Due to the highly contagious nature of COVID-19, reducing social interactions and community movement has been crucial in lowering transmission rates [21,30,32]. Interventions adopted to reduce the transmission of COVID-19 include social distancing, self-isolation, and quarantine [15,17]. The implementation of interventions in response to infectious disease outbreaks is not new: methods that aim to reduce social contact and limit mobility have been used for centuries and were adopted during the MERS and SARS epidemics [6,37]. However, despite the historical knowledge of the link between mobility and disease, quantifying this relationship in detail has been challenging, especially over large geographical areas and for large populations.
In response to the COVID-19 pandemic, academic researchers have dedicated significant efforts to studying the connection between human mobility and COVID-19 transmission, using various datasets and mathematical models in different countries and regions [14,26,27]. One such dataset is provided by Google, which released data collected from users accessing its applications through handheld devices. Google's "Community Mobility Reports" (CMR) [1] show changes in activity and mobility across various location types relative to the period before the global spread of COVID-19. Given the lack of alternative global data sources for these factors, Google mobility data serves as a reliable indicator of the impact that health recommendations and government restrictions have had on social activity and movement. It provides distinctive and valuable insights into changes in mobility, presenting a unique opportunity to explore the correlation between mobility and disease incidence. Thus, researchers are progressively exploring methods to integrate Google mobility trends into COVID-19 research. Searching PubMed with the terms "Google Mobility Data" and "COVID-19" returns over 288 results, while Google Scholar returns more than 694,000 matches for the same query.
In this review, we delve into the extensive body of literature on the use of Google mobility data during the COVID-19 crisis. While these papers employ both statistical methods and mathematical models, including compartmental and metapopulation models, our primary focus is the use of Google mobility data in COVID-19 mathematical modeling. We examine existing models that incorporate Google mobility data, highlight its use and effectiveness in enhancing traditional infectious disease models, and discuss challenges that may arise as it becomes part of the infectious disease modeling suite. We also discuss papers that did not employ Google mobility data directly in the model but instead used it to validate model performance or a model input data source.
This paper is organized as follows: Sect. 2 describes the method of article collection; Sect. 3 details the application of Google mobility data in different aspects of epidemic modeling. Finally, some challenges, observations, and conclusions are summarized in Sect. 4.

Article collection
We began by searching PubMed and Google Scholar for articles published between January 2020 and May 2023, aiming to cover the latest research on the use of Google mobility data in COVID-19 models. The search terms were "COVID-19", "novel coronaviruses", "2019-nCov", "SARS-CoV-2", "Google mobility data", and "mathematical modeling". This effort yielded around 100 relevant articles. The selected articles apply various mathematical models to analyze, simulate, and predict the association between human mobility and COVID-19. We categorized these articles into two main groups: those that use Google mobility data directly in the modeling process and those that use it as a reference to validate the model. The varieties of models employed in these papers are depicted in Fig. 1. From this collection, we selected more than a dozen highly influential studies for thorough examination.

Epidemic models
Upon reviewing these studies, we observed that they could be categorized by the mathematical model used and by whether they incorporate Google mobility data into the modeling framework. Therefore, in Table 1, we first categorize most of the collected articles into two groups, which model COVID-19 dynamics with either compartmental or metapopulation models. Subsequently, in Sects. 3.1 and 3.2, we refine this classification according to whether Google mobility data is incorporated into the modeling procedure.
Table 1 (continued)
Compartmental models | SEIR-type | Connecticut [9]; Kenya [4]; US [18]; UK, South Africa, Brazil [39]
Compartmental models | Age structured SEIR-type | Philippines [9]; Israel [13]; Ontario, Canada [19]; France [26]; England and Wales [35]
Metapopulation models | Age structured | Santiago de Chile [14]

Models that incorporated the Google mobility data into the epidemic model
As shown in ... is the variable for places of residence. Similarly, another group of researchers [8] developed an SEIR-type compartmental model to evaluate the COVID-19 epidemic in each state of the United States by incorporating mobility data, confirmed case data, and contact tracing. To estimate contact rates, the authors employed several types of mobility data (i.e., Unacast [33], Google [1], OpenTable [31]). Within the model, the influence of social distancing, hygiene measures, and reopening is characterized by a time-dependent contact rate $c(t)$ and a probability of transmission per infected contact $\beta(t) = \beta_0 \times \theta(t)^{\eta}$. Several mobility datasets are used to fit the contact rate model $c(t)$ in order to derive prior distributions for its parameters. The authors found that Google's "retail and recreation" ($R^2 = 0.49$) and Unacast ($R^2 = 0.52$) data yield the highest R-squared values. In summary, the findings indicate the need to broaden the use of mobility data sources for constructing prior distributions, as opposed to merely incorporating such data directly into the modeling of contact rates.
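To make the dependence of the transmission probability on $\theta(t)$ concrete, the following minimal Python sketch evaluates $\beta(t) = \beta_0 \times \theta(t)^{\eta}$, reading $\eta$ as an exponent, and computes an R-squared value of the kind used to compare mobility sources. The function names, the numerical values, and the treatment of $\theta(t)$ as an index between 0 and 1 are illustrative assumptions, not code or estimates from [8].

```python
def transmission_probability(beta0, theta, eta):
    """Evaluate beta(t) = beta0 * theta(t)**eta for a single value of theta(t)."""
    if theta < 0:
        raise ValueError("theta must be non-negative")
    return beta0 * theta ** eta


def r_squared(observed, fitted):
    """Coefficient of determination between observations and fitted values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical index values (1.0 = no distancing) and hypothetical parameters.
theta_series = [1.0, 0.8, 0.5, 0.4, 0.6]
beta_series = [transmission_probability(0.05, th, 2.0) for th in theta_series]
```

With $\eta > 1$, a given drop in $\theta(t)$ reduces the transmission probability more than proportionally, which is one way such a formulation can represent hygiene measures that accompany distancing.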
In another paper published in 2021, the authors of [18] present a deterministic SEIR compartmental framework to forecast severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections and evaluate the effects of nonpharmaceutical interventions (NPIs) in the United States, including analysis at the state level. The study period was longer than that of the previous two studies. Interestingly, in their model, not only is $\beta$ a function of time, but the force of infection is also modulated by a mixing parameter $\alpha$ such that $\lambda(t) = \beta(t)(I_1 + I_2)^{\alpha}/N$, where $I_1$ and $I_2$ denote pre-symptomatic and symptomatic individuals. They use four data sources on human mobility to construct a composite mobility indicator with a linear regression model, linking it to the implementation of different NPIs. Those sources include not only Google Community Mobility Reports [1] but also Facebook Data for Good [12], SafeGraph [29], and Descartes Laboratories [11]. For Google mobility data, they take the average of the percentage change in "Retail and recreation", "Transit stations", and "Workplaces" to represent the mobility trend most strongly affected by social distancing measures. The results confirm the effectiveness of NPIs under different scenarios.
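The two ingredients described above, an average over three Google categories and a force of infection with a mixing exponent, can be sketched in a few lines of Python. The function names and the numbers in the example are hypothetical; the composite indicator in [18] additionally combines other data sources through a regression that is not reproduced here.

```python
def google_mobility_index(retail, transit, workplaces):
    """Average percentage change from baseline across three Google categories."""
    return (retail + transit + workplaces) / 3.0


def force_of_infection(beta_t, i_pre, i_sym, alpha, population):
    """lambda(t) = beta(t) * (I1 + I2)**alpha / N, with mixing exponent alpha."""
    if population <= 0:
        raise ValueError("population must be positive")
    return beta_t * (i_pre + i_sym) ** alpha / population


# Hypothetical day: retail -30%, transit -45%, workplaces -15% versus baseline.
index = google_mobility_index(-30.0, -45.0, -15.0)

# Hypothetical state: 50 pre-symptomatic and 50 symptomatic in 1000 people.
lam_linear = force_of_infection(0.4, 50.0, 50.0, 1.0, 1000.0)
lam_sublinear = force_of_infection(0.4, 50.0, 50.0, 0.5, 1000.0)
```

Setting $\alpha = 1$ recovers the usual mass-action term, while $\alpha < 1$ makes the force of infection grow more slowly than the number of infectious individuals.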
Other authors [4] constructed a modification of the SEIR scheme using data from Kenya, with a compartment W to account for the portion of recovered individuals who return from a completely protected state to a partially protected state due to waning immunity. The authors model the SARS-CoV-2 dynamics in each of the 47 Kenyan counties as a two-group SEIRW transmission process, where the groups differ in their ability to reduce social mobility. The model defines per capita forces of infection on individuals in the lower and higher socioeconomic groups, denoted $\lambda_L(t)$ and $\lambda_U(t)$, respectively. The proportion $c_U(t)$ of the higher socioeconomic group interacting in locations outside the home is estimated from the average change in the "retail and recreation", "grocery and pharmacy", "transit stations", and "workplaces" settings in Google mobility trend data. The authors posit that Google mobility data is more effective in depicting access trends for the higher socioeconomic group when visiting locations outside the home, owing to their ownership of smartphones.
Yang and Shaman [39] based their analysis of the UK, South Africa, and Brazil on case and mortality data, using Google mobility data ("Retail and recreation", "Transit stations", "Workplaces") to scale the transmission rate.
Table 2 (excerpt)
[4] | Force of infection | "Retail and recreation", "Grocery and pharmacy", "Transit stations", "Workplaces"
[18] | Scaling transmission rate | Facebook dataset, SafeGraph, Descartes Labs dataset, "Retail and recreation", "Transit stations", "Workplaces"
[39] | Scaling transmission rate | "Retail and recreation", "Transit stations", "Workplaces"
In an age-structured SEIR-type model, mobility data enters through the contact matrix. The force of infection is defined in terms of age $a$, population groups $j$ and $c$, and the infectious groups $P$, $I$, and $L$. The contact matrix for a specific age group is adjusted as $C(t) = h(t) C_H + s(t) C_S + w(t) C_W + l(t) C_L$, where $C_H$, $C_S$, $C_W$, and $C_L$ are the age-specific contact matrices associated with households, schools, workplaces, and other locations [24]. Here $h(t)$ is a constant; $s(t)$ depends on the percentage of students attending educational institutions; $w(t)$ is a polynomial spline fitted to Google mobility's "workplaces" data; and $l(t)$ is a polynomial spline fitted to the average of Google mobility's "retail and recreation", "grocery and pharmacy", "parks", and "transit stations" data.
Jentsch et al. [19] developed an age-structured SEPAIR (susceptible, exposed, presymptomatic, asymptomatic, symptomatic, removed) model with 16 age classes to project COVID-19 mortality under four different COVID-19 vaccine scenarios in Ontario, Canada. The model takes population adherence to NPIs, changes in mobility patterns, and seasonality into consideration. The force of infection depends on $\gamma$, the probability of transmission per contact; $s$, which represents seasonality; $\phi$, a seasonality phase; and $C_{ij}(t)$, the average number of contacts per day at workplaces, schools, households, and other locations. $C_{ij}(t)$ varies with individual adherence to NPIs as well as government shutdown policies. The authors use deviations from the baseline time spent at retail and recreational venues to represent population compliance with nonpharmaceutical interventions (NPIs). Consequently, the proportion $x(t)$ of individuals adhering to NPIs is determined by fitting the reduction in "Retail and Recreation" from Google mobility data. The authors also fit a step function to Google mobility data.
Pullano et al. [26] described the impact of age-specific contact activity on COVID-19 transmission in French regions with a stochastic, discrete, age-stratified SEIR structure. The authors adopted the social contact matrices measured in a 2012 survey in France [3] as the baseline conditions for their model. The contact matrix incorporates both the nature of the activity and the location of contacts (such as home, school, and workplace). Adjustments to the contact matrices serve as the basis for modeling intervention strategies. Google mobility data is used to estimate the percentage change in the number of individuals at the workplace, which accounts for changes in work contact patterns. Moreover, in their model selection analysis, the authors show that accounting for changes in contact patterns during the exit phase of intervention measures provides a more accurate description of the epidemic trajectory.
Across the sea, using data from England and Wales, Waterlow et al. [35] created a deterministic compartmental transmission model to examine the impact of cross-protection from seasonal human coronaviruses (HCoVs) on severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). For both seasonal HCoVs and SARS-CoV-2, the population is grouped into susceptible (S), exposed (E), infectious (I), and recovered (R) compartments with five age groups. To reflect the nonpharmaceutical interventions implemented, the authors split the contact matrices into three categories: a school contact matrix, a household contact matrix, and an 'other' contact matrix (from contacts in all other categories reported in the POLYMOD study [23]). Based on Google mobility data, the authors adjust the 'other' contact matrix with the average change in "retail and recreation", "workplace", "grocery and pharmacy", and "transit stations" reported in the Google Community Mobility Reports. Table 3 summarizes the use and selection of mobility data for age-structured SEIR-type models.
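The contact-matrix adjustments described in this section share a common pattern: location-specific contact matrices are rescaled by time-varying coefficients and summed. The Python sketch below illustrates that pattern for two hypothetical age groups. The matrices, the percent changes, and the direct conversion of a percent change into a multiplier are assumptions for illustration; the reviewed studies use their own fitted splines, step functions, or averages.

```python
def mobility_to_scale(percent_change):
    """Turn a percent change from baseline (e.g. -40) into a multiplier (0.6)."""
    return max(0.0, 1.0 + percent_change / 100.0)


def combine_contact_matrices(weights, matrices):
    """Return sum over locations of weights[loc] * matrices[loc]."""
    locations = list(matrices)
    size = len(matrices[locations[0]])
    combined = [[0.0] * size for _ in range(size)]
    for loc in locations:
        for i in range(size):
            for j in range(size):
                combined[i][j] += weights[loc] * matrices[loc][i][j]
    return combined


# Hypothetical contact matrices for two age groups (contacts per day).
matrices = {
    "home": [[1.0, 0.0], [0.0, 1.0]],
    "school": [[2.0, 0.0], [0.0, 0.0]],
    "work": [[0.0, 1.0], [1.0, 2.0]],
    "other": [[1.0, 1.0], [1.0, 1.0]],
}
# Hypothetical day: schools closed, workplaces at -50%, other locations at -20%.
weights = {
    "home": 1.0,
    "school": 0.0,
    "work": mobility_to_scale(-50.0),
    "other": mobility_to_scale(-20.0),
}
contact_matrix = combine_contact_matrices(weights, matrices)
```

Keeping the location matrices fixed and moving all time dependence into the scalar weights is what lets a single mobility time series drive an age-structured model.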
Later, in 2022, Gavish et al. [13] used a mathematical model that accounts for age stratification, vaccination, booster administration, and subsequent waning immunity to assess the population-level impact of the booster campaign in Israel. The social contact matrix used to model the infection process is a time-varying linear combination of contact matrices. Interestingly, the transmission rate $\beta$ is not only a function of time but is also modulated by the contact matrix $C(t)$, which is modeled as $C(t) = \omega_h a_h(t) F_h + \omega_w a_w(t) F_w + \omega_s a_s(t) F_s + \omega_c a_c(t) F_c$, where $F_h$, $F_w$, $F_s$, and $F_c$ are household, work, school, and community contact frequency matrices, respectively, derived from the existing literature [25]. The household, workplace, and community coefficients $a_h(t)$, $a_w(t)$, and $a_c(t)$ are based on the percent change in each type of location from its baseline in Google's COVID-19 Community Mobility Reports. Only the school coefficient $a_s(t)$ is set according to the assessed proportion of school openings in each period. The coefficients $\omega_h$, $\omega_w$, $\omega_s$, and $\omega_c$ express the number of contacts occurring in these locations, representing the contribution of each location to the overall contact matrix $C(t)$. The household coefficient is fixed at $\omega_h = 1$, and the other coefficients are set relative to it. By data fitting, the authors obtain $\omega_w = 1.7$, $\omega_s = 1.6$, and $\omega_c = 2.9$. Hence, community contact makes the largest contribution to the overall contact matrix $C(t)$. One notable aspect is the asymmetry of their community contact matrix, which casts some doubt on the validity of their results.
Table 3 (excerpt)
[...] | Contact matrix | "Parks", "Grocery and Pharmacy", "Workplaces", "Retail and recreation", "Transit stations"
[13] | Contact matrix | "Residential", "Retail and recreation", "Workplaces"
[19] | Contact matrix | "Retail and recreation", "Workplaces"
[26] | Contact matrix | "Workplaces"
[35] | Contact matrix | "Retail and recreation", "Workplaces", "Grocery and Pharmacy", "Transit stations"
[36] | The reproduction number | "Retail and recreation", "Workplaces", "Grocery and Pharmacy", "Transit stations"
Standard compartmental models commonly assume that the population is homogeneous, which is justified as long as infection within a single community is concerned [16,20]. This assumption might not work well at larger scales [22]. Metapopulation models are built on a network of subpopulations (e.g., cities, regions, countries) connected by mobility. The disease dynamics inside each patch (i.e., subpopulation) follow a compartmental model like those described in the previous section. Metapopulation models typically represent socio-technical systems as networks in which nodes describe subpopulations and links describe the mobility flows between them. More specifically, in a metapopulation model, mobility data are mainly used to characterize the flows between subpopulations.
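As a rough illustration of how mobility flows couple the patches of a metapopulation model, the Python sketch below advances a discrete-time SIR model by one day for several patches. It is a generic textbook-style construction under stated assumptions (a row-stochastic matrix giving the fraction of each patch's residents who spend the day in each patch, and shared hypothetical rates), not the formulation of any specific study reviewed here.

```python
def metapop_sir_step(state, beta, gamma, mobility):
    """Advance all patches by one day.

    state: list of (S, I, R) tuples, one per patch.
    mobility[i][j]: assumed fraction of patch i residents spending the day in
    patch j (each row sums to 1).
    """
    n = len(state)
    # Infectious people and total people present in each patch during the day.
    present_i = [sum(mobility[i][j] * state[i][1] for i in range(n)) for j in range(n)]
    present_n = [sum(mobility[i][j] * sum(state[i]) for i in range(n)) for j in range(n)]
    new_state = []
    for i in range(n):
        s, inf, rec = state[i]
        # Force of infection on residents of patch i, weighted by where they go.
        lam = sum(
            mobility[i][j] * beta * present_i[j] / present_n[j]
            for j in range(n)
            if present_n[j] > 0
        )
        new_infections = min(s, lam * s)
        new_recoveries = gamma * inf
        new_state.append(
            (s - new_infections, inf + new_infections - new_recoveries, rec + new_recoveries)
        )
    return new_state


# Hypothetical two-patch example: infection is seeded in patch 0 only.
start = [(990.0, 10.0, 0.0), (1000.0, 0.0, 0.0)]
isolated = metapop_sir_step(start, 0.3, 0.1, [[1.0, 0.0], [0.0, 1.0]])
coupled = metapop_sir_step(start, 0.3, 0.1, [[0.9, 0.1], [0.1, 0.9]])
```

With the identity mobility matrix the second patch stays infection-free, whereas even a 10% exchange of residents seeds it within one step, which is the mechanism mobility data are meant to quantify.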
Rader et al. [27] used a metapopulation SIR model to study the link between the shape of the epidemic curve and the spatial features of cities. The authors determine the percentage of daily movements within prefectures in China by extracting human mobility data from the Baidu web platform. They extend their results to cities across the world by applying the model fitted to China along with globally available covariates. Because human mobility data from Baidu are not available for locations outside China, the authors use the Google mobility dataset to calculate both mobility within the shapefile boundaries of 310 cities and mobility coming into each city. The authors also mention a limitation of Google's mobility data: it cannot describe population-level mobility patterns. In another paper [28], the authors aimed to determine the extent to which well-planned strategies for relaxing restrictions could postpone the resurgence of COVID-19 on a continental scale and curtail community transmission. They first estimate the baseline mobility probability by incorporating pre-COVID-19 continental mobility data from Google.
The aforementioned studies incorporated Google mobility data into their models. Nevertheless, as noted by Unwin et al. [34], relying solely on Google mobility data is insufficient to account for all variations. Although mobility data accounts for a significant portion of the $R_t$ trend, it does not comprehensively depict the evolution of transmission dynamics over time. Other behavioral shifts during COVID-19 likely contribute to variations as well. Unwin et al. employed a second-order, weekly, autoregressive process to capture these changes, yet attributing them solely to other transmission determinants or interventions remains challenging. Furthermore, most of the aforementioned studies focused on a single wave, making it unclear how useful they would be for longer time series (see Table 4). Currently, there is not sufficient information available to formulate a unified model that uses Google mobility data to fit multiple waves in epidemic modeling. Finally, as discussed in Sect. 3.2, some authors did not directly incorporate Google mobility data into their mathematical modeling process; instead, they employed it as a validation tool. This highlights the diverse approaches taken in utilizing mobility data across studies.
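For readers unfamiliar with the term, a second-order weekly autoregressive process is a sequence in which each week's value depends linearly on the two preceding weeks plus a random innovation. The Python sketch below generates such a sequence from supplied innovations and expands it to daily resolution. The coefficients, the initial values, and the function names are hypothetical, and the sketch does not reproduce how Unwin et al. [34] embed the process in their model.

```python
def ar2_weekly(rho1, rho2, innovations, init=(0.0, 0.0)):
    """Return eps_w = rho1 * eps_{w-1} + rho2 * eps_{w-2} + innovation_w."""
    prev2, prev1 = init
    path = []
    for shock in innovations:
        current = rho1 * prev1 + rho2 * prev2 + shock
        path.append(current)
        prev2, prev1 = prev1, current
    return path


def weekly_to_daily(weekly_values):
    """Hold each weekly value constant over the seven days of its week."""
    return [value for value in weekly_values for _ in range(7)]


# Hypothetical coefficients and a single unit shock in the first week.
weekly_effect = ar2_weekly(0.5, 0.25, [1.0, 0.0, 0.0])
daily_effect = weekly_to_daily(weekly_effect)
```

A term of this kind can absorb week-to-week changes in transmission that mobility alone does not explain, which is why it is hard to attribute what it captures to any particular determinant.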

Models that deployed the Google mobility data as a tool for validation
Unlike the studies above, several articles did not incorporate Google mobility data into the model. Instead, the authors used it as a tool to check model performance or to validate the model input data. For example, Wong et al. [38] presented a modification of the SIR scheme that accounts for the long and variable delay times reported in the literature. Forward predictions of the model provide not only robust short-term epidemic estimates (peak position and severity) under social distancing but also the later epidemic dynamics under the relaxation of orders in the summer of 2020 in Illinois. The authors express the effective reproduction number $R_t$ as a parametrization involving the basic reproduction number $R_0$, a seasonal forcing estimate $F(t)$, a mitigation profile $M(t)$ parametrized as a piecewise cubic Hermite interpolating polynomial, and the susceptible population fraction $S(t)/N$, i.e., $R_t = R_0 F(t) M(t) S(t)/N$. The model assumes no causal relationship between $R_t$ and mobility data and is not supplied with prior information on nonpharmaceutical interventions; nevertheless, it exhibits a mitigation trend that resembles the mobility data reported by Google and Unacast, demonstrating the flexibility of the model and its calibration procedure. Google mobility data was not directly deployed in model building; instead, it was used as a validation tool for comparison with the model results.
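The factorization $R_t = R_0 F(t) M(t) S(t)/N$ can be sketched in Python as follows. Two simplifications are assumptions of this sketch rather than features of [38]: the mitigation profile is interpolated linearly between knots instead of with a piecewise cubic Hermite interpolating polynomial, and the seasonal forcing is a plain cosine with a hypothetical amplitude. All knot values are invented for illustration.

```python
import math


def interp_linear(xs, ys, x):
    """Piecewise-linear interpolation, held constant outside the knot range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for k in range(1, len(xs)):
        if x <= xs[k]:
            frac = (x - xs[k - 1]) / (xs[k] - xs[k - 1])
            return ys[k - 1] + frac * (ys[k] - ys[k - 1])


def seasonal_forcing(t, amplitude=0.0, period=365.0):
    """Assumed cosine forcing F(t); amplitude 0 switches seasonality off."""
    return 1.0 + amplitude * math.cos(2.0 * math.pi * t / period)


def effective_reproduction_number(r0, t, knots_t, knots_m, s_fraction, amplitude=0.0):
    """R_t = R_0 * F(t) * M(t) * S(t)/N with M(t) interpolated between knots."""
    mitigation = interp_linear(knots_t, knots_m, t)
    return r0 * seasonal_forcing(t, amplitude) * mitigation * s_fraction


# Hypothetical mitigation knots: no mitigation on day 0, 60% reduction by day 30.
knots_t = [0.0, 30.0, 120.0]
knots_m = [1.0, 0.4, 0.7]
rt_day15 = effective_reproduction_number(3.0, 15.0, knots_t, knots_m, 0.9)
```

Because $M(t)$ is fitted freely rather than tied to mobility, comparing the fitted profile with the Google and Unacast curves afterwards serves as an independent check.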
Similarly, the authors of another paper [9] built a model with Google mobility data, but only to compare it with their primary model. Based on the SEIR framework, they developed a county-stratified deterministic model that uses a close contact rate to recapitulate COVID-19 transmission and predict case counts in Connecticut. The close contact rate was derived from pairs of devices within six feet of each other in Connecticut. This close contact rate is then used to determine the mobility metric $M_{contact}(t)$, which parameterizes the temporal dynamics of the transmission parameter $\beta(t)$ together with $\exp[B(t)]$, a function that approximates residual changes in the transmission parameter. When the estimated value of $B(t)$ under a particular mobility metric is close to zero, that mobility metric explains most of the variation in transmission. To evaluate the usefulness of the close contact rate as an input to the transmission model, the authors also fit the SEIR transmission model with mobility metrics from Apple [2], Descartes Labs [11], Facebook [12], Google [1], and Cuebiq [10], and with a no-mobility null model. The model with the close contact rate fits best, and the other mobility metrics exhibit a poorer fit. The authors thereby confirm that mobility metrics primarily measure movement, which might not represent close interpersonal contact.
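The role of the residual term can be illustrated with a short Python sketch. It assumes a multiplicative form, $\beta(t) = \beta_0 \, M_{contact}(t) \, \exp[B(t)]$, in which $\beta_0$ is a hypothetical baseline constant; this form is an assumption made here for illustration, since the defining equation is not available in the text above.

```python
import math


def transmission_parameter(beta0, m_contact, b_residual):
    """Assumed form: beta(t) = beta0 * M_contact(t) * exp(B(t))."""
    return beta0 * m_contact * math.exp(b_residual)


def implied_residual(beta_target, beta0, m_contact):
    """Residual B(t) needed for the mobility metric to reproduce beta_target."""
    return math.log(beta_target / (beta0 * m_contact))


# Hypothetical values: a metric that tracks transmission well needs B(t) near 0.
good_metric_residual = implied_residual(0.30, 0.5, 0.6)
# A metric that overstates contact needs a clearly negative residual.
poor_metric_residual = implied_residual(0.30, 0.5, 0.9)
```

Under this reading, comparing mobility metrics amounts to asking which one leaves the smallest residual correction to be explained by other factors.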
Watson et al. [36] used an age-stratified SEIR model structure to study the dynamics of SARS-CoV-2 in Damascus, Syria. The time-varying reproduction number $R_t$ is modulated through the function $f(x) = 2\exp(x)/(1 + \exp(x))$ to capture the impact of mobility on transmission, where $M(t)$ is the inferred mobility throughout the epidemic and $\rho_i$ reflects changes in transmission that are independent of mobility. Because Google mobility data is not available for Syria, the authors estimate mobility using a boosted regression tree model based on an alternative data source. To validate the mobility inferred by this tree model, the authors compared it with the Google mobility data of Turkey, Iraq, Jordan, Lebanon, and Israel.
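The link function $f(x) = 2\exp(x)/(1 + \exp(x))$ is twice the logistic function, so it equals 1 at $x = 0$ and stays between 0 and 2. The short Python sketch below evaluates it in a numerically safe way; the function name is hypothetical, and how the argument $x$ is built from $M(t)$ and $\rho_i$ is not reproduced here.

```python
import math


def mobility_link(x):
    """f(x) = 2 * exp(x) / (1 + exp(x)), evaluated without overflow."""
    if x >= 0:
        # Equivalent form that avoids computing exp of a large positive number.
        return 2.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return 2.0 * ex / (1.0 + ex)


values = [mobility_link(x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
```

Because $f(0) = 1$, a zero argument leaves the baseline reproduction number unchanged, while negative arguments scale it down and positive arguments scale it up, at most doubling it.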
Gozzi et al. [14] introduced a metapopulation model with an age structure, employing a stochastic mechanistic epidemic model that considers mobility, physical contacts, and census data. In this study, the population was subdivided into N comunas and categorized into K age groups. Within each subpopulation, the authors employed an SLIR compartmental model to simulate the dynamics of the epidemic. They determined reductions in mobility and interpersonal contacts from mobile device data and used this information as an input for the model. Interestingly, while the primary model proposed by the authors did not incorporate Google mobility data, such data was integrated into an alternative compartmental model presented in the supplementary materials. This integration allowed for a comparative assessment of the two models. In the model that incorporated Google mobility data, the authors treated the entire metropolitan area as a single, age-structured population. The contact matrix in this model is a linear combination of four components, representing interactions at school, in the workplace, at home, and in other locations: $C(t) = \sum_{i} \omega_i(t) C_i$, where $\omega_i(t)$ is the location-specific, time-varying contact reduction coefficient for location $i$. Google mobility data was used to characterize contact variations at home, at the workplace, and in other locations. The model with a simplified structure that incorporated Google mobility data performed worse than the primary model.
Chang et al. [7] introduced a metapopulation SEIR model in which the subpopulations are smaller geographic units of the ten largest metropolitan areas in the USA. The subpopulations in these units can interact when visiting a point of interest (POI), such as a bar, hotel, or gym. The system is modeled as a bipartite network with time-varying edges, in which the two types of nodes are units and POIs. The weight of an edge between a unit and a POI, $W(t) = W_{ij}$, is estimated from SafeGraph data. The researchers used the high Pearson correlation between the SafeGraph and Google mobility datasets to demonstrate the reliability of the SafeGraph data, since its mobility changes are consistent with Google's over the observed period. While Google mobility data is not directly incorporated into the network, it serves as a validation tool for assessing the reliability of the SafeGraph data.
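A validation of this kind reduces to computing the Pearson correlation between two mobility time series over the same dates. The Python sketch below shows the calculation on invented numbers; the series are hypothetical and are not taken from SafeGraph or Google.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly percent changes from baseline for two data sources.
source_a = [-10.0, -30.0, -50.0, -40.0, -20.0]
source_b = [-12.0, -33.0, -47.0, -41.0, -18.0]
agreement = pearson(source_a, source_b)
```

A coefficient close to 1 indicates that the two sources rise and fall together, although it says nothing about whether their absolute levels agree.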

Conclusions
Before the onset of COVID-19, research on nonpharmaceutical interventions (NPIs) relied primarily on theoretical frameworks, hampered by a lack of empirical data describing behavioral changes. With the advent of the COVID-19 pandemic, however, an unprecedented wealth of high-resolution datasets capturing various facets of NPIs and human mobility has been amassed and shared. A substantial majority of models now integrate these datasets as inputs. Consequently, there has been a shift from theoretical approaches in the pre-COVID-19 era to data-driven modeling in the post-COVID-19 landscape. Google mobility data, in particular, contributes distinctive and valuable insights into mobility changes and the implementation of interventions, whether integrated into mathematical models or employed as a validation tool. The articles we reference incorporate mobility into models of COVID-19 dynamics, often in simplified form through contact matrices, contact rates, effective reproduction functions, and the rescaling of key parameters based on mobility data.
However, several factors deserve attention when evaluating the utility of Google mobility data. While it captures a significant portion of the $R_t$ trend, it falls short of fully elucidating the dynamics of transmission over time, leaving room for the influence of other behavioral shifts during the pandemic. The attempt by Unwin et al. [34] to capture these dynamics using a second-order, weekly, autoregressive process underscores the complexity of attributing variations solely to transmission determinants or interventions. Moreover, the focus of most studies on single waves raises questions about the applicability of their findings to longer time series, indicating a need for more robust modeling approaches. Additionally, due to the aggregate nature of Google datasets, accounting for individual-level correlations in mobility patterns remains a challenge. Google's consumer Location History feature is also limited to smartphone users, is turned off by default, and is subject to differential privacy algorithms designed to safeguard user privacy by obscuring fine details. Furthermore, mobility estimates may be biased by the specific populations included in Google mobility data, potentially leading a model to overestimate the spread and resurgence of COVID-19. Consequently, it becomes imperative to broaden the scope by incorporating multiple datasets to capture population-level patterns beyond the confines of any single service or system.

Figure 1
Summary of mathematical models applied in the selected articles
The step function is fitted to the "Workplaces" field of the Google mobility data so as to obtain the values of $\varepsilon_W$, $k_1$, $k_1$, thus revealing the workplace function $C^{W}_{ij}(t)$ and the school function $C^{S}_{ij}(t)$. Matrices $C^{o}_{ij}$ and $C^{H}_{ij}$ are merged under the assumption that NPIs at home have the same efficacy as in other locations.

Table 1
Summary of the epidemic models: the first column describes the main categories; the second column shows the subcategories; the last column presents the countries studied in each reference.
Category | Subcategory | Countries studied in each reference
Compartmental models | SIR-type | US [34]
Compartmental models | Age structured SIR-type | Illinois, USA [38]
Compartmental models | SEIR-type | US [8]

Table 2
The use and selection of mobility data for an age unstructured SEIR-type model

Table 3
The use and selection of mobility data for age-structured SEIR-type models
The expression involves the reproductive number at time $t = t_0$ and $\rho(M)$, the spectral radius of a matrix $M$. More specifically, the contact matrix $C(t)$ is modeled as a time-varying linear combination of contact matrices.

Table 4
Overview of investigated durations of COVID-19 in studies utilizing Google mobility data in the model
The NUTS3 (Nomenclature of Territorial Units for Statistics) dataset and Call Data Records from Vodafone in Spain and Italy are extrapolated across Europe with a linear model to generate the continental baseline mobility probability. The Google COVID-19 data was then aggregated to represent the reduction in mobility in each NUTS3 area during NPIs, and a metapopulation model at the NUTS3 resolution was built. The authors emphasize the importance of incorporating multiple datasets to better capture population-level mobility patterns.