Investigating mobility-based fast food outlet visits as indicators of dietary intake and diet-related disease


Main Outcomes and Measures: Main outcomes were self-reported fast food (FF) intake frequency (never, infrequent, moderate, frequent), obesity, and diabetes from the Los Angeles County Health Survey (LACHS). FF outlet visits were computed as the temporal frequency of FF visits (FF visits/time) and the ratio of visits to FF outlets to visits to all food outlets (FF visits/food), averaged over smartphone users in each neighborhood, rescaled from 0 to 10, and linked to LACHS respondents by census tract.

Conclusions and Relevance:
This study illustrates that population-scale mobility data provide useful, passively collected indicators of FF intake and diet-related disease within large, diverse urban populations, and that these indicators may predict diet-related disease better than self-reported intake.

Introduction
Food environments, where people acquire and consume food, affect diet and related diseases (i.e., nutritional health) 1. To date, research has focused on predefined, local, and static food environments, largely those of the home neighborhood 2,3. Their features (e.g., the availability of fast food outlets) can predict nutritional health 1, although findings are mixed [4][5][6]. A growing proportion of food acquisition occurs miles from people's homes 7; the narrow focus on static food environments may therefore be one cause of these mixed results.
A major gap in the literature is evidence on the dynamic food environments people are exposed to in their daily routines (i.e., their "activity space" 8), the food outlets they visit, and how these mobile food environments affect dietary intake and health. With the availability of big data on human mobility (i.e., geolocations captured by people's smartphones), population-level research on the food outlets that people can access and visit in the course of their daily movements is now possible. Some studies (often with n<100) have begun to use GPS tracking technologies to continuously observe how people navigate their environment to acquire food over relatively brief periods (e.g., 1 week) 9,10. However, to our knowledge, large-scale mobility data have not been used to study food environments and their connection with nutritional health.
This study undertakes a critical first step in this line of research: investigating whether visits to food outlets observed in population-level mobility data provide meaningful indicators of dietary intake and diet-related disease. We focus these analyses on fast food (FF) outlets specifically because FF intake is linked to disease risk 11 and makes up 16% of Americans' caloric intake 7, and because FF outlets are well distributed across food environments.
We use a large mobility dataset from Los Angeles County (LAC), U.S.A., to generate neighborhood-level measures of visits to FF outlets. The first objective was to determine whether visits to FF outlets from population mobility data are a meaningful indicator of individuals' self-reported FF intake. The second objective was to determine whether visits to FF outlets (mobility data) are a meaningful predictor of individuals' obesity and diabetes, and a comparable or better predictor than self-reported FF intake.

Individual Health and Demographic Data Source and Measures
Individual-level measures of FF intake and diet-related disease come from the 2011 Los Angeles County Health Survey (LACHS). For this study, we excluded participants who were missing residential census tract information, lived in a rural census tract 13, or had missing data for all outcome variables. The final analytic sample was 5,447 participants residing in 1,941 census tracts.
All variables analyzed in this study were self-reported, and some were recoded from the original measures for ease of interpretation (eMethods in the Supplement). FF intake frequency was coded as a four-category variable: never, infrequent (< once per month), moderate (≥ once per month to < once per week), and frequent (≥ once per week). Obesity (body mass index [BMI] ≥30) and diabetes (having a diagnosis) were coded as binary variables (yes/no).
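The recoding rules above can be expressed as two small functions. This is a minimal sketch that assumes the raw FF intake response is available as a number of occasions per year (a hypothetical input format; the actual LACHS item and response options are described in the eMethods), so the monthly and weekly cut points become 12 and 52 occasions per year.

```python
"""Sketch of the outcome recoding (hypothetical input units: times per year)."""


def recode_ff_intake(times_per_year: float) -> str:
    """Map an FF intake frequency, ASSUMED to be reported as times per year,
    onto the study's four categories."""
    if times_per_year <= 0:
        return "never"
    if times_per_year < 12:    # less than once per month
        return "infrequent"
    if times_per_year < 52:    # at least once per month, less than once per week
        return "moderate"
    return "frequent"          # at least once per week


def is_obese(bmi: float) -> bool:
    """Obesity is defined as a body mass index (BMI) of 30 or higher."""
    return bmi >= 30


if __name__ == "__main__":
    for freq in (0, 6, 24, 104):
        print(freq, recode_ff_intake(freq))
    print(is_obese(29.9), is_obese(30.0))
```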
Sociodemographic factors included age group, gender, race/ethnicity, educational level, and household income level.
Respondents' census tract of residence was derived from their home address.

Geolocation (Mobility) Data Source and Measures
Geolocation (i.e., mobility) data were collected by Cuebiq 14, a location-based services company that maintains anonymized geospatial datasets on human mobility by aggregating data across smartphone applications on mobile devices. The dataset consists of anonymized records of GPS locations from individual adult (≥18 years) smartphone users who opted in to share their GPS location data anonymously through a framework compliant with the General Data Protection Regulation and the California Consumer Privacy Act. Users of all major smartphone operating systems (e.g., iOS, Android, Windows) are represented. The dataset includes 243,644 users with estimated residential census tracts in LAC (estimation explained below) between October 2016 and March 2017 (6 months), representing 3.1% of the LAC adult population 15.
The data consist of geolocation "pings" identifying the location of a given smartphone, typically recorded every 5-15 minutes (eFigure 1 in the Supplement). Each ping contains the GPS location of the phone (latitude and longitude), a timestamp, and an anonymous (encrypted and hashed) identifier that is unique to each smartphone user. No other individual information (e.g., demographic characteristics) was available on users. From each user's trajectory of pings, we used a detection algorithm 16 to filter out transient locations and extract meaningful "stays" (or stops) of at least 5 minutes at particular locations. We excluded users with fewer than two stays (at any location) over the 6 months, resulting in an analytic sample of 234,995 users with over 63 million observed stays (eTable 1 in the Supplement).
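The cited detection algorithm (reference 16) is not reproduced here. As an illustration only, the sketch below uses a generic distance-and-duration stay-point heuristic: a run of consecutive pings that all stay within a radius of the run's first ping, and that spans at least 5 minutes, becomes one stay. The 100 m radius and the `Ping`/`Stay` record types are hypothetical; only the 5-minute minimum and the two-stay inclusion rule come from the text.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0


@dataclass
class Ping:
    t: float     # Unix timestamp, seconds
    lat: float
    lon: float


@dataclass
class Stay:
    start: float
    end: float
    lat: float   # centroid of the pings in the stay
    lon: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in metres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def detect_stays(pings, max_radius_m=100.0, min_duration_s=5 * 60):
    """Generic stay-point detection over one user's pings (HYPOTHETICAL radius)."""
    pings = sorted(pings, key=lambda p: p.t)
    stays, i, n = [], 0, len(pings)
    while i < n:
        j = i + 1
        # Grow the run while pings remain within the radius of the anchor ping.
        while j < n and haversine_m(pings[i].lat, pings[i].lon,
                                    pings[j].lat, pings[j].lon) <= max_radius_m:
            j += 1
        run = pings[i:j]
        if run[-1].t - run[0].t >= min_duration_s:
            stays.append(Stay(run[0].t, run[-1].t,
                              sum(p.lat for p in run) / len(run),
                              sum(p.lon for p in run) / len(run)))
            i = j        # continue after the detected stay
        else:
            i += 1       # transient ping: move the anchor forward
    return stays


def keep_user(stays) -> bool:
    """Inclusion rule from the text: at least two stays over the study window."""
    return len(stays) >= 2
```

Anchoring each run on its first ping keeps the rule simple; centroid-based variants that recompute the center as pings are added are a common alternative.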
Visits to food and FF outlets were identified by linking geolocated stays with a points of interest (POI) database obtained from the public Foursquare API 17 in 2017, which provides the names and geolocations of 239,509 POI in LAC. Food outlets were defined as any location where food might be sold (including restaurants, food retailers, and other locations), and FF outlets were defined as limited-service restaurants serving menus of predominantly ultra-processed and/or low-nutrient, energy-dense foods (e.g., McDonald's, Taco Bell, Pizza Hut). We identified food and FF outlets using a combination of Foursquare's existing category taxonomy and a bottom-up search for known chain FF outlet names validated in previous research (eMethods and eTables 2-3 in the Supplement) 18,19. After recoding, there were 53,588 food outlets and 4,151 FF outlets. A total of 14,498,850 visits to food outlets were detected across the analytic sample.
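The text does not specify how a stay is attributed to a POI (see the eMethods). The sketch below assumes a simple rule: attribute each stay to the nearest POI within a hypothetical 50 m radius, then label it as an FF outlet if the POI's name starts with a known chain name, as another food outlet if the POI is flagged as food, and otherwise as neither. The `is_food` flag and the three-item chain list are hypothetical stand-ins for the Foursquare categories and the validated chain list in eTables 2-3.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional

EARTH_RADIUS_M = 6_371_000.0

# HYPOTHETICAL, abbreviated chain list; the study's validated list is in eTables 2-3.
FF_CHAIN_PREFIXES = ("mcdonald's", "taco bell", "pizza hut")


@dataclass
class POI:
    name: str
    lat: float
    lon: float
    is_food: bool  # ASSUMED flag derived from the POI category taxonomy


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in metres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def nearest_poi(lat: float, lon: float, pois: Iterable[POI],
                max_dist_m: float = 50.0) -> Optional[POI]:
    """Closest POI within max_dist_m of a stay, or None.

    Brute-force search for clarity; a spatial index is needed at city scale.
    """
    best, best_d = None, max_dist_m
    for poi in pois:
        d = haversine_m(lat, lon, poi.lat, poi.lon)
        if d <= best_d:
            best, best_d = poi, d
    return best


def classify_visit(poi: Optional[POI]) -> Optional[str]:
    """'ff' for an FF outlet (a subset of food outlets), 'food' for any other
    food outlet, None for stays not attributed to a food outlet."""
    if poi is None or not poi.is_food:
        return None
    name = poi.name.strip().lower()
    return "ff" if name.startswith(FF_CHAIN_PREFIXES) else "food"
```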
We estimated each user's home residential census tract as the tract in which the majority of their activity between 10 pm and 6 am occurred. Using the preprocessed mobility data, FF outlet visits were defined first at the level of an individual user and then averaged across users living within each of 247 LAC neighborhoods 20 to represent the "typical" FF visit behavior of residents in that spatial area. The neighborhood was the smallest administrative unit at which we could demonstrate that our sample of mobility users broadly represented the underlying population geographically (eMethods and eFigures 4-6 in the Supplement).
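A minimal sketch of the nighttime home-tract rule, assuming each observation has already been tagged with a census tract ID (the point-in-polygon lookup is not shown) and carries a local timestamp. By default it returns the most frequent (plurality) nighttime tract; the `require_majority` option, a hypothetical parameter, enforces a strict majority instead, since the text's "majority" could be read either way.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional, Tuple


def is_night(ts: datetime) -> bool:
    """True from 10 pm (inclusive) to 6 am (exclusive), local time."""
    return ts.hour >= 22 or ts.hour < 6


def estimate_home_tract(observations: Iterable[Tuple[datetime, str]],
                        require_majority: bool = False) -> Optional[str]:
    """Census tract with the most nighttime observations for one user.

    observations: (local timestamp, tract id) pairs; tract ids are ASSUMED
    to be attached already. With require_majority=True, the top tract must
    account for more than half of nighttime observations, else None.
    """
    counts = Counter(tract for ts, tract in observations if is_night(ts))
    if not counts:
        return None
    tract, n = counts.most_common(1)[0]
    if require_majority and 2 * n <= sum(counts.values()):
        return None
    return tract
```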
The first FF outlet visit variable, the temporal frequency of FF outlet visits (FF visits/time), was defined as the percentage of a user's observed periods in which the user visited at least one FF outlet, where each day was divided into three periods (before 11 am, 11 am-4 pm, and after 4 pm). The second variable, the relative frequency of FF visits among all food outlet visits (FF visits/food), was defined as the number of a user's visits to FF outlets as a percentage of that user's visits to any food outlet. We also defined a covariate representing general mobility behavior: the average number of trips, measured as stays (defined above), per user per day (trips/day). These three variables were averaged over all users with an estimated home residence within a neighborhood, rescaled from 0 to 10 to enable comparison of effect sizes in regression analyses, and linked as contextual variables to individual LACHS respondents based on their home census tract. A geographic visualization of the unscaled mobility variables is provided in the Figure.
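The two FF measures and their neighborhood aggregation can be sketched as follows. Assumptions not stated in the text: a (date, period) slot counts as "observed" if the user has any stay in it; stays arrive already labeled "ff", "food", or None; and the 0-10 rescaling is a min-max transformation across neighborhoods. Trips/day would be aggregated in the same way.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Stay:
    ts: datetime            # local start time of the stay
    outlet: Optional[str]   # "ff", "food" (non-FF food outlet), or None


def day_period(ts: datetime) -> str:
    """One of three daily periods: before 11 am, 11 am-4 pm, after 4 pm."""
    if ts.hour < 11:
        return "morning"
    if ts.hour < 16:
        return "midday"
    return "evening"


def ff_visits_per_time(stays: List[Stay]) -> float:
    """Percentage of observed (date, period) slots with at least one FF visit.

    ASSUMPTION: a slot is observed if the user has any stay in it.
    """
    observed, with_ff = set(), set()
    for s in stays:
        slot = (s.ts.date(), day_period(s.ts))
        observed.add(slot)
        if s.outlet == "ff":
            with_ff.add(slot)
    return 100.0 * len(with_ff) / len(observed) if observed else float("nan")


def ff_visits_per_food(stays: List[Stay]) -> float:
    """FF visits as a percentage of all food outlet visits (FF included)."""
    food = [s for s in stays if s.outlet in ("ff", "food")]
    if not food:
        return float("nan")
    return 100.0 * sum(s.outlet == "ff" for s in food) / len(food)


def neighborhood_means(user_values: Dict[str, float],
                       user_home: Dict[str, str]) -> Dict[str, float]:
    """Average a per-user measure over users sharing a home neighborhood."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for user, value in user_values.items():
        if math.isnan(value):   # skip users with no observed slots/food visits
            continue
        hood = user_home[user]
        sums[hood] += value
        counts[hood] += 1
    return {h: sums[h] / counts[h] for h in sums}


def rescale_0_10(values: Dict[str, float]) -> Dict[str, float]:
    """Min-max rescaling onto 0-10 (ASSUMED form of the study's rescaling)."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}
    return {k: 10.0 * (v - lo) / (hi - lo) for k, v in values.items()}
```

Because both measures are percentages of a user's own observations, users with more or fewer recorded pings contribute on a comparable scale before averaging.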
We used post-stratification techniques, comparison with census data, and sensitivity tests on the mobility data to establish (i) its measurement accuracy, (ii) the validity of our approach to attributing stays to POI, and (iii) its population representativeness at the neighborhood level (eMethods, eTables 4-5, and eFigures 3-6 in the Supplement).
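The exact post-stratification design is described in the Supplement, not here. As a generic illustration, the sketch below assumes users within a neighborhood are stratified by home census tract and reweighted so that their weighted tract distribution matches census adult counts; `tract_population` is a hypothetical input. Each user's weight is the tract's population share divided by its sample share.

```python
from collections import Counter
from typing import Dict


def poststrat_weights(user_tract: Dict[str, str],
                      tract_population: Dict[str, float]) -> Dict[str, float]:
    """Weights that make the users' tract distribution match the census
    distribution within one neighborhood (ASSUMED stratification by tract).

    Only tracts containing at least one user can be represented, so
    population shares are computed over those tracts.
    """
    n_users = Counter(user_tract.values())
    total_users = sum(n_users.values())
    total_pop = sum(tract_population[t] for t in n_users)
    weights = {}
    for user, tract in user_tract.items():
        pop_share = tract_population[tract] / total_pop
        sample_share = n_users[tract] / total_users
        weights[user] = pop_share / sample_share   # > 1 for under-sampled tracts
    return weights


def weighted_mean(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Post-stratified neighborhood mean of a per-user measure."""
    den = sum(weights[u] for u in values)
    return sum(values[u] * weights[u] for u in values) / den


if __name__ == "__main__":
    users = {"a": "T1", "b": "T1", "c": "T1", "d": "T2"}
    w = poststrat_weights(users, {"T1": 100, "T2": 300})
    print(w)
    print(weighted_mean({"a": 10.0, "b": 10.0, "c": 10.0, "d": 20.0}, w))
```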
All study protocols were approved by the Institutional Review Boards (IRBs) of the Los Angeles County Department of Public Health (LACDPH), the University of Southern California, and the Massachusetts Institute of Technology. Where applicable, this study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.

Statistical Analysis
We fit logistic regression models at the LACHS respondent level, with the linked mobility variables as predictors, to test the study objectives. For the first objective, unadjusted (univariable) and adjusted (multivariable) multinomial logistic regression models were used to estimate odds ratios (ORs) for the association between each of the two FF outlet visit variables (independent variable, IV) and self-reported FF intake frequency (dependent variable, DV). The multivariable models adjusted for sociodemographic factors. Differences between adjusted and unadjusted ORs were tested for statistical significance using a χ2 test.
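The study's models were fit in R; the NumPy sketch below only illustrates the model class. It fits a reference-category multinomial logit by Newton-Raphson, so that exponentiated slope coefficients are ORs per unit of a 0-10 scaled exposure for each intake category versus "never". The data are simulated and hypothetical; an adjusted model would add sociodemographic indicator columns to `X`.

```python
import numpy as np


def fit_multinomial_logit(X, y, n_classes, n_iter=25):
    """Reference-category multinomial logit fit by Newton-Raphson.

    X: (n, p) design matrix that already includes an intercept column.
    y: (n,) integer codes 0..n_classes-1; code 0 is the reference category.
    Returns B of shape (p, n_classes - 1); np.exp(B) gives odds ratios.
    """
    n, p = X.shape
    k = n_classes - 1
    Y = np.eye(n_classes)[y][:, 1:]               # indicators, non-reference categories
    B = np.zeros((p, k))
    for _ in range(n_iter):
        eta = np.hstack([np.zeros((n, 1)), X @ B])  # reference logit fixed at 0
        eta -= eta.max(axis=1, keepdims=True)       # guard against overflow
        P = np.exp(eta)
        P /= P.sum(axis=1, keepdims=True)
        P = P[:, 1:]
        grad = (X.T @ (Y - P)).T.reshape(-1)         # score, category by category
        info = np.empty((k * p, k * p))              # Fisher information
        for a in range(k):
            for b in range(k):
                w = P[:, a] * (float(a == b) - P[:, b])
                info[a * p:(a + 1) * p, b * p:(b + 1) * p] = (X * w[:, None]).T @ X
        B += np.linalg.solve(info, grad).reshape(k, p).T
    return B


# Hypothetical simulated data (not LACHS): a 0-10 scaled exposure and a
# four-category outcome (0 = never, 1 = infrequent, 2 = moderate, 3 = frequent).
rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])
true_B = np.array([[-0.5, 0.0, 0.3],     # intercepts, categories 1-3 vs 0
                   [0.10, 0.15, 0.20]])  # log-odds change per scaled unit
eta = np.hstack([np.zeros((n, 1)), X @ true_B])
probs = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)
u = rng.random(n)
y = np.minimum((u[:, None] > probs.cumsum(axis=1)).sum(axis=1), 3)

B_hat = fit_multinomial_logit(X, y, n_classes=4)
print("ORs per scaled unit (categories 1-3 vs never):", np.round(np.exp(B_hat[1]), 3))
```

With `n_classes=2` the same function reduces to ordinary binary logistic regression, the model type used for the obesity and diabetes outcomes.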
For the second objective, multivariable logistic regression models tested whether (i) FF visits/time (IV), (ii) FF visits/food (IV), and (iii) self-reported FF intake (IV) were associated with obesity (DV), and in separate models, with diabetes (DV).
All models adjusted for sociodemographic factors. Because ORs for a categorical variable (FF intake frequency) and for continuous variables (FF visits/time and FF visits/food) cannot be directly compared, we compared model fits using Akaike weights, which transform raw Akaike information criterion (AIC) values to facilitate interpretation of AIC-based model comparisons 21. A model's Akaike weight is interpreted as the probability that it is the best model among a set of candidate models.
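Akaike weights follow directly from the AIC values: with Δi = AICi − min AIC, each model's weight is exp(−Δi/2) divided by the sum of that quantity over all candidate models. The sketch below implements this standard transformation and its inverse (the AIC gap implied by a ratio of two weights). The AIC values in the example are hypothetical.

```python
import math
from typing import Dict


def akaike_weights(aics: Dict[str, float]) -> Dict[str, float]:
    """w_i = exp(-d_i / 2) / sum_j exp(-d_j / 2), where d_i = AIC_i - min AIC."""
    best = min(aics.values())
    rel = {name: math.exp(-(aic - best) / 2.0) for name, aic in aics.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


def aic_gap_from_weights(w_best: float, w_other: float) -> float:
    """AIC difference implied by two Akaike weights from the same candidate set."""
    return 2.0 * math.log(w_best / w_other)


if __name__ == "__main__":
    # Hypothetical AIC values for three candidate models (not the study's results).
    example = {"model_a": 100.0, "model_b": 102.0, "model_c": 130.0}
    for name, w in akaike_weights(example).items():
        print(f"{name}: {w:.3g}")
```

Applied to the rounded obesity weights reported in the Results (0.90 for FF visits/food vs 3.6e-7 for self-reported FF intake), the inverse gives an AIC gap of roughly 29.5 units; because the reported weights are rounded, this is approximate.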
The LACHS and mobility datasets are from different years because FF intake was not assessed by the LACHS after 2011, while geolocation-based mobility data were not available before 2016. We conducted sensitivity analyses to examine whether regression results were affected by this time gap and by controlling for users' general mobility behavior (trips/day). Mobility data were analyzed in Python. LACHS data were managed and statistical analyses were conducted using R software, version 3.6.3. A 2-sided P < .05 was considered statistically significant. Additional data and analysis details are available in eMethods in the Supplement.

Descriptive Results
When comparing the full (n=8,036) and analytic (n=5,447) samples of LACHS respondents, we found small (1-3%) but statistically significant differences in age group, gender, race/ethnicity, household income level, and self-reported FF intake frequency (eTable 6 in the Supplement). The analytic sample had representation across all age groups, race/ethnicity groups, and education levels, and had a higher proportion of female than male respondents (Table 1). Of the analytic sample, 17.3% reported never eating FF, 19.1% reported infrequent intake, 26.9% reported moderate intake, and 36.7% reported frequent intake; 24.8% had obesity; and 11.1% had diabetes.
Across the mobility variables linked to LACHS respondents, the median percentage of observed daily periods in which users visited FF outlets (FF visits/time, unscaled) was 4.3% (range, 1.0-13.0%); the median percentage of food outlet visits that were to FF outlets (FF visits/food, unscaled) was 10.6% (range, 2.8-22.4%); and the median number of trips per day (trips/day, unscaled) was 4.0 (range, 2.2-8.0). Histograms of these distributions are shown in eFigure 7 in the Supplement.
In the adjusted models, both measures of FF outlet visits remained strong and significant predictors of FF intake (Table 2), and the estimates were essentially unchanged from the unadjusted models; the CIs of the adjusted and unadjusted ORs overlapped, and the differences between them were not statistically significant (eTable 7 in the Supplement).
Comparing Akaike weights across the three models of obesity, the probability of being the best-fitting model was 0.10 for the model including FF visits/time, 0.90 for the model including FF visits/food, and 3.6e-7 for the model including self-reported FF intake frequency (Table 4).
Comparing Akaike weights across the three models of diabetes, the corresponding probabilities were 0.69 for the model including FF visits/time, 0.31 for the model including FF visits/food, and 2.3e-5 for the model including self-reported FF intake frequency.
Regression results were not significantly impacted by (i) controlling for general mobility behavior (Trips/day) (eTable 8 in the Supplement), and (ii) the time gap between data collection for the LACHS (2011) and mobility data (2016-17) (eMethods, eTables 9-14, and eFigure 8 in the Supplement).

Discussion
Using large-scale mobility data from LAC, this study finds strong and consistent evidence that visits to FF outlets, aggregated at the neighborhood level, correspond to individuals' consumption of FF. Thus, passively observed visits to FF outlets appear to be a good indicator of FF intake in a diverse, urban population. FF behaviors observed in the mobility data also predicted diet-related disease. Moreover, models of obesity or diabetes that included FF outlet visits fit substantially better than models that included self-reported FF intake, the standard measure in population nutrition research. Findings held after controlling for individual sociodemographic factors and general mobility behavior, suggesting that these indicators capture visits to FF outlets specifically rather than mobility alone.
Several factors may explain the strength of this result despite aggregation at the neighborhood level. First, measures of food behaviors observed directly from smartphone-captured mobility data may be less prone to measurement error than self-reported food intake. This may be because mobility data capture behavior continuously over months (or potentially longer), recording more detail about behavioral patterns than can be reliably assessed by self-reported recall 22,23. Second, the neighborhood-aggregated measures of visits to FF outlets may capture the social behaviors of a larger group. Because eating is strongly influenced by social and cultural factors 24, the 'social signal' 25 captured by the aggregated measure may be additionally predictive of disease risk. Third, the indicators may be picking up other neighborhood-level risk factors for these diseases.
This study advances the research methods used to understand how food environments relate to health. It establishes that features extracted from large-scale mobility data provide a strong signal of FF consumption, suggesting that this data source may serve as a valid population surveillance tool for this and possibly other eating and health behaviors. Mobility data are objective and are captured passively and continuously over long periods, making them a convenient and information-rich means of gathering population-level data on food behaviors, which are notoriously hard to measure.
More broadly, this study introduces human mobility data as an untapped resource for future investigations into links between food environment use and nutritional health across large, diverse populations.Studies involving mobility data might include: re-defining notions of "food deserts" and "food swamps" to account for lived environments beyond the home neighborhood; investigating routine behaviors that determine spatio-temporal accessibility to different types of food environments; using "natural experiments" (e.g., users who change home or food environments) to identify causal mechanisms linking features of food environments and eating behaviors; and developing more effective policies and interventions on food environments that take routine behaviors beyond the home neighborhood into consideration.

Limitations
The results of this study are prone to ecological fallacy, since measures of FF outlet visits averaged across individuals within a neighborhood are assumed to apply to all individuals within that neighborhood. Summarizing mobility features at the neighborhood (or any other spatial) level obscures within-group differences, for example, differential visits to FF outlets by gender or other demographics 26. Additionally, the mobility measures are sample means over a convenience sample of mobility users living in a neighborhood, which may not reflect the true mean within that neighborhood.
Smartphone users, although constituting 83% of the U.S. adult population in 2017 27, are a subset of the population with uneven representation across sociodemographic groups (e.g., low-income 27, older, and non-White 28 groups), leading to potential biases. Quantifying these biases in mobility data is challenging because demographic information is not available on individual smartphone users; however, we used post-stratification on the mobility data to achieve population representativeness at the neighborhood level, which may alleviate some of these biases (eMethods in the Supplement). Additionally, our previous work on this dataset, which imputed income for each user, demonstrated low bias across income classes 29. Different study designs will be necessary to investigate whether inferences derived from smartphone users are fully generalizable to these populations.
We linked two different study populations and timeframes, which introduces potential incompatibilities. Food environments may have changed between 2011 and 2016-17 in ways that could affect population visits to food outlets. However, we found very little change in population demographics in the LAC study areas between 2011 and 2017, and regression results were robust to removing the census tracts with the greatest change (eMethods and eTables 9-14 in the Supplement).
Limitations of the mobility data include under-sampled visits to FF outlets due to gaps in measurement for each smartphone user (e.g., when phones are out of service). We addressed this by defining percentage-based variables in which observed FF outlet visits are expressed relative to other observations (e.g., all food outlet visits), but there may be measurement error that cannot be quantified. Separately, the FF intake measures may be subject to biases common to self-report food frequency measures.
While we have taken several approaches to validate the measurement accuracy of the mobility data and the robustness of findings to our methods for attributing geolocations to places, our ability to detect visits to certain food outlets may be limited (eMethods in the Supplement).

Conclusions
Population-scale mobility data provide rich information about population use of fast food (FF) outlets. While there are sources of bias in mobility data, this study demonstrates that they provide (i) useful indicators of FF intake within large and diverse urban populations and (ii) meaningful predictors of diet-related disease (i.e., obesity and diabetes), and may have advantages over existing dietary assessment methods. Mobility data are likely to facilitate future research investigating how people of diverse backgrounds move around to dynamically use food environments, within and beyond the home neighborhood, and how this use is linked to their diet and health.

Table 4. Akaike Information Criterion (AIC) Values and Akaike Weights for Models of Obesity and Diabetes
Model a | Obesity: AIC | Obesity: Akaike weight b | Diabetes: AIC | Diabetes: Akaike weight b
Self-report FF intake frequency | 5777.9 | 3.6e-7 | 3374.4 | 2.3e-5
Abbreviations: AIC, Akaike information criterion; FF, fast food.
a All models adjusted for demographics: age group, gender, race/ethnicity, educational level, and household income level. Each model included the variable listed in this column as the primary independent variable alongside demographic covariates.
b Each model's Akaike weight can be interpreted as the probability that it is the best model out of the set of three candidate models.

LACHS is a population-based dual-frame (landline and cellular) telephone survey conducted by the Los Angeles County Department of Public Health (LACDPH). It collects data from representative samples of adults and children living within LAC on topics such as health conditions and behaviors, sociodemographics, and home residence. Our study uses data from the Adult Survey module, which includes 8,036 randomly selected LAC residents aged 18 years and older. Detailed study protocols are available from LACHS 12.

Figure 1. Geographic Distribution of the Unscaled Mobility Variables Across Los Angeles County Neighborhoods

Table 2. Odds Ratios for Unadjusted and Adjusted Multinomial Logistic Regression Analyses of the Association Between Visits to Fast Food Outlets Observed in Mobility Data and Self-Reported Fast Food Intake a
Model b | Unadjusted analysis: FF intake frequency, OR (95% CI) | Adjusted analysis c: FF intake frequency, OR (95% CI)
Abbreviations: OR, odds ratio; CI, confidence interval; FF, fast food.
a Multinomial logistic regression models of fast food intake frequency across four frequency categories. Reference group: no fast food intake. P<.001 for each estimated odds ratio.
b Each model estimated fast food intake frequency using the fast food visit variable listed in this column as the primary independent variable.
c Adjusted for age group, gender, race/ethnicity, educational level, and household income level.

Table 3 . Odds Ratios for Multivariable Binary Logistic Regression Analyses of the Association Between Visits to Fast Food Outlets and Diet-Related Disease a
Abbreviations: AOR, adjusted odds ratio; CI, confidence interval; FF, fast food.
a All models adjusted for demographics: age group, gender, race/ethnicity, educational level, and household income level.
b Each model included the variable listed in this column as the primary independent variable. Adjusted odds ratios for the continuous variables (FF visits/time, FF visits/food) and the categorical variable (self-report FF intake frequency) are not directly comparable.