Use of smartphone mobility data to analyze city park visits during the COVID-19 pandemic

Introduction: The COVID-19 pandemic focused attention on city parks as important public resources. However, monitoring park use over time poses practical challenges. Thus, pandemic-related trends are unknown.
Methods: We analyzed monthly mobility data from a large panel of smartphone devices to assess park visits from January 2018 to November 2020 in the 50 largest U.S. cities.
Results: In our sample of 11,890 city parks, visits declined by 36.0 % (95 % CI [27.3, 43.6], p < 0.001) from March through November 2020, compared to prior levels and trends. When we segmented the COVID-19 period into widespread closures (March–April) and reopenings (May–November), we estimated a small rebound in visits during reopenings. In park service areas where a greater proportion of residents were White and high-income, this rebound effect was larger.
Conclusions: Smartphone data can address an important gap for monitoring park visits. Park visits declined substantially in 2020 and disparities appeared to increase.


Introduction
The COVID-19 pandemic has highlighted the value of city parks. Parks have provided comparatively low-risk alternatives to indoor social gatherings and other recreational activities. However, it is unknown whether city park use increased during the pandemic. Increases in park use associated with changing exercise and leisure patterns might have been offset by drops associated with restrictions on organized sports, cancellation of concerts and other large gatherings, and school and playground closures. Moreover, differences in park access, including previously documented inequities between White residents and residents of color (Byrne, 2012; Das et al., 2017; Wen et al., 2013; Wolch et al., 2014), could have produced disparate patterns during the pandemic.
Existing evidence is mixed. Studies report that park use increased during initial COVID-19 lockdowns in large Asian cities (Lu et al., 2021), while it increased in some European cities and decreased in others (Ugolini et al., 2020; Venter et al., 2020). In the U.S., park use increased in New Jersey (Volenec et al., 2021) but declined in North Carolina cities, where visits declined most among non-White and low-income residents (Larson et al., 2021). Studies used both traditional surveys and digital sources (e.g., exercise apps, social media posts, smartphone locations), reflecting both the challenges of data collection during a pandemic and the opportunities created by digital sources.
To address evidence gaps, we used smartphone mobility data for a large-scale analysis of U.S. city park use from 2018 through 2020. Using an interrupted time series approach, we tested the following hypotheses: when controlling for ambient temperature and COVID-related mortality rates, (i) park use increased with pandemic onset, and (ii) changes during the pandemic differed by neighborhood racial and economic composition.

Methods
We used location data generated from a panel of smartphones equal to approximately 10 % of the U.S. population to estimate park visits between January 2018 and November 2020. The data are provided by SafeGraph, a company that aggregates data from users whose settings allow their location data to be collected anonymously by third parties. These data appear to be highly representative of the general population: SafeGraph sample count is highly correlated with population count at the micro-neighborhood level (What about bias in the SafeGraph dataset?, n.d.). SafeGraph data have been used to monitor mobility patterns during the COVID-19 pandemic, but with limited application to park use (Kupfer et al., 2021). These data are available at no cost for academic and public interest research.
Smartphone data could provide researchers and park managers an ability to monitor visits to parks by a large sample of the population; however, we also encountered two primary challenges in using the data for this purpose.
First, smartphone locations must be joined to parks data. SafeGraph provides data aggregated by points of interest ("POI"), each of which represents a spatial location (e.g., a coffee shop at a particular address) from which a device issues a location "ping." Until 2021, SafeGraph's POIs included relatively few parks, and SafeGraph's park boundaries did not always match actual boundaries (Hsieh et al., 2020).
In 2021, however, SafeGraph incorporated park boundary data derived from a U.S.-wide park database assembled by the Trust for Public Land (TPL). TPL builds the database of publicly accessible parks, trails, and open space from parks data submitted by local agencies and organizations (ParkServe® - About, Methodology, and FAQ | The Trust for Public Land, n.d.). When no data were provided, park data were created from available resources such as park information from municipal websites, spatial data available from counties and states, and satellite imagery.
After SafeGraph incorporated the TPL parks boundaries, we found a much higher correspondence between SafeGraph park POIs and the TPL parks database. However, the SafeGraph park POI dataset retained some non-park entries (e.g., "Forest Park Medical Center"). Moreover, we were not interested in every park in the TPL database, which includes playgrounds and green spaces at schools, which might display differing usage patterns from other parks. Therefore, we included only parks validated by both SafeGraph and TPL.
We filtered SafeGraph POIs by North American Industry Classification System (NAICS) code, to include only "Nature Parks and Other Similar Institutions." Next, we selected POIs that spatially aligned with parks in the TPL database. We considered a SafeGraph park to match a TPL park if the SafeGraph park centroid intersected a TPL polygon and the area of the SafeGraph polygon was between 0.5 and 1.5 times the TPL polygon's area. SafeGraph parks that matched TPL parks were included in the study.
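The matching rule described above can be illustrated with a minimal sketch. This is not the authors' implementation, which presumably relied on GIS software; it assumes simple polygons without holes, expressed in a projected coordinate system so that areas are comparable, and all function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # simple ring, vertices in order, no holes (assumption)


def signed_area(poly: Polygon) -> float:
    """Shoelace formula; positive for counter-clockwise rings."""
    total = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def centroid(poly: Polygon) -> Point:
    """Area-weighted centroid of a simple polygon."""
    area = signed_area(poly)
    cx = cy = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    return (cx / (6.0 * area), cy / (6.0 * area))


def contains(poly: Polygon, pt: Point) -> bool:
    """Ray-casting point-in-polygon test."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def is_match(sg_poly: Polygon, tpl_poly: Polygon,
             lo: float = 0.5, hi: float = 1.5) -> bool:
    """True if the SafeGraph centroid falls inside the TPL polygon and
    the SafeGraph area is between lo and hi times the TPL area."""
    ratio = abs(signed_area(sg_poly)) / abs(signed_area(tpl_poly))
    return contains(tpl_poly, centroid(sg_poly)) and lo <= ratio <= hi
```

Requiring both conditions guards against two failure modes: a small POI that merely sits inside a large park (area ratio too low) and a similarly sized polygon located elsewhere (centroid outside).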
We analyzed park visits in the 50 most populous U.S. cities, ranging in population from approximately 400,000 (Cleveland, OH) to over 8 million (New York City, NY). We arbitrarily chose a sample of 50 cities to demonstrate the scalability of this analytical approach, which substantially exceeded the geographical scope of any other park use study known to us.
We also obtained data on the characteristics of the population living within a 10-minute (or 0.5-mile) walk of each park, which is frequently cited as the service area of city parks (Sugiyama et al., 2010). TPL generated these estimates by creating a 10-minute walkable service area using a nationwide walkable road network dataset provided by Esri, then using these service areas to calculate race/ethnicity and household income statistics. We used these data to construct a binary variable that classified whether each park served an area with >50 % White population and >50 % high-income households (i.e., household income >125 % of the urban area's median household income). Drawing on ecosocial theory, public health researchers have found that neighborhood advantage, indicated by White race and high income, predicts health advantage better than either race or income alone (Krieger et al., 2016). We refer to this construct as racialized economic privilege, following Krieger and colleagues (Krieger et al., 2016).
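As a minimal sketch of the binary classification described above, the two thresholds can be written as follows. The function and parameter names are hypothetical; the inputs are assumed to be shares (0 to 1) already computed for each service area.

```python
def is_high_income(household_income: float, area_median_income: float,
                   ratio: float = 1.25) -> bool:
    """High income: household income above 125 % of the urban area's median."""
    return household_income > ratio * area_median_income


def is_privileged(share_white: float, share_high_income: float,
                  threshold: float = 0.5) -> bool:
    """Racialized economic privilege indicator for a park service area:
    both shares must strictly exceed 50 %."""
    return share_white > threshold and share_high_income > threshold
```

Because both conditions must hold, a service area that is majority White but not majority high-income (or the reverse) is classified as non-privileged.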
A second set of challenges comprised temporal dimensions of the SafeGraph data. The outcome of interest was the count of monthly visits to each park in our sample, from January 2018 through November 2020. However, the SafeGraph smartphone panel is not fixed, and visit counts tend to increase over time because SafeGraph's systems improve at recording visits. We adjusted raw visit totals to account for month-to-month changes in each city's panel size. To address outliers for regression model validity, we top-coded observations so they did not exceed the 99th percentile for monthly visits. In addition, as described below, we accounted for increasing visit counts using model parameters.
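The two adjustments above might look like the following sketch. The exact normalization is not specified in the text, so scaling visits to a reference panel size is an assumption, as is the nearest-rank percentile definition; all names are hypothetical.

```python
from typing import List


def adjust_for_panel(raw_visits: float, panel_size: float,
                     reference_panel_size: float) -> float:
    """Rescale raw visits as if the city's panel had the reference size
    (assumed normalization; the source does not give the formula)."""
    return raw_visits * reference_panel_size / panel_size


def percentile_nearest_rank(values: List[float], pct: int) -> float:
    """Nearest-rank percentile using integer arithmetic for the rank."""
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # ceil(pct * n / 100)
    return ordered[rank - 1]


def top_code(values: List[float], pct: int = 99) -> List[float]:
    """Cap every observation at the given percentile."""
    cap = percentile_nearest_rank(values, pct)
    return [min(v, cap) for v in values]
```

Top-coding keeps extreme park-months in the sample while limiting their leverage on the regression estimates.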
As control variables, we obtained ambient temperature data from the North American Land Data Assimilation System (NLDAS-2), using procedures from Adams et al. (2022). City-level temperatures were population-weighted and converted to within-city percentiles, to account for differing effects depending on typical ranges. At this stage we excluded Honolulu, HI because NLDAS-2 only covers the continental U.S. We obtained COVID-19 death rates from the New York Times (GitHub - Nytimes/Covid-19-Data: An Ongoing Repository of Data on Coronavirus Cases and Deaths in the U.S., n.d.). We calculated city-level COVID indicators as the mean of death rates from any counties overlapping the city boundary, weighted by the share of the city boundary in the overlap.
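A minimal sketch of the two control-variable computations follows. It assumes the overlap shares have already been derived from a spatial overlay, and it uses one common percentile-rank definition (share of a city's observations at or below a value); neither detail is specified in the text, and the function names are hypothetical.

```python
from typing import Sequence


def overlap_weighted_mean(county_rates: Sequence[float],
                          overlap_shares: Sequence[float]) -> float:
    """City-level rate as the mean of county rates, weighted by each
    county's share of the city area."""
    total_weight = sum(overlap_shares)
    if total_weight <= 0:
        raise ValueError("overlap shares must sum to a positive value")
    weighted = sum(r * w for r, w in zip(county_rates, overlap_shares))
    return weighted / total_weight


def within_city_percentile(value: float, city_history: Sequence[float]) -> float:
    """Percent of the city's own observations at or below `value`."""
    at_or_below = sum(1 for h in city_history if h <= value)
    return 100.0 * at_or_below / len(city_history)
```

Expressing temperature as a within-city percentile means that the same value (say, the 90th percentile) denotes an unusually warm month in both a cool and a hot city.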

Statistical analysis
Since our outcome was overdispersed count data, we used quasi-Poisson regression. To account for seasonality, we included calendar-month fixed effects. We also included calendar-year fixed effects to remove bias from improved recording of POI visits. We included controls for COVID-19 deaths and temperature: based on exploratory analyses, we log-normalized death rates and used 3rd-degree basis splines to account for non-linearity in the temperature-park use relationship. Since intraclass correlation can bias standard errors, we conservatively clustered errors by city and month (Cameron et al., 2011).
The vast majority of U.S. jurisdictions implemented some closure measures in March 2020 (e.g., stay-at-home orders) and some reopening measures in May 2020. U.S. COVID-19 deaths surpassed 100,000 in May. No U.S. city was COVID-free by the end of November 2020. While the specific timing of closures had only modest effects on activity patterns, changes in activity patterns roughly tracked the timing of policy measures.
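One way to write down the model described above is sketched here. The notation is ours rather than the authors': all symbols are assumptions introduced for illustration, and the handling of zero death rates under the log transformation is not specified in the text.

```latex
\begin{align}
\log \mathbb{E}[Y_{pct}] &= \alpha + \beta\,\mathrm{COVID}_{t}
    + \gamma_{m(t)} + \delta_{y(t)} + f(T_{ct}) + \theta \log D_{ct}, \\
\operatorname{Var}(Y_{pct}) &= \phi\,\mathbb{E}[Y_{pct}]
\end{align}
```

Here $Y_{pct}$ is the adjusted visit count for park $p$ in city $c$ during month $t$; $\mathrm{COVID}_{t}$ indicates the pandemic period; $\gamma_{m(t)}$ and $\delta_{y(t)}$ are calendar-month and calendar-year fixed effects; $f$ is a cubic basis spline of the within-city temperature percentile $T_{ct}$; $D_{ct}$ is the city-level COVID-19 death rate; and $\phi$ is the quasi-Poisson dispersion parameter, which allows the variance to exceed the mean.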
First, we examined the effects of the pandemic from March-November 2020 on park visits. We then segmented the pandemic period into (a) closure (March-April 2020) and (b) reopening (May-November 2020) periods (Fig. 1). Finally, we assessed whether pandemic changes in park visits varied by racialized economic privilege, using a moderation analysis that interacted COVID indicators with the privilege indicator for each park.
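The period indicators and interaction terms used in these three analyses could be constructed as in the sketch below. The variable names are hypothetical, and each month is assumed to be represented by a date falling within it.

```python
from datetime import date
from typing import Dict

COVID_START = date(2020, 3, 1)
CLOSURE_END = date(2020, 4, 30)
REOPEN_START = date(2020, 5, 1)
STUDY_END = date(2020, 11, 30)


def period_indicators(month: date) -> Dict[str, int]:
    """0/1 indicators for the overall COVID-19 period and its two segments."""
    return {
        "covid": int(COVID_START <= month <= STUDY_END),
        "closure": int(COVID_START <= month <= CLOSURE_END),
        "reopening": int(REOPEN_START <= month <= STUDY_END),
    }


def moderation_terms(month: date, privileged: bool) -> Dict[str, int]:
    """Add the privilege indicator and its interactions with each segment."""
    terms = period_indicators(month)
    p = int(privileged)
    terms["privileged"] = p
    terms["closure_x_privileged"] = terms["closure"] * p
    terms["reopening_x_privileged"] = terms["reopening"] * p
    return terms
```

In the segmented model, the closure and reopening indicators replace the single pandemic indicator; in the moderation model, the interaction coefficients capture how much the pandemic change differed for parks in privileged service areas.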

Results
Eighty-two percent of the parks in the SafeGraph POI data spatially aligned with parks in the TPL database. After discarding a small proportion (5 %) of parks for which a full panel of data was not available, we retained a final sample of 11,890 parks (Table 1). The greatest number of sample parks were in New York City, NY (n = 891) and Chicago, IL (n = 701); the fewest were in Colorado Springs, CO (n = 9) and Long Beach, CA (n = 94). City-level parks count correlated with population count at 0.84.
We found that racialized economic privilege moderated the effects of reopening (Table 2). The rate of park visits in privileged service areas was 8.4 % greater during reopening (95 % CI [1.9, 15.4], p = 0.01), compared to non-privileged service areas. We did not find a moderation effect when comparing visits during closure.
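For readers less familiar with log-link models, the sketch below shows the standard conversion between a log-scale coefficient and a percent change in the visit rate. We assume this is the transformation behind the reported percentages; the function names are hypothetical.

```python
import math


def percent_change(beta: float) -> float:
    """Percent change in the rate implied by a log-link coefficient."""
    return 100.0 * math.expm1(beta)


def log_coefficient(pct: float) -> float:
    """Inverse: log-scale coefficient implied by a percent change."""
    return math.log1p(pct / 100.0)


# Round trip using the reported reopening moderation estimate and its CI.
for pct in (8.4, 1.9, 15.4):
    beta = log_coefficient(pct)
    print(f"{pct:5.1f} % <-> beta = {beta:.4f}")
```

The same conversion applies to the confidence limits, because the exponential transformation is monotonic.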

Discussion
We demonstrated that mobility data from a large sample of smartphones can be used to monitor city park visits longitudinally and to address substantive research questions. These data enabled us to measure park visits on a large scale, using a sample that represented most parks in the 50 most populous U.S. cities. Our analyses indicated that park visits during COVID-19 were lower than the levels observed in 2018-2019. It appears this reduction was larger during the months when most public spaces were closed; after jurisdictions began reopening, the reduction in park visits was smaller but still substantial. However, the effect of reopening varied according to the racialized economic privilege of the population living within walking distance. Where the proportion of White and high-income residents was higher, reopening entailed a larger rebound in park visits.
The findings of this proof-of-concept study have implications for research on parks equity in cities. Examining longitudinal trends in park visits poses major methodological challenges. Systematic observation of parks (Cohen et al., 2009; McKenzie et al., 2006) is the gold standard but requires in-person observers, which limits scalability. Population surveys are limited by self-report, selection, and other biases. Researchers have used other digital methods, such as extracting geolocation data from wearable trackers (Patnode et al., 2010; Ries et al., 2009; Tappe et al., 2013) or geotagged social media posts (Hamstead et al., 2018). The current study used data from a much larger sample.
Our findings align with existing evidence that racialized economic privilege shapes access to city parks and that the COVID-19 pandemic has disproportionately burdened marginalized communities. Proximity to city parks is not always the largest barrier faced by people of color and lower-income families (Wen et al., 2013); access barriers may also include a lack of leisure time (Byrne, 2012), social support (Evenson et al., 2002; Wilbur et al., 2002), suitable programming (Cohen et al., 2016), or safe walking routes (Cutts et al., 2009; Das et al., 2017). These barriers may have grown during the pandemic, when lower-income workers were more often expected to report to work and community violence increased dramatically, especially in racially segregated neighborhoods (Martin et al., 2022). In addition to these baseline inequities in park access, our analysis found that reopenings entailed a greater increase in park visits for those parks serving predominantly White and high-income urban populations. This finding demonstrates the feasibility of using smartphone mobility data to track changes in parks equity over time.

Limitations
SafeGraph park visits data have not been validated against traditional data sources. However, prior work has shown that SafeGraph data track closely with Google Community Mobility Reports for park visits, which are derived from smartphone locations using different methods. This study is vulnerable to ecological fallacy because it is based on aggregate park-level data, not individual-level data. We were unable to assess the manner in which city parks were used, nor the characteristics of the persons using them. Because we could not observe the race or income of park users, we assigned these characteristics at the park level based on the demographics of the population living within a short walk. Park users might travel substantially farther than this distance (Saxon, 2019), but are likely to visit neighborhoods with demographic compositions similar to their own (Wang et al., 2018). Therefore, we considered nearby demographics a suitable proxy for park user characteristics. Moreover, while smartphone uptake varies marginally by race and moderately by income (Pew Research, 2018), our longitudinal design controls for baseline differences in smartphone ownership within each park service area, and also for time-varying changes in the denominator (i.e., devices in the SafeGraph sample by city).
[Table notes: Quasi-Poisson regression model also included fixed effects for calendar months, calendar years, temperature (basis spline with 3 knots) and COVID-19 death rates (natural log). Standard errors are clustered by city and month. (a) Beta coefficients are expressed as percent change in visits rate.]
Finally, we refer to "closure" and "reopening" periods based on the general trends in public policy and population mobility, rather than policies that closed or reopened specific parks in our sample, which we did not observe.

Conclusions
COVID-19 has sharpened focus on city parks as important public resources, but racialized economic privilege may have influenced park access during the pandemic. Currently, few tools exist to monitor the use of parks. To advance the urgent cause of park equity, participatory methods (Curran & Hamilton, 2012;Daigneau, 2015) are needed to investigate barriers and advance solutions. Our study demonstrated that, despite some limitations, smartphone mobility data could assist these efforts by quantifying and tracking disparate outcomes.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.