A GIS-based analytical framework for evaluating the effect of COVID-19 on the restaurant industry with big data

ABSTRACT COVID-19 has crippled the restaurant industry, a crucial socioeconomic sector that contributes immensely to the global economy. However, the current literature has rarely quantified the effect of COVID-19 on restaurant visitation and revenue at different spatial scales, or its relationship with the neighborhood characteristics of customers’ origins. Based on Point of Interest (POI) measures derived from SafeGraph data, which provide mobility records of 45 million cell phone users in the US, our study takes Lower Manhattan, New York City, as a pilot study and examines 1) the change in restaurant visitations and revenue before and after the COVID-19 outbreak, 2) the areas where restaurant customers live, and 3) the association between the neighborhood characteristics of these areas and lost customers. In doing so, we provide a geographic information system-based analytical framework that integrates big data mining, web crawling techniques, and spatial-economic modelling. The framework can be applied to estimate the broader effect of COVID-19 on other industries and can be extended for financial monitoring in response to future pandemics or public emergencies.


Introduction
The outbreak of COVID-19 has caused economic damage in multiple industries and has had negative impacts on long-term economic growth. The hospitality and restaurant industry, which relies on tourism, events, and dining away from home, is one of the hardest-hit sectors during the pandemic because of lockdowns, stay-at-home orders, and other social restrictions implemented to limit human mobility (Gursoy & Chi, 2020). Estimating the effect of the COVID-19 pandemic on industries is important for policy making and for planning the post-pandemic economic recovery, although it is a difficult task because of limited data availability. There is an urgent need to quantify the effect of COVID-19 on the restaurant industry as one of the most pandemic-affected industries.
Existing studies have investigated the influence of COVID-19 on restaurant reservations and consumption (Peng & Chen, 2021), revenue loss at the country level (Nhamo et al., 2020) or at the company level (Song et al., 2021), restaurant viability (Gkoumas, 2021), restaurant operations (Brizek et al., 2021), and consumers' risk perceptions about restaurant food and packaging (Byrd et al., 2021). These works are largely written from the perspective of hospitality management and rely on qualitative investigation or survey-based analysis to reveal the nexus between the pandemic and the restaurant industry. Their analyses remain at a coarse level (e.g. country) or are subject to small-data issues such as under-representativeness and limited spatial and temporal coverage. What the current scholarship has rarely done is quantify how the restaurant industry has been affected by COVID-19 using fine-level datasets with large spatial and temporal coverage, which enable researchers to monitor restaurant visitations, the spatial and socioeconomic disparity of where customers reside, and the interrelationship among these measures.
To address this knowledge gap, this study establishes a geographic information system (GIS)-based analytical framework to estimate the effect of COVID-19 on the restaurant industry by investigating the change in restaurant visitations before and after the COVID-19 outbreak (1 January 2019 to 31 December 2020), the areas where restaurant customers live, and the association between the neighborhood characteristics of these areas and lost customers. The framework integrates GIS-based statistical and modelling techniques drawing on multi-source data. We create an integrated restaurant dataset containing 1) Point of Interest (POI) data and human mobility data retrieved from SafeGraph, which provides mobility records of 45 million mobile phone users in the US, 2) restaurant attributes crawled from Yelp, a commercial restaurant review website, and 3) demographic and socioeconomic data from the US Census Bureau. We test the framework empirically in Lower Manhattan, New York City, as a pilot study and demonstrate its generality and reproducibility for estimating the broader effect of COVID-19 on other industries and for responding to future pandemics or public emergencies.

US restaurants in the context of COVID-19
In the US, the restaurant industry is one of the crucial socioeconomic sectors that contribute immensely to the national economy. Restaurants emerged from the post-Second World War economic boom, providing not just food, goods, and services but also consumption spaces that met new demand for high-end amenities (Bocock, 2008). There are many types of restaurants in the US, ranging from commercial franchises and fast-food chain stores to individually or privately owned restaurants of different classes (e.g. take-out dominant restaurants or fine restaurants). In the process of urbanization and urban redevelopment, restaurants have also been proposed as "cheap and quick" solutions to revitalize local economies and decaying urban areas and to provide employment opportunities to populations with low socioeconomic status (Small, 2017). In this sense, restaurants play different and important roles in the economy, society, and the daily life of citizens.
Since the global outbreak of the COVID-19 pandemic in early 2020, the restaurant industry has been one of the hardest-hit industries because of restriction policies implemented to limit human mobility (e.g. lockdowns, stay-at-home orders, travel bans, and the closure of public transit) (Gursoy & Chi, 2020). Nationwide, the US restaurant industry was estimated to have furloughed more than eight million employees and lost around $240 billion over the whole of 2020 (Nhamo et al., 2020). Only a limited number of studies have attempted to evaluate the impact of COVID-19 on the restaurant industry. Nhamo et al. (2020) employed data from OpenTable, an online restaurant reservation company, to evaluate restaurant revenue loss at the country level (including the US). Song et al. (2021) used data on publicly traded US restaurant firms to estimate how firms' pre-pandemic characteristics moderate the impact of the COVID-19 shock on stock returns in the US restaurant industry. Peng and Chen (2021) examined how consumers' attachment to luxury restaurants and their emotional ambivalence contribute to the abandonment of restaurant reservations during the COVID-19 pandemic. Brizek et al. (2021) surveyed restaurant operations at the early stage of the pandemic by assessing the perceptions and perspectives of independent full-service restaurant operators. Other studies (e.g. Byrd et al., 2021; Gkoumas, 2021) focused more on restaurant viability and consumers' risk perceptions about restaurant food and packaging. However, these studies relied largely on qualitative investigation or survey-based analysis to reveal the nexus between the pandemic and restaurant consumption behavior.
In addition, a few works have provided quantitative evaluations of the effect of COVID-19 on the restaurant industry. For example, Banerjee et al. (2021) revealed significant differences in restaurant visits between rural and urban counties after shelter-at-home orders, drawing on SafeGraph's core places dataset, which provides visitation records for different types of points of interest in the US. Glaeser et al. (2021) developed a model to predict how lifting stay-at-home orders affects the dine-out behaviour of customers, using restaurant activity data. However, these studies were either conducted at a coarse level (e.g. Nhamo et al., 2020) or lacked evidence about the varying effect of COVID-19 on different types of restaurants and about changes in restaurant customers (e.g. their origins). To provide concrete evidence for governments and policymakers, there is a clear need to quantify how the restaurant industry has been affected by COVID-19 using fine-level datasets with large spatial and temporal coverage, which enable researchers to monitor restaurant visitations, the spatial and socioeconomic disparity of where customers reside, and the interrelationship among these measures. These are the objectives of our study.

Neighborhood characteristics of restaurant customers
Prior studies on restaurants and their customers have largely focused on the relationship between restaurant types and locations, food offerings, the personal characteristics of restaurant customers, and/or their neighborhood characteristics in terms of demographic composition, socioeconomic status, and location (e.g. Bagozzi et al., 2000; Hyun, 2009; Liang & Andris, 2021; Morland et al., 2002). Regarding neighborhood characteristics related to restaurant visitations, there is some consensus in the literature that lower-income populations are more likely to dine at or order food from fast-food restaurants (Austin et al., 2005) or to live in food deserts (Beaulac et al., 2009). Chain restaurants were observed to appear more frequently in African American and low-income neighborhoods (Block et al., 2004). In contrast, high-income groups visited independent restaurants at a high rate (Carroll & Torfason, 2011). Bowman et al. (2004) contributed a nationwide evaluation showing that increased fast-food consumption was associated with factors such as gender, age, race/ethnicity (e.g. non-Hispanic black), and residential region. Restaurants serving cuisines from different countries were also favored by different social or ethnic groups. For example, Chinese restaurants had a higher proportion of Chinese customers (Liu & Lin, 2009), and accordingly Chinese restaurants were more likely to be located close to Chinese communities. Drawing on these studies, we selected a number of neighborhood characteristics (e.g. age structure, ethnicity, education and income levels, employment, and household composition) to examine whether the impact of COVID-19 on customers' origins varies across neighborhoods.
In addition to the demographic and socioeconomic characteristics of neighborhoods, working and transport modes that affect access to restaurants are also relevant to restaurant visitations (Gkiotsalitis & Cats, 2021; Wen et al., 2022), especially after the outbreak of COVID-19, when numerous policies were implemented to change how people work and travel. A number of public health studies show that essential workers were more mobile than non-essential workers during the pandemic peak because they had to work onsite and were exempted from COVID-19 policies (e.g. work-at-home orders) (Zhang et al., 2021). In addition, regional differences, reflected by the distance to city centers, can also influence restaurant visitations (Banerjee et al., 2021). Burgoine and Harrison (2013) concluded that there are more food outlets and a wider variety of foods available in urban than in rural regions. Urban residents had a higher incidence of eating food away from home than rural residents because they had easier access to various restaurants (Dean & Sharkey, 2011). This highlights the importance of investigating the effect of COVID-19 separately for restaurant customers' origins in urban and rural regions at different distances from city centers. In this regard, our study takes account of commuting and working modes (e.g. the percentage of people working at home, having a short commute, living in households with or without cars, and commuting by public transit) and the location of neighborhoods (e.g. the distance from a customer's origin to a city center) to reveal the relationship between restaurant visitations and customers' origins.

Study area
We took Lower Manhattan, New York City (Figure 1), as a pilot study for the technical and contextual reasons given below. New York City is highly populated and compact. It was a hard-hit front line at the early stage of the pandemic (February to March 2020) and experienced rapid virus transmission and infection before the nationwide spread of the virus (Thompson et al., 2020). Lower Manhattan is the core of New York City, where the world financial engine Wall Street is located, and it is populated by high-rise office buildings, white-collar workers, and commuters who are likely to be customers of local restaurants. It is also a major tourism destination that attracted a large influx of visitors who contributed to restaurant business before the pandemic. After the COVID-19 outbreak, lockdown and work-at-home orders and dining-out restrictions had profound impacts on restaurant visitations and revenue, although such impacts remain under-investigated. Thus, Lower Manhattan serves as a good contextual testbed for our proposed analytical framework. Another reason for limiting the study area to restaurants in Lower Manhattan is the technical barrier to combining multi-source datasets (detailed in Section 3.2). We had to check restaurant IDs manually to merge the datasets for analysis; the framework can be extended to all restaurants in the US once this technical barrier is overcome in future studies. It is worth noting that our target restaurants, as the destinations of restaurant visitations, are located in Lower Manhattan, NYC, while the origins of visitors are spread all over the US. Thus, our analyses of the spatiotemporal patterns of visitor origins are multi-scale, at the state, county, and tract levels.

GIS-based analytical framework
We constructed a GIS-based analytical framework as a three-step procedure (Figure 2). It began with data collection from three data sources to retrieve four datasets, followed by data processing and manipulation to create an integrated restaurant dataset for the ensuing analysis. Finally, three sets of spatial statistical analyses were conducted to estimate the change in restaurant customers and revenue, where lost customers reside, and how these losses are associated with local neighborhood characteristics.

Step 1: data collection
We collected four datasets from multiple sources (Table S1). First, Dataset 1 contained the Point of Interest (POI) records retrieved from SafeGraph (2020), a commercial dataset tracking the mobile devices of 45 million consumers with their consent. It uses GPS pings from different mobile applications to estimate foot traffic patterns and provides daily visits to hundreds of thousands of points of interest in the US, including restaurants. SafeGraph has been widely used in COVID-19 related studies to reveal human mobility and place visitations (e.g. Huang et al., 2022; Kashem et al., 2021; Weill et al., 2020). Dataset 1 provided a restaurant list containing the unique identifier ("Placekey"), name, and location (latitude and longitude) of more than 800,000 restaurants and other eating places in the US, defined by NAICS code 7225 in the North American Industry Classification System (North American Industry Classification System, 2021); among them, 240 restaurants were located in Lower Manhattan, New York City. All 240 restaurants remained open for business in 2019-2020, as their names appeared in both years (2019 and 2020) in the SafeGraph dataset. Second, based on the "Placekey" of the 240 restaurants, we retrieved origin-destination records (Dataset 2) from SafeGraph, containing the origin of customers as the home census tract (i.e. census tract "ID"), the number of visitations to a restaurant per week, and the month and year of such visitations (Li et al., 2021). It is worth noting that SafeGraph only counts a visitation to a place if it lasts longer than 4 minutes. In other words, people who come to a restaurant to pick up take-away orders are not counted if they stay less than 4 minutes, and home deliveries are not counted either. This may introduce some data bias, which we discuss in the closing section.
Third, we collected restaurant attributes (Dataset 3) from Yelp (www.yelp.com), one of the most popular customer-review websites in North America, which provides each restaurant's name, location (latitude and longitude), type (e.g. Chinese and Italian), and average cost per customer (in US dollars). Yelp data have been widely applied in restaurant studies in the context of COVID-19 (e.g. Karniouchina et al., 2022; Kostromitina et al., 2021; Luo & Xu, 2021). To retrieve the Yelp data, we used "Lower Manhattan" as the search keyword in the "Restaurant" category, resulting in 190 restaurants. Fourth, demographic and socioeconomic characteristics at the census tract level (Dataset 4) were retrieved from the US Census Bureau (2019), including the total population and number of children, age, ethnicity, household income, family composition, car ownership, education, labor force status, occupation, and access to public transport. These were used as independent variables in the ensuing regression analysis.

Step 2: data processing and manipulating
We processed the four datasets by locating the restaurants using the X, Y coordinates collected from Yelp and matching their locations and names between Datasets 1 and 3, which generated a final list of 147 restaurants in Lower Manhattan as our study population. Based on these 147 restaurants, Dataset 2 was filtered to 94,567 visitation records, with 69,518 (73.5%) in 2019 and 25,049 (26.5%) in 2020. In Dataset 3, the restaurant types originally provided by Yelp were arbitrary, with mixed classifications based on food types, cuisine, and places; they were reclassified by primary type (Table 1). In addition, the average cost per customer retrieved from Yelp was in categorical form (ranging from $ ($5-15) to $$$$ (above $60)) and was converted to the numeric values $10, $20, $45, and $60 for the calculation of revenue. Furthermore, SafeGraph drops a record if the number of visitations per week to a particular restaurant is 1, because of privacy concerns; if the number is otherwise less than or equal to 4, it is recorded as 4, although the true value may be 2, 3, or 4. To address this data bias, we generated randomized numbers (integers ranging from 2 to 4) proportionally through a curve estimation algorithm (see Supplementary Note 1, which contains Figure S1 and Table S2). The final restaurant dataset (Dataset 5) integrated all information from Datasets 1 to 4, including restaurant types and average costs per customer, which enabled us to analyze the change in restaurant customers and revenue by restaurant type from 1 January 2019 to 31 December 2020.
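The two numeric conversions described above (price tier to cost, and censored visit counts to a value between 2 and 4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the tier symbols in `PRICE_TIER_TO_COST` assume Yelp's four-level price scale, and the probabilities in `CENSORED_WEIGHTS` are placeholders standing in for the proportions that the study derives from its curve estimation (Supplementary Note 1). The function names are hypothetical.

```python
import random

# Assumed Yelp price tiers mapped to the per-customer costs quoted in the text.
PRICE_TIER_TO_COST = {"$": 10, "$$": 20, "$$$": 45, "$$$$": 60}

# Hypothetical probabilities for the true count behind a censored value of 4.
# The study estimates these proportions by curve fitting; these are placeholders.
CENSORED_WEIGHTS = {2: 0.5, 3: 0.3, 4: 0.2}


def impute_visits(recorded, rng):
    """Replace a censored weekly count of 4 with a random draw from 2, 3, or 4."""
    if recorded != 4:
        return recorded
    values = list(CENSORED_WEIGHTS)
    weights = list(CENSORED_WEIGHTS.values())
    return rng.choices(values, weights=weights, k=1)[0]


def weekly_revenue(recorded_visits, price_tier, rng):
    """Estimate revenue as imputed visits times the numeric cost per customer."""
    return impute_visits(recorded_visits, rng) * PRICE_TIER_TO_COST[price_tier]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
print(weekly_revenue(12, "$$", rng))   # uncensored count: 12 * 20
print(weekly_revenue(4, "$$$", rng))   # censored count: 45 times a draw from 2-4
```

Passing the random generator explicitly keeps the imputation reproducible, which matters when the same censored records feed several downstream summaries.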

Step 3: Spatial analytics
We conducted three sets of spatial statistical analyses at different spatial scales. First, we generated a statistical summary of restaurant customers and revenue by type at the national level through cross-tabulation and box plots. Because the SafeGraph sample is based on 10% of mobile devices in the US, we multiplied the calculated lost customers and revenue by ten to obtain more realistic estimates (SafeGraph, 2020). Second, an origin-destination analysis was employed to track the change in customers and revenue from 2019 to 2020 by origin at both the state and county levels. We also graphed the relationship between the number of lost customers and the distance between customers' home states and Lower Manhattan, and mapped the most popular restaurant type in each home county (the origins where restaurant customers reside).
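The loss calculation behind the first two analyses can be illustrated with a short sketch. It assumes visit records arrive as (origin, year, visits) tuples, which is a hypothetical layout rather than the SafeGraph schema, and it applies the tenfold scaling stated in the text. The function names and sample numbers are illustrative only.

```python
SCALE_FACTOR = 10  # the sample covers about 10% of devices, so counts are multiplied by ten


def lost_customers(visits_2019, visits_2020, scale=SCALE_FACTOR):
    """Return (estimated lost customers, percentage lost) for one origin."""
    lost = (visits_2019 - visits_2020) * scale
    pct_lost = 100 * (visits_2019 - visits_2020) / visits_2019
    return lost, pct_lost


def summarize_by_origin(records):
    """Aggregate (origin, year, visits) tuples and compute losses per origin."""
    totals = {}
    for origin, year, visits in records:
        by_year = totals.setdefault(origin, {2019: 0, 2020: 0})
        by_year[year] += visits
    return {
        origin: lost_customers(by_year[2019], by_year[2020])
        for origin, by_year in totals.items()
        if by_year[2019] > 0  # a percentage is undefined without 2019 visits
    }


# Hypothetical sampled visit counts, not values from the study.
records = [("NJ", 2019, 200), ("NJ", 2020, 80), ("WY", 2019, 10), ("WY", 2020, 1)]
print(summarize_by_origin(records))
```

Note that the scaling changes the absolute number of lost customers but leaves the percentage lost unchanged, since the factor cancels in the ratio.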
Third, we employed an ordinary least squares (OLS) regression to test the relationship between lost customers at the census tract level and the neighbourhood characteristics of home tracts (the origins where restaurant customers reside). OLS is a linear least squares method for estimating the unknown parameters of a linear regression model. It chooses the parameters of a linear function of a set of explanatory variables by minimizing the sum of the squared differences between the observed values of the dependent variable in the given dataset and the values predicted by the linear function of the independent variables. The OLS regression is written as:

$$Y = \alpha X + \delta + \varepsilon \qquad (1)$$

where Y denotes the dependent variable (lost customers); X denotes the matrix containing a set of observed independent variables (i.e. demographic and socioeconomic characteristics of census tracts); α denotes the coefficients of the matrix X; δ denotes the intercept; ε denotes the error terms. Based on the result of the OLS regression (see Supplementary Table S3), we checked the spatial autocorrelation of the OLS residuals. If the residuals are spatially autocorrelated, the OLS model may not be reliable because it violates the OLS assumption that the error terms are independent of one another. Spatial autocorrelation is characterized by a correlation in a signal among nearby locations in space. We used Moran's Index as a measure of spatial autocorrelation (technical details in Anselin et al. (2010)). The value of Moran's Index ranges from −1 (negative spatial autocorrelation) to +1 (positive spatial autocorrelation).
In our analysis, the spatial autocorrelation report (Figure S2) generated in ArcGIS Pro 2.8 shows a z-score of 46.412 and a Moran's Index of 0.028 with a reported p-value of 0, indicating statistically significant positive spatial autocorrelation in the OLS residuals. This reveals a need to upgrade the OLS model to a spatial autoregressive model to avoid the bias caused by spatial autocorrelation. Thus, we employed a spatial lag regression (SLR) model, a typical type of spatial autoregressive model, to improve the modelling performance.
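The residual check described above rests on the global Moran's Index, which can be computed directly from a vector of residuals and a spatial weight matrix. The sketch below is a plain-Python illustration of the standard formula, I = (n / S0) · Σᵢ Σⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², where z are deviations from the mean and S0 is the sum of all weights. It is not the ArcGIS Pro implementation, it does not compute the z-score or p-value, and the four-tract layout and residual values are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / s0) * cross / sum(d * d for d in dev)


# Four hypothetical tracts in a row; neighbours share an edge (binary contiguity).
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [1.0, 2.0, 3.0, 4.0]      # similar residuals sit next to each other
alternating = [1.0, -1.0, 1.0, -1.0]  # neighbouring residuals have opposite signs
print(morans_i(clustered, W))    # positive value
print(morans_i(alternating, W))  # negative value
```

A positive result on regression residuals, as in the clustered case, is the signal that motivates moving from OLS to a spatial model.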
SLR is a linear spatial autoregressive model that originated in spatial econometrics. It has the advantage of reducing the bias introduced by potential spatial autocorrelation in the dependent and independent variables, revealing the spatial heterogeneity among variables, and avoiding unreliable significance tests (Anselin, 2009). It is also capable of providing location-specific parameter estimates at the regression points in a given spatial unit (Fotheringham et al., 2003). The parameter estimates are mappable, which facilitates interpretation and highlights spatial variation in the relationship between lost customers and neighborhood characteristics. SLR involves the construction of a spatial weight matrix, defined by first-order rook contiguity (adjacent edges), with the diagnostics from GeoDa used to determine the most appropriate weight matrix (Anselin et al., 2010). In our SLR model, the dependent variable Y denotes the lost customers from 2019 to 2020 in each census tract, normalized by the total population (Table 2). This rests on the assumption that the propensity to dine out is similar across urban space, so that the count of restaurant customers in an areal unit is propensity-controlled and proportional to the total population of that area (Ali et al., 2019). Y was treated as a spatial lag variable on the idea that the number of lost customers in one census tract is spatially autocorrelated with that in its neighbouring tracts (a spatial-lag effect), given that nearby neighborhoods were more likely to be affected by the same COVID-19 restriction policies and/or to have similar demographic and socioeconomic characteristics that may relate to similar dietary habits (Cunha et al., 2010).
In the SLR specification (Anselin et al., 2010), X denotes the matrix containing a set of observed independent variables (i.e. demographic and socioeconomic characteristics of census tracts); α denotes the coefficients of the matrix X; D is the disturbance of the spatial weight matrix of each dependent variable y defined in Eq. (2); ε denotes the error terms; δ denotes the spatial autoregressive structure of the spatial weight matrix; W_y (bold as a vector variable) denotes the spatial weight matrix of the dependent variable y_{1,2,...,k}, calculated by inverse distance weighting as in Eq. (4):

$$w_{ij} = \frac{1}{d_{ij}^{\sigma}} \qquad (4)$$

where d_ij denotes the distance between the centroids of spatial units (i.e. census tracts) i and j; σ is a positive exponent, typically σ = 1. We ran three SLR models in GeoDa, an open-source GIS software package (Anselin et al., 2010): one for New York state and New Jersey state combined, and one for each of the two states separately. The SLR results included a set of global coefficients, computed as the mean of local coefficients across all census tracts, to indicate the overall relationship between the dependent variable and each independent variable (Table 3). The coefficients of independent variables were standardized to be comparable across census tracts. The SLR results also provided a global R-square to indicate the overall model performance and a series of local R-squares for individual census tracts to reveal the spatial variation of the relationships between lost customers and neighborhood characteristics.
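For orientation, a textbook spatial lag specification can be written in the notation of this section. This is a sketch under assumptions rather than the authors' exact equation: it treats δ as a scalar spatial autoregressive coefficient, writes W for the spatial weight matrix so that WY is the spatially lagged dependent variable, assumes (I − δW) is invertible, and leaves out the disturbance term D, whose role cannot be recovered from the text.

```latex
\begin{aligned}
Y &= \delta\, W Y + \alpha X + \varepsilon \\
(I - \delta W)\, Y &= \alpha X + \varepsilon \\
Y &= (I - \delta W)^{-1} (\alpha X + \varepsilon)
\end{aligned}
```

The last line shows that each tract's outcome depends on the error terms of other tracts through (I − δW)^{-1}. The lagged term WY is therefore correlated with ε, which is why spatial lag models are usually estimated by maximum likelihood or instrumental variables rather than by OLS.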

Change of restaurant customers and revenue from 2019 to 2020
The number of lost customers and the lost revenue from 2019 to 2020 by restaurant type are shown in Figure 3. The range of lost customers (and correspondingly the range of origins where lost customers reside) is largest for other Asian restaurants (including Burmese, Laotian, and Indonesian restaurants), followed by Mexican, Italian, and Japanese restaurants. The range of lost revenue per restaurant is again largest for other Asian restaurants, followed by Mexican, Italian, Japanese, and American restaurants. However, for total lost revenue (in the table underneath the X-axis), American restaurants rank at the top (a loss of $7,993,600), followed by Italian ($5,746,100) and Japanese restaurants ($3,890,300). This is possibly because American restaurants (24 in total in Lower Manhattan) outnumber other restaurant types. For the average lost revenue per restaurant, other Asian restaurants rank at the top (an average loss of $549,700 per restaurant), followed by French ($368,600) and American restaurants ($333,100). This indicates that the COVID-19 pandemic has had a more severe impact on other Asian restaurants (including Burmese, Laotian, and Indonesian restaurants), in terms of lost customers and average lost revenue per restaurant, than on the major Asian restaurant types such as Chinese, Japanese, and Korean restaurants.

Table 2. Dependent and independent variables used in the regression model.

| Variable | Definition |
| --- | --- |
| Dependent variable | |
| Lost customers | The number of lost customers from 2019 to 2020, normalised by the total population in a census tract |
| Independent variables | |
| % Schooler | The percentage of schoolers over the total population in a census tract |
| % Elderly | The percentage of people aged 65 years and above over the total population in a census tract |
| % White | The percentage of White population over the total population in a census tract |
| % Households with children | The percentage of households with children over the total households in a census tract |
| % Low education | The percentage of the population aged 25 years and over with less than high school, with 9-12 grade (no diploma), or with high school (or equivalent), over the population aged 25 years and over in a census tract |
| % Unemployed | The percentage of unemployed people over the total population in labour force in a census tract |
| % Low income | The percentage of housing units with an annual income of less than $14,999 over the total occupied housing units in labour force in a census tract |
| % Short commute | The percentage of people who commute less than 10 minutes over the total population that commutes to work |
| % Household with no cars | The percentage of households without cars over the total households in a census tract |
| % Work at home | The percentage of people working at home over the total population in labour force in a census tract |
| % Public transit commute | The percentage of people who commute by public transit over the total population that commutes to work |
| Population density | The total population over the areal size of a census tract |
| Distance (km) | The distance between the centroid of a census tract and the centroid of Lower Manhattan |

Source: US Census Bureau (2019).

Locales of lost customers and revenue by state and county
The spatial pattern of home states by percentage of lost customers from 2019 to 2020 is shown in Figure 4. A number of states in the central and north-central US mainland (i.e. Wyoming, Nebraska, North Dakota, Montana, Idaho, Iowa, and Wisconsin) show the largest losses of customers, above 87%. This is possibly because these states are located far from NYC, have large rural and natural areas and relatively small populations, and accounted for fewer visitations to NYC restaurants before the pandemic. Social restriction policies implemented after the COVID-19 outbreak (e.g. cancellation of flights) impeded human mobility in these states, which may have led to a substantial decrease in customers visiting NYC restaurants in 2020. Conversely, the states near NYC (i.e. New York, New Jersey, Vermont, Connecticut, Massachusetts, and Pennsylvania) have a relatively low percentage of lost customers (less than 55%). The relationship between lost customers and the distance between states and NYC is graphed in Figure 5. It shows a clear exponential pattern revealing a distance-decay effect: a shorter distance from a state to NYC is associated with a lower percentage of lost customers. This is particularly the case for New York, New Jersey, and Connecticut, as indicated in Figure 5, possibly because people living near or within NYC have easier access to restaurants in Lower Manhattan than those living in distant home states. We further examined the percentage of lost customers and lost revenue in home counties with the top 20% of visitations to NYC in both 2019 and 2020; these counties are all located in New York and New Jersey (Figure 6).
Counties far from NYC in the north of New York state clearly had relatively fewer customers in 2019 and 2020 (indicated by light blue lines in Figures 6A and 6B) than the nearby counties with larger numbers of customers (indicated by purple lines). However, the pattern in the percentage of lost customers varies across space (Figure 6C). Counties with less than 43% of lost revenue per customer (dark blue areas) are associated with lower percentages of lost customers (less than 55%, indicated by light blue and blue lines); such counties are populated and urbanized areas where medium- and small-sized cities (i.e. Albany, Syracuse, Rochester, and Buffalo) are located. In contrast, counties with higher lost revenue per customer (yellow areas) are associated with higher percentages of lost customers (purple lines); these are remote and rural counties located in the north and west of New York.
We mapped the type of restaurant in Lower Manhattan most favored by each county (reflected by the largest number of visitations) in 2019 and 2020 (Figure 7). Of the 78 counties, 63 (80.7%) changed their favorite restaurant type from 2019 to 2020 (Figure 8). Among these 63 counties, 51 (80.9%) spent less on average per customer in their favorite restaurants (a decrease ranging from $5 to $45), reflecting the potential financial impact of COVID-19 on restaurant consumption, as people may tend to spend less on dining out. It may also be partially explained by the COVID-19 restriction policies, under which customers may have shifted toward restaurants with easy access for drive-in or take-out (e.g. American restaurants), which were usually cheaper than fine restaurants (e.g. French restaurants).
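The distance-decay pattern described for Figure 5 could be summarised with a simple exponential fit. The sketch below assumes, purely for illustration, that the share of customers retained (one minus the share lost) falls off as a·exp(−b·distance) and estimates a and b by least squares on the logarithm of the retained share. The functional form, the function name `fit_exponential_decay`, and the sample values are assumptions, not the study's method or data.

```python
import math


def fit_exponential_decay(distances_km, retained_share):
    """Fit retained = a * exp(-b * distance) by least squares on log(retained)."""
    logs = [math.log(r) for r in retained_share]  # requires every share > 0
    n = len(distances_km)
    mean_d = sum(distances_km) / n
    mean_l = sum(logs) / n
    sxx = sum((d - mean_d) ** 2 for d in distances_km)
    sxy = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances_km, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_d
    return math.exp(intercept), -slope  # (a, b)


# Hypothetical state-level values: distance to NYC and share of customers retained.
distances = [0, 500, 1000, 2000, 3000]
retained = [0.50, 0.38, 0.30, 0.17, 0.11]
a, b = fit_exponential_decay(distances, retained)
print(f"a = {a:.3f}, b = {b:.5f} per km")
```

A larger fitted b would indicate that retention drops off more quickly with distance; states with no retained customers would need to be handled separately because the logarithm is undefined at zero.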

Relationship between lost customers and neighborhood characteristics
The standardized coefficients generated by the SLR model (Table 3) indicate the extent to which lost customers are associated with the demographic and socioeconomic characteristics of census tracts. In general, population density, distance, and the percentages of schoolers, low education, low income, short commute, and public transit commute are negatively associated with lost customers (at least p < 0.05). This suggests that home tracts with larger proportions of schoolers and public transit commuters tend to lose fewer customers visiting the restaurants in Lower Manhattan during the pandemic. It implies that COVID-19 restriction policies (e.g. school closures and the temporary closure of public transit) may impede human mobility during the pandemic peak but have limited impact on restaurant visitations on a yearly basis. Unexpectedly, areas with a lower socioeconomic status (e.g. a concentration of low-educated and low-income population) tend to lose fewer customers visiting the restaurants. A possible explanation is that essential workers, who usually have a lower socioeconomic status, were exempted from the COVID-19 restriction policies even during the pandemic peak and needed to work onsite regularly, leaving them free to dine out.
In contrast, home tracts with higher proportions of elderly people, white population, households with children, households without cars, and people working at home tend to lose more customers (p < 0.05). This is partially because unemployed people, who may have financial concerns or may largely stay at home rather than going out for work, tend to dine out less frequently during the pandemic. Moreover, households without cars might have difficulty travelling to restaurants in NYC. It might also be explained by the implementation of lockdown and home-dwelling orders that kept more people at home, in particular the elderly, who face a higher risk of infection, and people working at home, who may have less intent to go out. When broken down by state, the relationship between the preceding variables and lost customers remains consistent in home tracts in New York but is slightly inconsistent in New Jersey. More specifically, the percentage of white population, which is insignificantly associated with lost customers in New York, becomes significant in New Jersey (coefficient = 0.135, p < 0.01), whereas the percentages of unemployed people and households without cars, which are significantly associated with lost customers in New York, become insignificant in New Jersey. This indicates that spatial heterogeneity exists in the interrelationship between lost customers and neighbourhood characteristics and further reveals the diverse impact of COVID-19 on restaurant visitations and the shift in dining-out behaviours.
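To make the notion of a standardized coefficient concrete, the sketch below z-scores each explanatory variable and the response and then fits an ordinary least-squares regression. This is a plain OLS stand-in chosen for illustration, not the SLR model reported in Table 3; the function name and the synthetic variables are hypothetical.

```python
import numpy as np

def standardized_coefficients(X, y):
    """OLS coefficients after z-scoring every column of X and y.

    Each coefficient gives the change in y (in standard deviations)
    associated with a one-standard-deviation change in that variable.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # All variables are centered, so no intercept column is needed
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return beta

# Synthetic tract-level data for illustration only (not data from the study)
rng = np.random.default_rng(0)
pct_elderly = rng.uniform(5, 30, size=200)
pct_transit = rng.uniform(0, 60, size=200)
lost_pct = 40 + 0.8 * pct_elderly - 0.3 * pct_transit + rng.normal(0, 5, size=200)

beta = standardized_coefficients(np.column_stack([pct_elderly, pct_transit]), lost_pct)
print(dict(zip(["pct_elderly", "pct_transit"], np.round(beta, 3))))
```

Because every variable is on the same scale after z-scoring, the magnitudes of the coefficients can be compared directly across variables measured in different units.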

Discussion and conclusion
Our study constructs a GIS-based analytical framework to quantify and evaluate the impact of COVID-19 on the restaurant industry, which can be extended to other industries. This framework combines big data mining, web crawling techniques, and GIS-based analytics to create and analyze an integrated restaurant dataset for a given study area. Taking 147 restaurants in Lower Manhattan, NYC, as the pilot study, we find that the COVID-19 pandemic has a more severe impact on other Asian restaurants (e.g. Burmese, Laotian, and Indonesian restaurants) in terms of lost customers and average lost revenue per restaurant than on the major Asian restaurants (e.g. Chinese, Japanese, and Korean), European restaurants (e.g. Italian and French), and American and Latin American restaurants. There is a distance-decay relationship between lost customers and the locales where customers reside: home states or counties closer to NYC are associated with lower percentages of lost customers. People living within NYC or in nearby areas of New York, New Jersey, and Connecticut have relatively easy access to restaurants in Lower Manhattan, and their restaurant visitations are thus less affected by the pandemic. There is also a potential financial impact of COVID-19 on people's average restaurant spending, as they tend to spend less on dining in restaurants during the pandemic. Moreover, the interrelationship between lost customers and neighbourhood characteristics varies across space, further revealing the diverse impact of COVID-19 on restaurant visitations and the shift in dining-out behaviours. Our findings and analytical framework advance current knowledge in the fields of industrial evaluation, hospitality management, and policy making in a number of ways.
First, our analytical framework for evaluating the impact of COVID-19 on the restaurant industry can be applied to other industries and developed as part of economic initiatives in response to future pandemics or public emergencies. The POI data derived from SafeGraph contain multiple types of places, including green parks and national parks, supermarkets, pharmacies, transport stations, and liquor stores, to name a few (SafeGraph, 2020). Following our analytical framework, further efforts can be made to estimate the effect of COVID-19 on a particular industry based on visitations to venues in that industry. The framework also has great potential to be extended to a nationwide evaluation of the restaurant industry during the pandemic in the US, based on the nationwide records of more than 800,000 restaurants. Second, while existing research has also examined restaurant visits using SafeGraph data (e.g. Banerjee et al., 2021), our analytical framework enables the estimation of lost revenue across different types of restaurants by integrating SafeGraph data with Yelp data. Our findings span various spatial scales and can be used for policy making by governments at different levels. Third, our findings enrich the existing qualitative studies that predominantly focus on hospitality management (e.g. Brizek et al., 2021; Byrd et al., 2021; Gkoumas, 2021; Nhamo et al., 2020; Peng & Chen, 2021; Song et al., 2021) with quantifiable and mappable measures of lost restaurant customers and revenue and of the locales that lost the most customers and revenue. These spatially explicit measures can be adopted by end-users for specific purposes.
Our study provides empirical evidence for designing effective strategies and economic recovery initiatives in the post-lockdown era. Local government officers (e.g. a town supervisor) may be especially interested in how the restaurants in the districts under their charge have been affected by the COVID-19 pandemic. While such information could be collected via surveys of restaurant owners, that process requires considerable financial and labor resources that may not be available to a local district hit hard by the pandemic. Even when such resources are available, restaurant owners may not be able to accurately recall the number of customers or estimate the revenue loss. Our analytical framework therefore offers a possible way for local government officers to better understand and estimate the negative impact of COVID-19 on local restaurants at a lower cost in resources and time. Such impact estimates can then support the development of suitable financial programs to subsidize the affected restaurants. In addition, our analytical framework can provide information on the general regions that a restaurant's customers come from. Such geographic information could be distributed to restaurant owners to help them decide where to place new advertisements to attract customers back during the recovery process.
There are several limitations in our study that future efforts can address. First, given privacy concerns, tract-level records with weekly visitations below four have been roughly assigned a value of four (see Step 2 in Section 3.2). We had to rectify these records by generating randomized numbers via a curve estimation (i.e. integers ranging from 2 to 4) to reduce the data bias caused by the fuzzy records. Although the estimated visitation counts might not perfectly match restaurant visitation patterns in the real world, this tailored, data-driven randomization process allows us to approximate real restaurant visitation patterns in a reasonable manner. Future efforts can explore other data sources in order to accurately capture the POI visitation patterns of places with low visitation records. However, extra caution needs to be exercised to avoid potential violations of users' privacy during data retrieval and processing. Second, we aim to explain the loss of restaurant visitations in NYC by selecting a total of 13 explanatory variables from demographic, socioeconomic, and travel behavioural perspectives. Although strong relationships have been observed between these variables and restaurant visitations, we cannot rule out the possible contribution of other factors not included in this study. Future studies are encouraged to incorporate more explanatory variables and extend the study area to a larger spatial scale (e.g. the whole US territory). Furthermore, it is advisable to select uncorrelated factors to reduce the potential concern of multicollinearity, for example, by applying a principal component analysis for dimensionality reduction. Future studies could also utilize different types of models (e.g. more advanced spatial models or machine learning models) to examine the relationship between restaurant visitations and the neighbourhood profiles of customers' origins. Third, the POI dataset we used only documents physical visitations to restaurants, based on the criterion that a visit is counted only if it lasted at least 4 minutes at a given POI. It therefore fails to capture restaurant delivery and take-out services that last less than 4 minutes. This gap may be addressed by running a survey or questionnaire to roughly estimate the ratio of dine-in to take-out, which can then be used to improve the data accuracy.
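The principal component analysis suggested above as a remedy for multicollinearity can be sketched as follows. This is a generic illustration rather than part of the present analysis: it standardizes the explanatory variables, decomposes them with a singular value decomposition, and keeps the smallest number of components that reach a chosen share of the variance. The function name, the 90% threshold, and the synthetic variables are assumptions.

```python
import numpy as np

def pca_reduce(X, variance_to_keep=0.9):
    """Project standardized variables onto their leading principal components.

    Returns (scores, explained), where scores holds the retained components
    (mutually uncorrelated) and explained is the variance share of every
    component in descending order.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Smallest k whose cumulative variance share reaches the threshold
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    k = min(k, len(s))
    return Z @ Vt[:k].T, explained

# Synthetic, deliberately correlated variables (illustration only)
rng = np.random.default_rng(42)
low_income = rng.normal(size=300)
low_education = 0.9 * low_income + 0.1 * rng.normal(size=300)
pop_density = rng.normal(size=300)

scores, explained = pca_reduce(
    np.column_stack([low_income, low_education, pop_density])
)
print(scores.shape, np.round(explained, 3))
```

The retained component scores could replace the original correlated variables in a regression, at the cost of coefficients that are harder to interpret in terms of the original neighbourhood characteristics.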
To conclude, we constructed a GIS-based analytical framework integrating big data mining, web crawling techniques, and spatial-economic modelling to estimate the broader effect of COVID-19 on the restaurant industry. It has great potential to be applied to evaluate the effect of COVID-19 on other industries and extended to other geographic contexts to combat the resurgence of COVID-19 in the post-pandemic era. We call on researchers, government officials, policymakers, urban planners, and public and private authorities to employ our analytical framework for the prevention and control of future virus outbreaks and public health emergencies arising from the increasing level of globalisation, urbanisation, and human invasion of ecosystems.

Xiao Huang received his Ph.D. degree in Geography from the University of South Carolina in 2020. He is a faculty member (Assistant Professor) in the Department of Geosciences and the Center for Advanced Spatial Technologies (CAST) at the University of Arkansas, with expertise in GeoAI, deep learning, big data, remote sensing, and social sensing. His teaching interests encompass undergraduate and graduate courses that involve geospatial analysis, data mining, geovisualization, and natural hazards.

Zhenlong Li is an Associate Professor in the Department of Geography and Director of the Center for GIScience and Geospatial Big Data (CeGIS) at the University of South Carolina (USC), where he established and leads the Geoinformation and Big Data Research Laboratory (GIBD). Dr. Li was recognized as a Breakthrough Star by USC in 2020 and selected as one of the Geospatial World 50 Rising Stars by Geospatial Media and Communications in 2021. He is also a Peter and Bonnie McCausland Faculty Fellow (2020-2023) at the USC College of Arts and Sciences. Dr. Li's primary research field is GIScience, with a focus on geospatial big data analytics, high performance spatial computing, and GeoAI/CyberGIS with applications to disaster management, public health, human dynamics, and climate analysis.
Shuming Bao received his Ph.D. in applied economics from Clemson University. He is currently the director of the China Data Institute in the US, and the codirector of the Geocomputation Center for Social Sciences at Wuhan University. He was a faculty member and the director of the China Data Center at the University of Michigan in Ann Arbor before he started the China Data Institute in 2018. Dr. Bao has published more than 90 papers in the areas of GIS, regional economics, and spatial data analysis.