Sandwich analytics: A dataset comprising one year's weekly sales data correlated with crime, demographics, and weather

Data collected from a quick-serve sandwich chain over one year provide an opportunity to study market, sociographic, meteorological, and other factors affecting sales and sales forecasting. The weekly sales table contains over 79,000 rows, each of which represents summary statistics for the sales of an individual menu item in one store during one week of the year. The data were collected from the point-of-sale system of 10 stores. Secondary data regarding weather patterns, population, location, competition, and crime statistics were gathered and integrated with the original data set.


Data
The data were produced by the point-of-sale (POS) system of a quick-serve sandwich chain over the course of one year. Data were produced by 10 stores, which varied in location and other characteristics (see Table 2). The primary sales data and all secondary data are genuine. Nevertheless, names, addresses, and other key identifying information have been altered to increase anonymity. The secondary data include demographic data, unemployment data, crime data, weather data, and statistics on other nearby restaurants. Secondary data were primarily collected at the county level; consequently, stores located in the same county share the same secondary data. The data set includes three tables: weekly sales, store attributes, and weather reports. All three tables have been anonymized. The datasets contain data for a one-year period from April 2012 to March 2013.

Weekly sales
The weekly sales data were collected from the chain's POS system and contain raw data for each item sold in each store for each week, from April 2012 through March 2013 (see Table 1). Profit calculations are included in the dataset, but the costs do not include labor and other variable costs; the reported profit is therefore gross profit and is not necessarily an indication of total net profit. However, the owner provided estimates of labor cost and approximate rent/lease values for each store, which are included in the store attributes table. These could be used as a supplement to estimate net profit and to illustrate cost differences in profitability between stores.
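The supplement described above can be sketched as follows. This is only a minimal illustration: the field names (`store`, `week`, `gross_profit`, `weekly_labor`, `weekly_rent`) are hypothetical, the numbers are toy values rather than values from the dataset, and the sketch assumes the owner's labor and rent estimates have already been converted to a weekly basis.

```python
from collections import defaultdict


def store_week_net_profit(sales_rows, store_costs):
    """Approximate net profit per (store, week).

    Sums item-level gross profit within each store-week, then subtracts
    that store's estimated weekly labor and rent. All field names are
    hypothetical; the real tables may use different columns and units.
    """
    gross = defaultdict(float)
    for row in sales_rows:
        gross[(row["store"], row["week"])] += row["gross_profit"]

    net = {}
    for (store, week), total in gross.items():
        costs = store_costs[store]
        net[(store, week)] = total - costs["weekly_labor"] - costs["weekly_rent"]
    return net


# Toy rows for illustration only (not taken from the dataset).
sales_rows = [
    {"store": 1, "week": 1, "item": "A", "gross_profit": 300.0},
    {"store": 1, "week": 1, "item": "B", "gross_profit": 200.0},
    {"store": 2, "week": 1, "item": "A", "gross_profit": 400.0},
]
store_costs = {
    1: {"weekly_labor": 250.0, "weekly_rent": 100.0},
    2: {"weekly_labor": 300.0, "weekly_rent": 150.0},
}

net = store_week_net_profit(sales_rows, store_costs)
for key in sorted(net):
    print(key, net[key])
```

Because the labor and rent figures are owner estimates, the result is better read as a rough comparison between stores than as an accounting figure.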

Specifications table
Subject area: Business and Economics
More specific subject area: How sales in restaurants are affected by store characteristics, crime, population demographics, and weather
Type of data: Six tables (sales data, store attribute data, county crime data, county demographic data, county employment data, and weather data)
How data was acquired: Sales and store attribute data were provided by the business; secondary data were collected from the US Census, the NOAA Climatic Data Center, a state-managed data portal, the Bureau of Labor Statistics, and others
Data format: Anonymized and aggregated raw data
Experimental factors: The data represent a natural experiment for the investigation of factors included in the database, allowing for a correlative analysis of sales of individual items, location, store features, weather, crime statistics, and local demographics
Experimental features: Time, competition, location, weather, local demographics, regional crime statistics, and store features

Value of the data
These data provide an opportunity to investigate the effects of weather events on quick-serve sales across stores with different configurations. Secondary data enable analysis of the effects of county-specific variables such as population demographics and crime. These variables can directly and indirectly address socioeconomic and other social factors such as those addressed in [1]. Competitive pressures on quick-serve sales can be analyzed and explored in these data. Nearby quick-serve and other types of restaurants are included in the data, providing the opportunity to investigate the impact of both on sales. Related work on location and competition has been done in [2,3]. The depth and breadth of the data provide additional opportunities beyond the market, sociographic, and meteorological analyses. The data set could also be used as a teaching case for demonstrating forecasting, sociographic interactions, and predictive analytical techniques.

Store attributes
The store attributes table contains descriptive attributes about each store, including location information, information about nearby schools and other restaurants, and demographic and crime information for the county in which each store is located. The 10 stores were located in 8 cities across 6 counties (see Fig. 1). The stores include a variety of structures and locations (see Table 2). The chain owner also provided several other data points available in this table. First, stores with more traveler clients are so designated. Second, stores that served a higher proportion of Hispanic and Native American customers are identified. Finally, estimated rent and labor costs are provided for each store.

County data included in the store attributes table
Crime data were collected from the northern state's data portal [4]. Employment data were collected from the Bureau of Labor Statistics Local Area Unemployment [7]. Census data were retrieved from the US Census and use the 2012 Census estimate [5] (see Table 3).

Weather data
The weather data were collected from NOAA's Climatic Data Center [6]. Eight variables were retained; they are described in the data description below. Daily observations exist for five weather stations, which are linked back to the sales data through the store attributes table. The weather data consist primarily of wind speed, precipitation, and temperature observations. Descriptive statistics, broken down by weather station, as well as a data dictionary, can be found in Table 4. Averages across the different weather stations can be found in Table 5.
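Because the weather observations are daily while the sales rows are weekly, some aggregation is needed before the two can be analyzed together. The sketch below shows one possible approach, not the authors' procedure. It assumes hypothetical field names (`station`, `date`, `precip`, `temp`, `store`, `iso_year`, `iso_week`), assumes weeks are ISO calendar weeks, and uses toy values; the actual week definition and column names in the dataset may differ.

```python
from collections import defaultdict
from datetime import date


def weekly_weather(daily_obs):
    """Aggregate daily observations to one summary per (station, ISO year, ISO week)."""
    buckets = defaultdict(list)
    for obs in daily_obs:
        iso = obs["date"].isocalendar()  # (ISO year, ISO week, ISO weekday)
        buckets[(obs["station"], iso[0], iso[1])].append(obs)

    summary = {}
    for key, days in buckets.items():
        summary[key] = {
            "total_precip": sum(d["precip"] for d in days),
            "mean_temp": sum(d["temp"] for d in days) / len(days),
        }
    return summary


def attach_weather(sales_rows, store_to_station, weather_by_week):
    """Link each weekly sales row to its store's weather station summary."""
    joined = []
    for row in sales_rows:
        station = store_to_station[row["store"]]
        weather = weather_by_week.get((station, row["iso_year"], row["iso_week"]), {})
        joined.append({**row, **weather})
    return joined


# Toy values for illustration only (not taken from the dataset).
daily_obs = [
    {"station": "S1", "date": date(2012, 4, 2), "precip": 0.5, "temp": 50.0},
    {"station": "S1", "date": date(2012, 4, 3), "precip": 0.25, "temp": 60.0},
]
store_to_station = {1: "S1"}  # stands in for the store attributes table
sales_rows = [{"store": 1, "iso_year": 2012, "iso_week": 14, "units": 10}]

weekly = weekly_weather(daily_obs)
joined = attach_weather(sales_rows, store_to_station, weekly)
print(joined[0])
```

The store-to-station mapping plays the role of the store attributes table here, which is why several stores can end up sharing the same weekly weather summary.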

Experimental design, materials, and methods
The sales data were captured directly from the subject company's POS reporting system and are shared with permission of the original data stewards, with the identities of the original stores obfuscated.
The secondary data were collected from a number of sources. Weather reports were pulled from NOAA's Climatic Data Center for the nearest airport to each store. The store attributes table includes both primary and secondary data: the variables describing the location of the store and proximity to schools were provided by the firm, while the variables describing nearby restaurants were manually collected with Google Maps. Variables describing competition were collected in 2018; therefore, there is a time lag between most of the data and the data related to competitive pressure. We recognize this weakness in the data, but note that competitive pressure of this kind generally does not change quickly.
Crime statistics were pulled from the northern state's data portal and aggregated into violent, nonviolent, property, society, and other crimes. Violent crimes include murder, manslaughter, forcible sex, assault, and kidnapping/abduction. Property crimes were aggregated in the original data and include arson, bribery, burglary, counterfeiting/forgery, destruction of property, extortion/blackmail, robbery, and theft. Society crimes were also provided in the original data and include drug violations, pornography, prostitution, weapon violation, and animal cruelty. The remaining reported crimes were aggregated into other crimes and include non-forcible sex and violation of no contact order. Demographic statistics were retrieved from the US Census Bureau's 2012 census estimate at the county level in both states. Unemployment statistics were collected from the Bureau of Labor Statistics' Local Area Unemployment estimates for 2012.
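The crime aggregation described above amounts to a lookup from offense type to category followed by a sum. A minimal sketch is given below. The category membership follows the lists in the text, but the exact offense label strings, the function name, and the counts are illustrative assumptions, not values from the state data portal.

```python
from collections import Counter

# Category membership follows the description in the text; the label
# strings themselves are assumed and may not match the portal's wording.
CRIME_CATEGORY = {
    "murder": "violent",
    "manslaughter": "violent",
    "forcible sex": "violent",
    "assault": "violent",
    "kidnapping/abduction": "violent",
    "arson": "property",
    "bribery": "property",
    "burglary": "property",
    "counterfeiting/forgery": "property",
    "destruction of property": "property",
    "extortion/blackmail": "property",
    "robbery": "property",
    "theft": "property",
    "drug violations": "society",
    "pornography": "society",
    "prostitution": "society",
    "weapon violation": "society",
    "animal cruelty": "society",
    "non-forcible sex": "other",
    "violation of no contact order": "other",
}


def aggregate_crimes(offense_counts):
    """Sum offense-level counts into categories; unlisted offenses go to 'other'."""
    totals = Counter()
    for offense, count in offense_counts.items():
        totals[CRIME_CATEGORY.get(offense, "other")] += count
    return dict(totals)


# Toy counts for a single hypothetical county (not real figures).
offense_counts = {
    "assault": 3,
    "burglary": 2,
    "theft": 5,
    "drug violations": 4,
    "violation of no contact order": 1,
}

totals = aggregate_crimes(offense_counts)
print(totals)
```

Sending unlisted offenses to "other" mirrors the statement that the remaining reported crimes were grouped into that category.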
The sales data were anonymized by renaming menu items and renumbering the stores. Locations were anonymized by changing the names of cities, counties, and states. None of the other data were changed, so conclusions drawn from the data remain valid. Keys to de-anonymize the data are held by the authors.
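Key-based renaming of this kind can be illustrated with a short sketch. This is a generic illustration under stated assumptions, not the authors' actual procedure: the labeling scheme (`Item 1`, `Item 2`, ...), the helper names `build_key` and `apply_key`, and the menu item names are all hypothetical.

```python
def build_key(values, prefix):
    """Map each distinct original value to a neutral label, in order of first appearance."""
    key = {}
    for value in values:
        if value not in key:
            key[value] = f"{prefix} {len(key) + 1}"
    return key


def apply_key(rows, field, key):
    """Return copies of the rows with one field translated through the key."""
    return [{**row, field: key[row[field]]} for row in rows]


# Hypothetical rows; the item names are invented for illustration.
rows = [
    {"item": "Turkey Club", "units": 12},
    {"item": "Veggie Wrap", "units": 7},
    {"item": "Turkey Club", "units": 9},
]

item_key = build_key((row["item"] for row in rows), "Item")
anonymized = apply_key(rows, "item", item_key)

# Inverting the key restores the original names, which is why the key
# has to be kept separately from the published data.
reverse_key = {label: original for original, label in item_key.items()}
restored = apply_key(anonymized, "item", reverse_key)
print(anonymized)
```

Only the identifying field is rewritten; the numeric fields pass through unchanged, consistent with the statement that no other data were altered.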