HoloSelecta dataset: 10’035 GTIN-labelled product instances in vending machines for object detection of packaged products in retail environments

To assess the potential of current neural network architectures to reliably identify packaged products within a retail environment, we created an open-source dataset of 295 shelf images of vending machines with 10’035 labelled instances of 109 products. The dataset contains photos of vending machines from the provider Selecta, the largest European operator of vending machines. The vending machines are located in a mix of public and private office spaces and contain both food and beverage products. The product instances in the vending machine images are labelled with bounding boxes, where each bounding box encapsulates the entire product with as little overlap as possible. The label attached to each bounding box is a structured, human-readable label including brand, product name and size, as well as the GTIN of the product. The GTIN is the global standard for identifying products in the retail environment, which increases the value of the dataset for the retail industry. Unlike typical object detection datasets, which assign higher-level labels such as "can" or "bottle" to a much wider variety of objects, this dataset uses far more detailed labels that depend less on the shape of a product than on its exact design. The dataset belongs to the category of object detection datasets with a large number of objects, which, alongside the GTIN labels, is a main differentiator from other object detection datasets.



Value of the Data
The lack of publicly available, labelled image data of packaged products is one of the most pressing limitations in research on product detection and identification in retail environments. This dataset is one of very few publicly available image datasets (Table 3) labelled at the product level in densely packed scenes (i.e. vending machines).
We maintain a list of relevant existing public datasets on packaged products in retail environments (Table 1) on GitHub: Link.
• Unlike existing datasets, the HoloSelecta dataset is labelled with global trade item numbers (GTIN), which allow for data fusion with product master data. Thanks to the GTIN annotation, the dataset could support multiple research avenues via data fusion with nutritional composition (e.g. for health applications), logistics data (e.g. for checking shelf compliance) or prices (e.g. for self-checkout).
• Moreover, this dataset contains edge cases that are hard to identify (e.g. reflections, backside-oriented products). Such corner cases occur in real-world object detection scenarios and therefore need to be considered. The edge cases are not labelled explicitly; they reflect realistic conditions and will likely result in lower accuracy in an object classification pipeline.
• In addition, this dataset contains product meta data such as prices, brands and nutritional composition of the products sold in the vending machines. This data can be used to guide shoppers towards affordable or healthier products, for example within a mixed reality setup in which a user views the products through a smartphone or headset.
• The data can be used to study object detection performance per se in densely packed or retail contexts using typical object detection metrics. Further, the data can support user studies that examine the effect of such technology on user behavior.
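As a minimal sketch of such a metric, the functions below compute intersection over union (IoU) between a predicted and a ground-truth box given as (xmin, ymin, xmax, ymax) corners, and count a detection as correct only if the product label also matches. The 0.5 threshold follows the common Pascal VOC convention; the function names are hypothetical and not part of the dataset's tooling. Coordinates are treated as continuous, so the extra +1 pixel that some evaluation toolkits add to widths and heights is deliberately omitted.

```python
from typing import Tuple

# A box as (xmin, ymin, xmax, ymax) in pixel coordinates, the same corner
# convention used by Pascal VOC <bndbox> elements.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes (0.0 to 1.0)."""
    # Corners of the overlapping rectangle, which may be empty.
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box: Box, pred_label: str,
                     gt_box: Box, gt_label: str,
                     threshold: float = 0.5) -> bool:
    """A detection counts only if the product label matches and IoU >= threshold."""
    return pred_label == gt_label and iou(pred_box, gt_box) >= threshold
```

Because the labels are product-level identifiers, a box that localizes a product well but assigns it the wrong GTIN is still counted as an error, which is what makes this dataset stricter than shape-level benchmarks.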

Table 2
HoloSelecta image annotation follows the established Pascal VOC specification.
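The annotation files can be read with Python's standard library. The sketch below assumes labelImg-style Pascal VOC XML with one <object> element per labelled instance, whose <name> holds the product identifier; the SAMPLE annotation and the parse_voc helper are hypothetical illustrations, not files or code shipped with the dataset.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class LabelledBox(NamedTuple):
    label: str  # product identifier stored in <name> (assumption)
    xmin: int
    ymin: int
    xmax: int
    ymax: int


def parse_voc(xml_text: str) -> List[LabelledBox]:
    """Return every <object> of a Pascal VOC annotation as a LabelledBox."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        bb = obj.find("bndbox")
        # Parse via float() in case coordinates are written with decimals,
        # then truncate to int pixel positions.
        coords = [int(float(bb.findtext(tag)))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(LabelledBox(obj.findtext("name"), *coords))
    return boxes


# Hypothetical minimal annotation for illustration only.
SAMPLE = """<annotation>
  <filename>example.jpg</filename>
  <size><width>800</width><height>600</height><depth>3</depth></size>
  <object>
    <name>jacklinks_beefjerkyorginal__25__4047751730219</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>"""
```

Calling `parse_voc(SAMPLE)` yields a single box labelled with the example product identifier.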

Table 3
Product data is linked via the canonical product name and can be extended via the GTIN as a secondary key (product_meta_data.py).
In addition, product meta data is provided in the attached product_meta_data.py. The two datasets, the labelled images and the product meta data, are linked via the product identifier (e.g. jacklinks_beefjerkyorginal__25__4047751730219 in the example in Table 3).
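The example identifier suggests the pattern <name>__<size>__<gtin>, with double underscores separating the fields. Assuming that pattern holds for all identifiers (an inference from the single example above, not a documented specification), the sketch below splits an identifier and validates its GTIN with the standard GS1 mod-10 check digit. The helper names are hypothetical.

```python
from typing import NamedTuple


class ProductId(NamedTuple):
    name: str  # brand and product name, e.g. "jacklinks_beefjerkyorginal"
    size: str  # package size as written in the identifier, e.g. "25"
    gtin: str  # global trade item number, e.g. "4047751730219"


def parse_product_id(identifier: str) -> ProductId:
    """Split an identifier of the ASSUMED form <name>__<size>__<gtin>.

    Raises ValueError if the identifier does not have exactly three parts.
    """
    name, size, gtin = identifier.split("__")
    return ProductId(name, size, gtin)


def gtin_check_digit_ok(gtin: str) -> bool:
    """Validate the GS1 mod-10 check digit of a GTIN-8/12/13/14."""
    if not (gtin.isascii() and gtin.isdigit()) or len(gtin) not in (8, 12, 13, 14):
        return False
    payload, check = gtin[:-1], int(gtin[-1])
    # Weights alternate 3, 1, 3, ... starting from the rightmost payload digit.
    total = sum(int(d) * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(payload)))
    return (10 - total % 10) % 10 == check
```

For the example identifier, the weighted sum of the first twelve digits is 81, so the expected check digit is 9, which matches the last digit of 4047751730219.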

Codelist (product data dictionary produced by product_meta_data.py):
name: Name of a product
gtin: Global trade item number
producer: Organization or brand that produced the product
price and price_unit: Price of the product at the vending machine, in Swiss francs
weight and weight_unit: Quantity of product, in ml (beverages or water) or g (food)
energy: Energy content (kcal per 100 g or 100 ml of product)
sugar: Total sugars (g per 100 g or 100 ml of product)
sat_fat: Saturated fatty acids (g per 100 g or 100 ml of product)
natrium: Sodium (g per 100 g or 100 ml of product)
protein: Protein (g per 100 g or 100 ml of product)
fiber: Dietary fiber (g per 100 g or 100 ml of product)
health_percentage: Share of the product composed of fruit, vegetables or nuts according to the Nutri-Score framework, from 0 to 1 (1 = 100%)
score: FSA points in the Nutri-Score framework, from -15 (healthiest) to 40 (unhealthiest)
nutriscore: Nutri-Score letter from A (healthiest) to E (unhealthiest)
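Because the nutrient fields above are given per 100 g or 100 ml, per-package values follow by scaling with weight / 100. This is a minimal sketch, assuming each product is available as a flat dict keyed by the codelist field names (the actual structure produced by product_meta_data.py may differ); the example values are illustrative and not taken from the dataset.

```python
from typing import Dict

# Nutrient fields that the codelist reports per 100 g or 100 ml of product.
PER_100_FIELDS = ("energy", "sugar", "sat_fat", "natrium", "protein", "fiber")


def per_package(record: Dict[str, float]) -> Dict[str, float]:
    """Scale per-100 g/ml nutrient values to the whole package.

    Assumes `weight` is in the same unit (g or ml) as the per-100 reference.
    """
    factor = record["weight"] / 100.0
    return {field: record[field] * factor for field in PER_100_FIELDS}


# Hypothetical 25 g product with illustrative per-100 g values.
example = {"weight": 25, "energy": 280.0, "sugar": 12.0, "sat_fat": 2.0,
           "natrium": 2.0, "protein": 40.0, "fiber": 1.0}
```

Here `per_package(example)` gives 70.0 kcal and 3.0 g of sugar for the whole 25 g package, the kind of figure a shopper-facing application would display.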

Experimental Design, Materials and Methods
The HoloSelecta dataset was created by selecting a globally representative vending machine setting, which is described in our publication on the HoloSelecta studies [2]. We purchased each product in order to take multiple pictures of each item outside of the vending machine. The aim was to take multiple pictures at varying angles, ideally from the front of the product. This approximates the 'as-is' view of a consumer approaching the vending machine, in which certain products (especially bottles) do not always show the front of their packaging.
The dataset was collected to assess the potential of different neural network architectures to detect and identify products within a vending machine. The results reported in our manuscript are promising, with acceptable accuracy rates. The dataset consists of 295 vending machine images in which 10'035 product instances were labelled with labelImg [1], an open-source image labelling tool for computer vision applications (https://github.com/tzutalin/labelImg).
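To check a local copy of the annotations against these figures, one could count annotated images, labelled instances and distinct products. This sketch assumes one labelImg Pascal VOC .xml file per image in a single folder, with the product identifier in each <object>'s <name>; the dataset_stats helper is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path
from typing import Tuple


def dataset_stats(annotation_dir: str) -> Tuple[int, int, Counter]:
    """Count annotated images, labelled instances and instances per product.

    Assumes one Pascal VOC .xml file per image directly inside annotation_dir.
    """
    per_product: Counter = Counter()
    n_images = 0
    for xml_path in sorted(Path(annotation_dir).glob("*.xml")):
        root = ET.parse(xml_path).getroot()
        n_images += 1
        # Each <object> is one labelled product instance.
        per_product.update(obj.findtext("name", default="").strip()
                           for obj in root.findall("object"))
    return n_images, sum(per_product.values()), per_product
```

If every image has an annotation file, `n_images`, the instance total and `len(per_product)` should match the reported 295 images, 10'035 instances and 109 products; the per-product counter also shows how unevenly the instances are distributed across classes.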

Ethics Statement
The data collection did not involve any animals or humans (except for the authors, who took and labelled the pictures). Therefore, no animals or humans were harmed in the data collection. The dataset does not contain any personal information and therefore did not require ethical approval by our university's ethics commission.

Table 1

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.