A dataset for illuminant- and device-invariant colour barcode decoding with cameras

Barcodes are visual representations of data widely used in commerce and administration to compactly encode information about objects, services, and people. Specifically, a barcode is an image composed of parallel lines with different widths, spacing and sizes. Generally, the lines are dark (usually black) on a bright background (usually white) or vice versa. Thanks to this representation, barcodes can be detected and decoded robustly with respect to changes of light and noise. Using several colours for the lines is nevertheless attractive, because it increases the barcode's data capacity. Colour barcodes still pose a challenge today, even though numerous studies on the topic were conducted between 1990 and 2022. The main open issue is the creation of an optical technology able to decode colour sequences regardless of the ambient light, the acquisition and printing or visualisation device, and the physical support on which the barcode is printed or displayed. To the best of our knowledge, the studies currently available in the literature do not provide the experimental data on which they are based, nor are there online databases that can be used for further studies or for training data analysis procedures based on artificial intelligence techniques. To fill this gap and encourage further research on this technology, we built COCO-10, a public dataset of colour barcode images intended as a testbench for the development and testing of colour barcode decoding algorithms, taking into account the colour variability due to the light, to the printer and camera gamuts, and to the quality of the paper on which the barcode is printed. COCO-10 contains 5400 images of 150 colour barcodes, each printed on two white papers with different densities by different printers and acquired under six illuminations by three smartphone cameras. For each colour barcode image, a mask identifying the region occupied by the barcode is also released.
The 150 colour barcodes were generated by colouring the lines of black & white barcodes with colours randomly selected from a palette of ten colours including both warm and cold hues. The name COCO-10 refers to the fact that the dataset contains COlor BarCOdes with 10 possible colours for each line. We also provide a set of 300 images created as follows. The 150 COCO-10 colour barcodes were synthetically superimposed on 150 cluttered backgrounds, resulting in 150 images. The first 75 (group 1) were printed on thick paper, the others (group 2) on plain paper. Each group was further subdivided into 3 subgroups of 25 images, each of which was captured by 2 smartphone cameras under one of the 6 illuminants mentioned above. We also provide masks for these images. These images are intended as a benchmark for testing the accuracy of barcode decoding algorithms, bearing in mind that the performance of these algorithms may be influenced by the accuracy with which the barcodes are first detected in the background. The total number of images in COCO-10 is 11700, including the 300 synthetic images of the colour barcodes displayed on white and cluttered backgrounds, the 5700 real-world images of the colour barcodes printed on white papers and with cluttered backgrounds, and their corresponding 5700 masks. Finally, we highlight that COCO-10 can also be used for developing and testing algorithms for gamut and tone mapping, machine colour constancy, and colour correction.



Data collection
We synthetically generated 150 colour barcodes and printed them on white papers with different densities (80 gr/m² and 160 gr/m²) using suitable printers ([1,2]). We then acquired each printed image with 3 smartphone cameras (see Table 2) under 6 lights (see Table 3), and specified the position of its barcode by a mask computed by a threshold-based segmentation, followed by manual refinement (if needed). We also synthetically created a set of 150 images, each containing a colour barcode over a cluttered background; we printed each of them on paper with density either 80 gr/m² or 160 gr/m², and we acquired it with 2 smartphone cameras (see Tables 2 and 9) under one of the 6 lights in

Value of the Data
• The barcodes currently in use are generally black and white, since this feature allows a decoding robust to changes of light and noise [3][4][5][6][7]. To the best of our knowledge, although colour barcode technologies have been investigated in the past [8], there are no colour barcode datasets. Our proposed COCO-10 dataset attempts to address this shortcoming by offering a collection of real-world colour barcodes printed on two types of paper and captured by several cameras in varying lighting situations. The dataset presents a variety of difficulties (listed in the following three bullets) that are crucial for detection and decoding methodologies.
• First, the dataset presents several colour distortions, due to the printing process (e.g., printer's gamut, physical features of the ink), to the paper quality (e.g., paper absorption degree and reflectance), and to the acquisition devices (e.g., cameras' sensor sensitivity and gamut, automatic white balance).
• Second, the dataset presents geometric distortions, like changes of scale, in-plane rotation, and skew, due to the cameras' lens, resolution, and field of view and to the mutual position of cameras and printed barcodes. We highlight that the printed papers were manually positioned in front of the camera with the recommendation of keeping the paper as parallel to the camera as possible. Nevertheless, due to the manual action, perspective distortions sometimes occur. Noise due to slight wrinkles of the paper, to the physical characteristics of the acquisition devices and, in a few cases, some focus blur due to acquisition errors are sometimes present too.
• Third, palette colours and barcode structure make barcode decoding particularly challenging. In fact, the COCO-10 palette contains 10 colours, some of which are clearly distinguishable from each other and some of which are less so, especially under certain intensities and types of light and for certain acquisition devices. Moreover, unlike in [5,9,10], in COCO-10 there are no constraints on barcode length, and colour repetitions within a barcode are allowed.
• Given the challenges described above, it is clear how difficult colour barcode decoding is, for example for clustering-based algorithms (e.g., [5]), and how necessary testbed datasets for decoding algorithms in real-world scenarios are. COCO-10 also has the merit of having been obtained with an acquisition technique within the reach of any user with a camera: this favours the reproducibility and extension of the dataset.
• Finally, the quantity of images provided and the availability of masks make COCO-10 an ideal resource for machine learning algorithms. Moreover, the development and testing of algorithms for gamut and tone mapping, machine colour constancy, and colour correction are additional tasks that can benefit from the use of this dataset.

Data Description
The dataset COCO-10 contains 11700 images, organized in three main directories (see Fig. 1 and Table 1). Precisely:
(1) COLOUR-BARCODES - This directory includes the 150 images of the colour barcodes synthetically generated from black & white barcodes (see Fig. 2). These images are saved in PPM format.
(2) COLOUR-BARCODES-ON-WHITE-PAPER - This directory contains six folders, each named CAMERA-X-PAPER-Y, where X indicates the camera used in the acquisition and Y is the density of the white paper on which the barcodes have been printed. For instance, directory CAMERA-1-PAPER-80 contains the images of the 150 barcodes printed on white paper

Experimental Design, Materials and Methods
In this Section we report the steps followed for the generation and acquisition of the images in the three main directories of COCO-10, together with some analyses performed on the acquired data to provide information about the size of the acquired barcodes and the illuminations.

Images in COLOUR-BARCODES
The first step in the creation of COCO-10 was the generation of its colour barcodes. For this task, we created 150 black & white barcodes using free online websites, i.e. [11][12][13], and we wrote a specific algorithm for colouring their lines. This algorithm assigns to each line a colour randomly picked from the palette of 10 colours shown in Fig. 2(a). The 150 colour barcodes are saved as PPM images and stored in the folder named COLOUR-BARCODES. Some examples are shown in Fig. 2(b).
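The colouring step described above can be sketched as follows. This is only an illustrative sketch, not the authors' algorithm: the palette values, the function name `colour_barcode`, and the assumption that the black & white barcode is available as a boolean array with vertical bars are all hypothetical; the actual COCO-10 palette is the one shown in Fig. 2(a).

```python
import numpy as np

# Hypothetical 10-colour RGB palette (placeholder values, NOT the COCO-10 palette).
PALETTE = np.array([
    [255, 0, 0], [255, 128, 0], [255, 255, 0], [128, 255, 0], [0, 200, 0],
    [0, 200, 200], [0, 128, 255], [0, 0, 255], [128, 0, 255], [255, 0, 255],
], dtype=np.uint8)


def colour_barcode(bw, palette=PALETTE, rng=None):
    """Colour each bar of a black & white barcode with a random palette colour.

    bw: 2-D boolean array, True where a pixel belongs to a (vertical) bar.
    Returns an RGB uint8 image with a white background.
    """
    rng = np.random.default_rng() if rng is None else rng
    is_bar = bw.any(axis=0)  # columns crossed by a bar
    # Locate the start and (exclusive) end column of every run of bar columns.
    padded = np.concatenate(([False], is_bar, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[0::2], edges[1::2]

    out = np.full(bw.shape + (3,), 255, dtype=np.uint8)
    for s, e in zip(starts, ends):
        # Colours are drawn independently, so repetitions are allowed.
        colour = palette[rng.integers(len(palette))]
        out[:, s:e][bw[:, s:e]] = colour
    return out
```

Passing a seeded generator, e.g. `colour_barcode(bw, rng=np.random.default_rng(0))`, makes the colour assignment reproducible.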
Microsoft's pioneering work [5] proposed a colour barcode technology exploiting only four colours and constrained each barcode to contain all four of them; these colours were sharply distinguishable from each other, and thus easy to recognize under several lighting conditions. Although the capacity of these barcodes is quite low, this Microsoft technology was appealing, and other approaches were developed to extend the number of colours. A very good result was obtained in [9,10], where barcodes may contain 24 colours, but their recognition requires considerable a priori information about the sensor gamut and the possible variability range of the illuminations. The palette used in COCO-10 contains 10 colours (see Fig. 2(a)), a number we chose as an intermediate point between the 4 colours initially used in [5] and the 24 used in [9] and [10]. Our palette contains both warm and cold colours; some of them are clearly distinguishable from each other (see, e.g., the first and the last colours in Fig. 2(a)), while others are closer to each other and may become very similar under certain illumination conditions and/or for certain devices (see, for instance, the fourth and the seventh colours in Fig. 2(a)). These characteristics make COCO-10 very challenging.

Images in COLOUR-BARCODES-ON-WHITE-PAPER
The acquisition of the images stored in COLOUR-BARCODES-ON-WHITE-PAPER consists of three steps, described in the following Subsections.

Colour Barcode Printing on White Paper
The second step in the creation of COCO-10 was printing the colour barcodes. For this purpose, we considered two kinds of white paper with different densities (80 gr/m² and 160 gr/m², respectively), we arranged two colour barcodes per sheet, and we printed the resulting 75 pages. Given the different paper densities, we used two different printers, i.e. [1] for the paper with the lower density and [2] for that with the higher density.

Acquisition and Processing of the Barcodes Printed on White Paper
We acquired the colour barcodes printed on the white papers under six illuminations by three smartphone cameras, hereafter denoted as Camera 1, Camera 2 and Camera 3 (see Table 2). These cameras were always used with automatic mode ON; in particular, they always performed automatic white balance.
The illuminations involved in this work are labelled as Natural, Artificial-01, Artificial-02, Artificial-03, Artificial-04 and Artificial-05. The sources, together with their correlated colour temperature (where known), are listed in Table 3. They are natural daylight, four LED lamps, and one halogen lamp. Illuminants Artificial-01, Artificial-04 and Artificial-05 are warm lights, Artificial-04 is a cold light, while Artificial-04 simulates natural, white sunlight. Illuminant Natural is sunlight: the images were collected at different times of day and under variable weather conditions, so that it is impossible to specify a correlated colour temperature for this illumination.
For each camera and for each light, the printed barcodes were acquired as follows. Each sheet, depicting two barcodes, was manually placed in front of the camera, taking care to avoid (as much as possible) geometric distortions, like skew, perspective changes and remarkable in-plane rotations. To this end, the paper was fixed on a rigid support and the camera tripod was manually adjusted to orient the camera parallel to the paper support (see Fig. 6). Masking tape was used to keep the setup (camera and paper) as immobile as possible to minimise blurring. Nevertheless, in some images we observed slight geometric distortions due to the manual positioning of the paper. These distortions are not a drawback, since they add realism to the data and may be easily corrected by postprocessing. We also note that the size of the barcodes may change from camera to camera due to the different physical characteristics of the cameras (e.g., lens, field of view, resolution) as well as to changes of the tripod position from one acquisition to another. Thus, there is no one-to-one correspondence between the images of the same barcode sheets acquired by different cameras and/or under different illuminations. The 75 images were then processed and analysed as explained in the following.

Table 2
Summary of the devices used for acquiring the colour barcodes printed on white paper.

Table 3
Summary of the illuminants used for the acquisition of COCO-10.

Cropping and Masking Colour Barcodes
Each photo was processed by clipping out the two sections that contained barcodes (see Fig. 7). In this way, image parts containing the background or the masking tape used to secure the camera and/or the paper to its support were removed, while we left a portion of the paper around the barcode intact to provide data on how the various cameras under the six illuminations perceived the colour of the same, nearly uniform region (i.e., the white paper). This information may be helpful to describe how the cameras "see" a white patch and to analyse colour differences on a uniform area.
At the end of this operation, we had 150 colour photographs that we saved as PPM files in the subfolder of COLOUR-BARCODES-ON-WHITE-PAPER relative to the camera and paper used for the acquisition. Then, for each of these photos, we created a mask, which is a binary

Measuring Intensity and Chromaticity Changes around Barcodes
We observe that the sub-images with barcodes, cropped out from each sheet image, generally have a different brightness. Specifically, the top sub-image is generally brighter than the bottom one, indicating that the light intensity varied across the image. This vertical gradient of the light was due to the position of the light with respect to the acquisition plane. In fact, the light source was fixed on the ceiling, about 2 metres from the acquisition desk, but it was impossible to place it perpendicularly above this desk, because the camera and tripod would then cast strong shadows on the sheet. A horizontal gradient (i.e., an intensity change from left to right in each sub-image) is also present in some acquisitions, but is in general negligible, especially for the artificial lights. In any case, the variations of the light intensity and chromaticity over the barcode region are slight, so that each barcode can be considered uniformly illuminated. This claim is supported by the following analysis.
For any light and for any image of COLOUR-BARCODES-ON-WHITE-PAPER acquired by one of the three acquisition cameras, we selected a rectangular crown 20 pixels thick around each barcode (see Fig. 8 for an example) and we computed the mean values and standard deviations of the intensity U and chromaticity (r, g) of its pixels. Mathematically, for any pixel x of the crown, U(x) and (r(x), g(x)) at x are given by:
Figs. 9, 10, and 11 report, for each image, the mean luminance U around each barcode with standard deviations, broken down by camera and paper. The x-axis reports the ID of the images, lating trend. In fact, as already observed qualitatively, the brightness around on-top barcodes image, the standard deviation of U is small with respect to the mean value of U (see Tables 4 and 5 for a numerical comparison), meaning that the light brightness can be considered uniform on the barcode regions.
Figs. 12-14 show the mean chromaticity (r, g) around each barcode with standard deviation. Unlike U, the chromaticity shows no gradient. The standard deviation of each chromaticity component c = r, g is small with respect to the value of c (see Tables 6 and 7 for a numerical comparison), meaning that the light chromaticity can be considered stable over the barcode region and its surround.
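The crown analysis can be illustrated with the sketch below. The formulas for U and (r, g) are not reproduced in this text, so the code assumes the common definitions U = (R + G + B)/3, r = R/(R + G + B) and g = G/(R + G + B); these definitions, the function names, and the convention that nonzero mask pixels mark the barcode are assumptions made for illustration only.

```python
import numpy as np


def crown_mask(mask, thickness=20):
    """Rectangular crown of the given thickness around the bounding box of a mask."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    top, bottom = rows[0], rows[-1] + 1
    left, right = cols[0], cols[-1] + 1
    h, w = mask.shape
    crown = np.zeros((h, w), dtype=bool)
    # Enlarged box (clipped to the image), minus the barcode bounding box itself.
    crown[max(top - thickness, 0):min(bottom + thickness, h),
          max(left - thickness, 0):min(right + thickness, w)] = True
    crown[top:bottom, left:right] = False
    return crown


def crown_statistics(image, mask, thickness=20):
    """Mean and standard deviation of U, r and g over the crown pixels.

    ASSUMED definitions: U = (R+G+B)/3, r = R/(R+G+B), g = G/(R+G+B).
    """
    rgb = image[crown_mask(mask, thickness)].astype(np.float64)  # shape (N, 3)
    total = rgb.sum(axis=1)
    valid = total > 0  # chromaticity is undefined for pure black pixels
    U = total / 3.0
    r = rgb[valid, 0] / total[valid]
    g = rgb[valid, 1] / total[valid]
    return {"U": (U.mean(), U.std()),
            "r": (r.mean(), r.std()),
            "g": (g.mean(), g.std())}
```

A small ratio between the standard deviation and the mean of each quantity is what the text above interprets as uniform illumination around the barcode.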
For each camera, and for each light and paper, Tables 4 and 5 report the mean brightness U, while Tables 6 and 7 report the mean value of (r, g). From these data, we can observe that Camera 3 reports warmer chromaticities and lower brightness than Camera 1 and Camera 2. Camera 1 and Camera 2 behave similarly: for both these cameras, the chromaticities measured around the barcodes under Artificial-02 and Artificial-03 are very close to each other, although these lights have quite different correlated colour temperatures. However, for Camera 2, U is lower than for Camera 1, and the chromaticities of Artificial-01 and Artificial-05 are also very close to each other (see also Fig. 13). Finally, we observe that, for the natural light, both chromaticity and brightness change from image to image, because this light depends on the time at which the acquisition is made (e.g., morning, afternoon, evening) as well as on other uncontrollable conditions, like sudden weather changes. This explains the remarkable variability of the values of (r, g) reported for the acquisitions made by Camera 2 of the barcodes printed on paper with density 80 gr/m².
Table 8 reports the mean value of the scale factor (with mean standard deviation) between the area of each barcode in the acquired images and the area of the corresponding barcode in COLOUR-BARCODES, averaged over the number of acquired images (i.e., 150). The scale factor changes from camera to camera because of the cameras' different fields of view, lenses, and focal lengths. Slight changes of the scale factor can also be observed from image to image within the same acquisition group (e.g., images captured by the same camera under a given illumination), because of the automatic adjustment of the camera settings and/or variations in the position of the barcode sheet relative to the camera due to the manual positioning. Moreover, the printing process also contributes to barcode rescaling. Finally, we point out that the size of the barcodes in the acquired images also depends on the barcode segmentation accuracy; thus, the values in Table 8 refer to the masks we provided.
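As an illustration of how such an area-based scale factor could be computed from the masks, consider the minimal sketch below. It assumes that barcode pixels are the nonzero entries of each mask and that a reference mask of the barcode in the original COLOUR-BARCODES image is available; both assumptions, as well as the function names, are hypothetical.

```python
import numpy as np


def area_scale_factor(acquired_mask, reference_mask):
    """Ratio between the barcode area in an acquired image and in the original one.

    Both inputs are 2-D arrays whose nonzero pixels are ASSUMED to mark the barcode.
    """
    reference_area = np.count_nonzero(reference_mask)
    if reference_area == 0:
        raise ValueError("reference mask contains no barcode pixels")
    return np.count_nonzero(acquired_mask) / reference_area


def summarise(scale_factors):
    """Mean and standard deviation of the scale factors of one acquisition group."""
    values = np.asarray(scale_factors, dtype=np.float64)
    return values.mean(), values.std()
```

Because the factor is computed from the masks, any segmentation error propagates directly into it, which is consistent with the remark above that the reported values refer to the provided masks.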

Images in COLOUR-BARCODES-ON-CLUTTERED-BACKGROUND
To enable testing algorithms for the extraction and decoding of colour barcodes in images with cluttered backgrounds, we collected 150 real-world pictures of indoor and outdoor environments, textured and uniform surfaces, and objects like dolls, plants, and food, and we saved them as JPG files in SYNTHETIC-COLOUR-BARCODES-ON-CLUTTERED-BACKGROUND (see Fig. 5). For each i = 1, …, 150 we superimposed on the i-th image the i-th colour barcode of COCO-10 by an algorithm that randomly selected the scale and the position of the barcode. Only in-plane rotations of 0, 90, 180, and 270 degrees were allowed, while other angles were forbidden to avoid aliasing when printing the images.
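The superimposition step can be sketched as follows. This is a hedged illustration, not the authors' algorithm: the function name `superimpose`, the restriction to integer scale factors (implemented by pixel replication), and the returned mask are assumptions introduced for the example; only the random scale and position and the rotations by multiples of 90 degrees come from the description above.

```python
import numpy as np


def superimpose(background, barcode, rng=None, scales=(1, 2, 3)):
    """Paste a barcode on a background with random rotation, scale and position.

    background, barcode: RGB uint8 arrays. Rotation is limited to multiples of
    90 degrees; integer scaling by pixel replication is an ASSUMPTION.
    Returns the composite image, a boolean mask of the barcode region, and the
    parameters that were drawn.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = int(rng.integers(4))                 # 0, 1, 2 or 3 quarter turns
    patch = np.rot90(barcode, k)             # rotates the two spatial axes

    H, W = background.shape[:2]
    fitting = [s for s in scales
               if patch.shape[0] * s <= H and patch.shape[1] * s <= W]
    if not fitting:
        raise ValueError("barcode does not fit in the background at any scale")
    s = int(rng.choice(fitting))
    patch = np.repeat(np.repeat(patch, s, axis=0), s, axis=1)

    h, w = patch.shape[:2]
    y = int(rng.integers(H - h + 1))         # top-left corner, fully inside
    x = int(rng.integers(W - w + 1))

    out = background.copy()
    out[y:y + h, x:x + w] = patch
    mask = np.zeros((H, W), dtype=bool)
    mask[y:y + h, x:x + w] = True
    return out, mask, {"rotation_deg": 90 * k, "scale": s, "top_left": (y, x)}
```

Restricting rotations to quarter turns means the barcode is never resampled at an angle, which matches the stated aim of avoiding aliasing.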
We divided these 150 images into two groups (Group 1 and Group 2), each containing 75 images, and we randomly assigned a paper type to each group: we printed the images of Group 1 on paper with density 160 gr/m², and the images of Group 2 on paper with density 80 gr/m².
In turn, we divided the printed images of each group G (G = 1, 2) into three subsets, each containing 25 images, and we acquired each subset under one of the six illuminants described above by two cameras.
In addition to Cameras 1, 2 and 3, here we considered three further smartphone cameras (see Table 9). As for the acquisitions of the barcodes on white paper, automatic mode (in particular, automatic white balance) was always ON for all the cameras. The illuminations and the pairs of cameras used for the acquisitions were randomly assigned to the six image subgroups.
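Since camera, paper and light are encoded in the folder names (CAMERA-X-PAPER-Y for the white-paper acquisitions and CAMERA-X-PAPER-Y-LIGHT-Z for the cluttered-background ones), a user of the dataset may want to recover these parameters programmatically. The sketch below is one possible way to do so; the exact spelling and capitalisation of the light labels inside the folder names is an assumption, and `parse_folder_name` is a hypothetical helper, not part of the dataset.

```python
import re

# ASSUMED pattern: light labels spelled as in the text (Natural, Artificial-01..05).
_FOLDER_RE = re.compile(
    r"CAMERA-(?P<camera>[1-6])"
    r"-PAPER-(?P<paper>80|160)"
    r"(?:-LIGHT-(?P<light>Natural|Artificial-0[1-5]))?"
)


def parse_folder_name(name):
    """Extract camera ID, paper density (gr/m^2) and light label from a folder name.

    The light is None for folders named CAMERA-X-PAPER-Y (no LIGHT part).
    """
    match = _FOLDER_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised folder name: {name!r}")
    return {
        "camera": int(match["camera"]),
        "paper": int(match["paper"]),
        "light": match["light"],
    }
```

For example, `parse_folder_name("CAMERA-6-PAPER-80-LIGHT-Artificial-05")` yields camera 6, paper density 80 and light "Artificial-05" under the assumed naming.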

Fig. 3. Two images of a colour barcode, both acquired by Camera 1, printed on paper with density 80 gr/m² (left) and 160 gr/m² (middle). On the right, the image of the same barcode printed on paper with density 80 gr/m² but acquired by Camera 6. All three images were captured under the illumination Artificial-01. The colours of the left image appear different from those of the middle image because of the different kind of paper, printer, and ink, as well as because of the automatic white balance performed by the camera. The colours of the left image also differ from those of the right image because different cameras render colours differently.

(3) COLOUR-BARCODES-ON-CLUTTERED-BACKGROUND - This directory contains 15 folders, specifically: (a) directory SYNTHETIC-COLOUR-BARCODES-ON-CLUTTERED-BACKGROUND contains 150 colour images, where the i-th image displays the i-th colour barcode over a cluttered background. The backgrounds show indoor or outdoor environments, textured and uniform surfaces, and different kinds of objects, like tissues, dolls, food, and plants.

Fig. 5. On left: an image from SYNTHETIC-BARCODES-ON-CLUTTERED-BACKGROUND, where a colour barcode appears over a cluttered background; in the middle: the image on the left, printed on paper with density 80 gr/m², acquired by Camera 6 under illumination Artificial-05; on right: the mask of the image in the middle, where a black contour has been added around the image to better visualize its content.

(b) 14 folders, each of which is named CAMERA-X-PAPER-Y-LIGHT-Z and contains 25 images from SYNTHETIC-COLOUR-BARCODES-ON-CLUTTERED-BACKGROUND (in JPG format) along with their masks (in PBM format). Precisely, the folder name has the following meaning: X indicates the camera used for the acquisition (X = 1, 2, 3, 4, 5, 6), Y is the density of the paper on which the image has been printed (Y = 80, 160 (gr/m²)) and Z is the light used in that acquisition (Z = Artificial-01, Artificial-02, Artificial-03, Artificial-04, Artificial-05, Natural). Each of these folders

Fig. 6. Experimental settings used for the acquisitions of COCO-10. We use the configuration on the left or on the right to minimize shadows caused by the position of the light sources with respect to the camera and to the barcode sheet.

Fig. 7. On left, the image of a sheet containing two barcodes acquired by Camera 1 under Artificial-04, and on right the two regions with barcodes cropped out.

Fig. 9. Mean luminance around each barcode for the various images of COLOUR-BARCODES-ON-WHITE-PAPER for Camera 1 and paper with density 80 gr/m² (top) and 160 gr/m² (bottom). The x-axis reports the label of each image (i.e.,

Fig. 13. Mean chromaticity around each barcode for the various images of COLOUR-BARCODES-ON-WHITE-PAPER for Camera 2 and paper with density 80 gr/m² (top) and 160 gr/m² (bottom).

Fig. 14. Mean chromaticity around each barcode for the various images of COLOUR-BARCODES-ON-WHITE-PAPER for Camera 3 and paper with density 80 gr/m² (top) and 160 gr/m² (bottom).

Table 1
Number of files (i.e., images) and size of the three main directories of COCO-10.
COLOUR-BARCODES-ON-WHITE-PAPER: 10800 files, 47.60 Gb
COLOUR-BARCODES-ON-CLUTTER-BACKGROUND: 750 files, 1.82 Mb

Table 5
Mean values of the mean intensity and standard deviation of the region around each barcode in COLOUR-BARCODES for the acquisitions with paper with density 160 gr/m².

Table 6
Mean values of the mean chromaticity (r and g) and standard deviation of the region around each barcode in COLOUR-BARCODES for the acquisitions with paper with density 80 gr/m².

Table 7
Mean values of the mean chromaticities and standard deviation of the region around each barcode in COLOUR-BARCODES for the acquisitions with paper with density 160 gr/m².

Table 8
Mean scale factor with standard deviation of the barcodes in the acquired images with respect to their size in COLOUR-BARCODES, broken down by paper density, cameras, and lights.