Artificial Mercosur license plates dataset

Mercosur (a.k.a. Mercosul) is a trade bloc comprising five South American countries. In 2018, a unified Mercosur license plate model was rolled out. Access to large volumes of ground truth Mercosur license plates with sufficient presentation variety is a significant challenge for training supervised models for license plate detection (LPD) in automatic license plate recognition (ALPR) systems. To address this problem, a Mercosur license plate generator was developed to produce artificial license plate images that meet the new standard with sufficient variety for ALPR training purposes. This includes images with variation due to occlusions and environmental conditions. An embedded system was developed for detecting legacy license plates in images of real scenarios and overwriting these with artificially generated Mercosur license plates. This data set comprises 3,829 images of vehicles with synthetic license plates that meet the new Mercosur standard in real scenarios, and an equivalent number of text files containing label information for the images, all organized in a CSV file with compiled image file paths and associated labels.


© 2020 Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Specifications Table
Subject: Computer Vision and Pattern Recognition.
Specific subject area: Automatic License Plate Recognition (ALPR) with synthetic generation and embedding of Mercosur license plates.

Table 1
Class: Integer that identifies the class of the object (always zero in this project because there is only one class of object to detect).
X_center: Float in the interval (0, 1] giving the x coordinate of the center of the bounding box.
Y_center: Float in the interval (0, 1] giving the y coordinate of the center of the bounding box.
Width: Float in the interval (0, 1] giving the width of the bounding box.
Height: Float in the interval (0, 1] giving the height of the bounding box.
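The normalized center and size fields described above can be turned back into pixel coordinates once the image size is known. The following is a minimal sketch, not the authors' code: it assumes the darknet convention in which each label line reads `class x_center y_center width height` and the four floats are relative to the image width and height. The function names are hypothetical.

```python
def parse_label_line(line):
    """Parse one 'class x_center y_center width height' label line.

    Assumes whitespace-separated fields in the darknet/Yolo_mark order.
    """
    parts = line.split()
    class_id = int(parts[0])
    box = tuple(float(p) for p in parts[1:5])
    return class_id, box


def yolo_to_pixel_box(x_center, y_center, width, height, img_w, img_h):
    """Convert a normalized box to integer pixel corners.

    Returns (left, top, right, bottom), assuming the normalized values
    are fractions of the image width (x, width) and height (y, height).
    """
    left = (x_center - width / 2.0) * img_w
    top = (y_center - height / 2.0) * img_h
    right = (x_center + width / 2.0) * img_w
    bottom = (y_center + height / 2.0) * img_h
    return tuple(int(round(v)) for v in (left, top, right, bottom))
```

For an 800 × 600 image, a line such as `0 0.5 0.5 0.25 0.1` would describe a plate centered in the frame, 200 pixels wide and 60 pixels high.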

Data Description
The legacy Brazilian license plate standard was introduced in 1998 and is characterized by a three-letter, four-digit sequence, above which are the state code and municipality. Following an agreement in 2010, the four active countries in the Mercosur trade bloc (Argentina, Brazil, Paraguay and Uruguay) agreed to roll out a unified license plate model by 2020. This data set contains images of real-life contexts where legacy three-letter license plates were detected using Tiny-YOLOv3 [2] and replaced with artificially generated images of license plates designed to the new Mercosur standard. It is organized in two folders:
• Images: contains the image files (JPEG) of the data set; and
• Labels: contains text files with the image category identification number and the coordinates of the detected license plates in the image, according to the Yolo_mark annotation specification (accessible through https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects).
The images are organised in five categories based on their acquisition method. These are identified by a prefix in the filename:
1. monitoring_system_: 2925 JPEG images at 800 × 600 resolution, obtained by a license plate detection model from a public traffic monitoring camera video stream;
2. parking_lot1_: 566 JPEG images with resolutions of 3264 × 2448 and 3264 × 1836, obtained using a digital camera at 6 megapixels resolution from a parking lot;
3. cropped_parking_lot_: 315 JPEG images with diverse resolutions, because this subset is composed of zoomed and cropped versions of selected images from parking_lot1_;
4. parking_lot2_: 23 JPEG images with resolutions of 3264 × 2448 and 3264 × 1836, obtained using a Samsung Galaxy Tab 10.1 tablet camera at 8 megapixels resolution from a parking lot; and
5. parking_lot3_: 11 JPEG images with resolutions of 3264 × 2448 and 3264 × 1836, obtained using an Asus ZenFone 5 smartphone camera from a parking lot.
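Because the acquisition method is encoded only in the filename prefix, a subset can be selected by simple prefix matching. The sketch below illustrates this under the assumption that each filename starts with exactly one of the five prefixes listed above; the helper names and the example filenames are hypothetical.

```python
from collections import Counter

# Filename prefixes taken from the category list in the text.
PREFIXES = (
    "monitoring_system_",
    "parking_lot1_",
    "cropped_parking_lot_",
    "parking_lot2_",
    "parking_lot3_",
)


def category_of(filename):
    """Return the category name for a filename, or None if no prefix matches."""
    for prefix in PREFIXES:
        if filename.startswith(prefix):
            return prefix.rstrip("_")
    return None


def count_by_category(filenames):
    """Count how many filenames fall into each category."""
    return Counter(category_of(name) for name in filenames)
```

Calling `count_by_category` on a directory listing of the Images folder gives a quick check of the per-category image counts.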
In addition, the data set contains a CSV file listing every license plate featured in the images, organized in seven columns: image, label, class, x_center, y_center, width and height. These are further defined in Table 1.
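The CSV file can be read with the Python standard library. This is a minimal sketch that assumes a comma-separated file whose header row carries the seven column names given above; the sample row, including its file paths, is invented for illustration and does not come from the data set.

```python
import csv
import io

# Hypothetical sample content; the real file name and paths may differ.
SAMPLE_CSV = """image,label,class,x_center,y_center,width,height
Images/monitoring_system_0001.jpg,Labels/monitoring_system_0001.txt,0,0.5,0.5,0.25,0.1
"""


def read_annotations(handle):
    """Read annotation rows from an open CSV file handle."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "image": row["image"],
            "label": row["label"],
            "class": int(row["class"]),
            # Normalized box as (x_center, y_center, width, height).
            "box": tuple(
                float(row[key])
                for key in ("x_center", "y_center", "width", "height")
            ),
        })
    return rows


annotations = read_annotations(io.StringIO(SAMPLE_CSV))
```

With a real file, the `io.StringIO` object would be replaced by `open(path, newline="")`.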

Experimental Design, Materials and Methods
To generate the data set, images were acquired from the sources above. Then, (i) synthetically generated images of the new Mercosur license plates were coupled with frames of real scenes containing legacy (in this case, Brazilian) three-letter license plates, and (ii) these images were transformed using various digital image processing techniques. This process is outlined in Fig. 1. The data set is a product of the first three phases of the full LPR pipeline described in [1]. The first phase involved a number of steps. First, a Brazilian Mercosur license plate template was created using HTML and CSS3 in accordance with the Mercosur license plate specification. Then, a program written in C++ using OpenCV [3] was used to embed and position the Brazilian national flag in the template image, merge in the background containing the required diagonal text, and randomly generate the vehicle identification alphanumeric characters and write them onto the template, producing a synthetic license plate.
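The random generation of the identification characters can be sketched as follows. The text does not state the character layout used by the generator, so this example assumes the Brazilian Mercosur pattern of three letters, one digit, one letter and two digits (for example ABC1D23). It is written in Python for illustration, whereas the original program is in C++.

```python
import random
import string

# Assumed layout: L = uppercase letter, N = digit (Brazilian Mercosur pattern).
PLATE_PATTERN = "LLLNLNN"


def random_plate(rng=random):
    """Return a random plate string following PLATE_PATTERN.

    `rng` may be the random module or a random.Random instance,
    which makes the output reproducible when seeded.
    """
    chars = []
    for kind in PLATE_PATTERN:
        pool = string.ascii_uppercase if kind == "L" else string.digits
        chars.append(rng.choice(pool))
    return "".join(chars)
```

Passing a seeded generator, such as `random_plate(random.Random(42))`, yields the same plate on every run, which helps when a synthetic data set needs to be regenerated.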
Once a base synthetic license plate is generated, the same C++ and OpenCV program applies one of four shading effects (horizontal, vertical, rectangle or tree), selected by a random draw of 1, 2, 3 or 4, respectively. A suitable non-linear transformation is then applied to the gradient field of the resulting mask, which is integrated back into the license plate with a Poisson solver. This solver locally modifies the apparent illumination of the image [4] so that the synthetic license plates exhibit commonly experienced shadow effects resulting from local light conditions (Fig. 2).
• Rectangle: This filter simulates a combination of the horizontal and vertical shadows, where the points y0, y1, x0 and x1 are selected in the same manner as above. The combination is given by Eq. (3).
where <> is defined as per Table 2 .
• Tree: This filter simulates shadows caused by trees. A value max_v is randomly selected in the interval [0, 999], and Eq. (4) is applied to each M_ij.
The second phase consists of acquiring the identification of the vehicle and detecting the bounding box of the license plate. This was achieved using a Tiny-YOLOv3 model trained to detect legacy three-letter license plates under the neural network hyper-parameters and Darknet53 architecture configuration summarized in Table 3. The Tiny-YOLOv3 setup adhered to the default CNN configuration [5], except that the number of classes and the number of filters were set to 1 and 18, respectively. To allow training to be stopped and resumed, the current weights were saved every 100 iterations. At the end of training, the weights with the lowest average loss were selected.
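The value of 18 filters follows from the rule given in the darknet training instructions, where the convolutional layer before each detection layer uses (classes + 5) × 3 filters. The sketch below reproduces that arithmetic and shows one way to pick the checkpoint with the lowest average loss; the function names and the loss values in the usage note are hypothetical.

```python
def yolo_layer_filters(num_classes, anchors_per_scale=3):
    """Filters for the convolutional layer feeding a YOLO detection layer.

    Each anchor predicts 4 box coordinates, 1 objectness score and
    one score per class, hence (num_classes + 5) per anchor.
    """
    return (num_classes + 5) * anchors_per_scale


def best_checkpoint(avg_loss_by_iteration):
    """Return the iteration whose saved weights had the lowest average loss.

    `avg_loss_by_iteration` maps an iteration number to the average loss
    recorded when the weights were saved.
    """
    return min(avg_loss_by_iteration, key=avg_loss_by_iteration.get)
```

With a single class, `yolo_layer_filters(1)` gives 18, and `best_checkpoint({100: 2.3, 200: 1.1, 300: 1.4})` selects iteration 200.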
Lastly, in the third phase, the information acquired in Phase 2 is used to complete the synthetic license plate, and the bounding box is used to position it and overwrite the original license plate. To obtain the plate angle, the detected bounding box of the legacy three-letter license plate is cropped, and an adaptive threshold is applied to the sloped license plate using OpenCV, with the mean of the neighbourhood as divisor [4]. The Probabilistic Hough Transform is then applied to detect lines [6], and their angles and the mean angle are calculated. Finally, the newly generated license plate is placed in the scene at the correct position and orientation (Fig. 3), so that the slope is captured and the synthetic license plate replaces the original as closely as possible.
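The angle calculation on the detected lines can be sketched as below. This is an illustration rather than the authors' implementation: it assumes each line is a segment given by its end points (x1, y1, x2, y2), which is the form returned by a probabilistic Hough transform, and it uses a plain arithmetic mean of the segment angles. The function names are hypothetical.

```python
import math


def segment_angle_deg(x1, y1, x2, y2):
    """Angle of a segment in degrees, folded into (-90, 90].

    Folding makes the result independent of end point order.
    In image coordinates the y axis points down, so a positive
    angle means the segment descends from left to right.
    """
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    if angle > 90.0:
        angle -= 180.0
    elif angle <= -90.0:
        angle += 180.0
    return angle


def mean_slope_deg(segments):
    """Arithmetic mean of the segment angles, or 0.0 if there are none."""
    if not segments:
        return 0.0
    angles = [segment_angle_deg(*segment) for segment in segments]
    return sum(angles) / len(angles)
```

A plain mean is adequate when the plate edges are close to horizontal; for lines near vertical, where angles wrap between -90 and 90 degrees, a circular mean would be the safer choice.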