RAP3DF - One-shot 3D face dataset

Developing research in the field of 3D biometrics requires large sets of images, both for training algorithms and for recognition tests. Several datasets can be found in the literature; however, an analysis of these datasets showed that no single one provides infrared, visible-light, and three-dimensional images of the same sample. In this context, the present work conceived this 3D facial dataset, pairing each 3D sample with its corresponding visible-light and infrared-spectrum images, and documents the entire image acquisition process using the Kinect One device. The dataset consists of 267 samples from 64 volunteers, where each volunteer has a frontal facial image and 3 images in arbitrary positions.


Specifications
Subject area: Computer Science (General)
Specific subject area: Facial biometrics
Type of data:

Value of the Data
• The dataset contains infrared facial images, 3D depth images, and visible-light images for each sample.
• The data can be used to train classifiers or artificial neural networks to identify a face, age, or gender. Different classifiers can be trained to determine which image type performs best for a given task, such as facial recognition.
• Samples of volunteers in random and in frontal positions can be used to check whether a facial recognition system can identify faces from different perspectives.
• The dataset can be used to estimate the direction a face is pointing from 2D and 3D models/images, and to analyse which representation is most accurate for this task.

Data Description
The images in this dataset were collected by a single person between October 10th and 27th, 2017. Eight classes from the computer science course at UNIVALI (University of Itajaí Valley) were invited to participate, and 64 students accepted the invitation.
Two groups of images were collected for each participating volunteer. In the first group, volunteers positioned their faces frontally, looking straight at the camera without performing facial expressions; in the second group, volunteers could freely look in any direction and perform any facial expression. These facial expressions were classified into 6 groups: happiness, surprise, fear, sadness, anger, and disgust. The expression-group nomenclature is the same as that used in the Bosphorus database [14].
Each collected sample contains 3 types of images: a visible-light image, an infrared image, and a depth image with the 3D representation extracted from the Kinect One. All 3 images were collected at the same time. The depth image was created by computing the distance from each point of the three-dimensional face to a virtual surface located at the same three-dimensional position as the Kinect One camera.
BMP images were generated for all collected image types, allowing interested parties to inspect the dataset quickly. Files with the "data" extension were also saved, containing the information provided by the Kinect One without any manipulation. For depth images, the "data" files contain a flat list of 8-byte floating-point values. All images are 119 pixels wide and 149 pixels high. Table 1 lists the image type for each file type; the file types are distinguished by the suffix of their names.
Each participating volunteer received a unique identifier, which names the folder grouping that individual's images. All collected images, stored in their folders, are available at http://dx.doi.org/10.17632/kpdkpcs8zb. In total, 71 volunteers took part in the acquisition, of whom 64 agreed to make their images available, totalling 267 samples of frontal and random-position facial images; seven volunteers withdrew during the acquisition. Fig. 1 shows the types of images collected with the Kinect One, described below:
• Depth image, with three-dimensional information from the volunteer;
• Infrared image, with information from the infrared spectrum;
• Visible-light image, with information from the visible-light spectrum.
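Given the stated format of the "data" depth files (a flat list of 8-byte floating-point values for a 119×149 image), they can be read with a short script. This is a minimal sketch: the paper does not specify byte order or row layout, so little-endian IEEE-754 doubles in row-major order are assumptions here, and the function name is illustrative.

```python
import struct

WIDTH, HEIGHT = 119, 149  # image dimensions stated in the paper


def read_depth_data(path):
    """Read a RAP3DF depth '.data' file as a HEIGHT x WIDTH grid.

    Assumes the file is a flat sequence of little-endian 8-byte
    floats stored row by row (byte order and layout are assumptions,
    not stated in the paper).
    """
    with open(path, "rb") as f:
        raw = f.read()
    n = len(raw) // 8
    values = struct.unpack("<%dd" % n, raw[: n * 8])
    # split the flat list into rows of WIDTH pixels
    return [list(values[r * WIDTH:(r + 1) * WIDTH]) for r in range(HEIGHT)]
```

If the values do not look plausible when read this way, trying big-endian order (`">%dd"`) is the natural next check.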
The red-channel intensity of each pixel in the depth images represents the distance from the corresponding point on the face to the Kinect camera at the time of collection. From this information it is possible to recreate the object three-dimensionally, as shown in Fig. 2.
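Recreating the face in 3D from the per-pixel distances amounts to back-projecting the depth map through a camera model. The sketch below assumes a simple pinhole camera; the default field-of-view values correspond to the published Kinect One specifications, but the paper does not state the intrinsics actually used, so both the model and the defaults are assumptions.

```python
import math


def depth_to_points(depth, h_fov_deg=70.6, v_fov_deg=60.0):
    """Back-project a depth map (rows of distances in metres) to 3D points.

    A minimal pinhole-camera sketch: focal lengths are derived from
    assumed horizontal/vertical fields of view, and the principal
    point is taken as the image centre.
    """
    height, width = len(depth), len(depth[0])
    fx = (width / 2) / math.tan(math.radians(h_fov_deg) / 2)
    fy = (height / 2) / math.tan(math.radians(v_fov_deg) / 2)
    cx, cy = width / 2, height / 2
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:  # zero depth marks background pixels
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points
```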
To allow facial recognition tests to be performed, images were also collected with the volunteers in arbitrary positions, expressing themselves freely in front of the Kinect One camera. Fig. 3 illustrates some of these images. The black background in the volunteers' facial images occurs because the device was configured to capture only objects less than 2 m away, and also because one of the collection sites had no wall behind the volunteers.
Figs. 4b, 4c and 4d show the resolution of the 3D model when the points are triangulated (Fig. 4a). Adjacent pixels were used to triangulate the image.
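Triangulating adjacent pixels of a depth image means connecting each 2×2 block of neighbouring pixels into two triangles. The helper below generates triangle indices for such a grid; it is one plausible scheme for the triangulation described above, since the paper does not give the exact construction.

```python
def grid_triangles(width, height):
    """Triangle index triples for a width x height pixel grid.

    Each 2x2 block of adjacent pixels yields two triangles; indices
    refer to pixels in row-major order. One possible scheme for the
    triangulation shown in Fig. 4, not necessarily the authors' exact one.
    """
    tris = []
    for v in range(height - 1):
        for u in range(width - 1):
            i = v * width + u          # top-left pixel of the block
            tris.append((i, i + 1, i + width))              # upper triangle
            tris.append((i + 1, i + width + 1, i + width))  # lower triangle
    return tris
```

For the 119×149 images this produces 2 × 118 × 148 = 34,928 triangles, which matches the dense surface visible in the figure panels.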

Demographic information
To enable further studies with this dataset, demographic information (age, sex, skin colour, weight, and height) was also collected from the participants. This information allows studies other than facial recognition, such as image-based age and sex detection. Regarding skin colour, the volunteers were instructed to self-declare as described in [7].

Comparison between datasets
To assess the quality of this dataset, a comparative analysis was carried out against other datasets. Table 2 presents this comparison: the first column gives the name of the dataset, the second the population (number of individuals who participated in the project), the third the number of samples, and the fourth to seventh columns mark with an "X" which image types each dataset contains. The symbol "?" indicates that the dataset's documentation does not state whether that image type is present, although the illustrative images in its articles suggest it is. Depth images were counted as 3D images.
Table 2 shows that none of the other datasets contains all image types. However, if a grayscale image is considered a simplified, channel-reduced representation of an RGB, IR, or 3D image, the dataset presented in this article covers all image types.

Experimental design, materials and methods
To capture the 3D, infrared, and visible-light images, a capture station was set up at several locations of UNIVALI (University of Itajaí Valley) so that the volunteers could be properly positioned during image capture. The station comprised a chair, a tripod holding the Kinect One, a laptop, and 2 monitors: one monitor for the volunteer to see the images and the other for the researcher to check and collect them, as shown in Fig. 5a. With the station assembled, software developed for the purposes of this article collected the images from the Kinect One within a region starting 60 cm in front of the device, extracting all information in a volume 30 cm wide, 30 cm high, and 60 cm deep, as shown in Fig. 5b.
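Restricting the capture to the stated 30 × 30 × 60 cm region starting 60 cm from the device can be expressed as a simple box filter on the 3D points. This is a sketch under assumptions: a camera-centred coordinate frame with the z axis pointing away from the sensor and the box centred on the optical axis, none of which is stated explicitly in the paper.

```python
def crop_capture_volume(points, near=0.60, width=0.30, height=0.30, depth=0.60):
    """Keep only points inside the capture box described in the paper:
    a 30 cm x 30 cm x 60 cm volume starting 60 cm in front of the sensor.

    Points are (x, y, z) tuples in metres; the box is assumed to be
    centred on the camera's optical axis.
    """
    return [
        (x, y, z)
        for (x, y, z) in points
        if near <= z <= near + depth          # within 60-120 cm of the sensor
        and abs(x) <= width / 2               # within 15 cm left/right of axis
        and abs(y) <= height / 2              # within 15 cm above/below axis
    ]
```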

Ethics statement
During the approach and data collection, the volunteers were informed about the project, its risks and benefits, the collection procedure, the required participation time, and their right to stop participating in the dataset without penalty; all questions were answered before data collection began. After these clarifications, the volunteers were asked whether they would like to participate in the project; those who agreed were asked to read and sign a consent form as well as an image-use agreement.
This dataset was created with the approval of the CEP (Research Ethics Committee) of the University of Itajaí Valley, Brazil, under CAAE (Certificate of Presentation of Ethical Appreciation) number 97615018.9.0000.012.

Declaration of Competing Interest
None.

Supplementary Materials
Supplementary material associated with this article can be found in the online version at doi:10.1016/j.dib.2020.106281.