The Mandrillus Face Database: A portrait image database for individual and sex recognition, and age prediction in a non-human primate

The Mandrillus Project is a long-term field research project in ecology and evolutionary biology, monitoring, since 2012, a natural population of mandrills (Mandrillus sphinx; primate) located in Southern Gabon. The Mandrillus Face Database was launched at the beginning of the project and now contains 29,495 photographic portraits collected on 397 individuals from this population, from birth to death for some of them. Portrait images have been obtained by manually processing images taken in the field with DSLR cameras: faces have been cropped to remove the ears and rotated to align the eyes horizontally. The database provides portrait images resized to 224 × 224 pixels associated with several manually annotated labels: individual identity, sex, age, face view, and image quality. Labels are stored within the image metadata and in a table accompanying the image database. This database will allow training and comparing methods on individual and sex recognition, and age prediction in a non-human animal.



Value of the Data
• This is the first public database that contains annotated face pictures of wild mandrills living in their natural environment. To our knowledge, this is also the largest photographic portrait database for non-human animals in the wild in terms of the number of sampled individuals and the time frame (397 individuals totaling 29,495 pictures taken over 10 years).
• Pictures are labelled by experts in primatology for applications in Computer Vision, Machine Learning/Deep Learning, and Data Science.
• The database is specifically designed to benchmark methods of individual recognition, face verification, sex recognition, and age prediction in a non-human primate.

Objective
This database is used to study the role of face attributes in visual communication in wild animals. Beyond applications in behavioral ecology, the database is also currently used to develop and compare Deep Learning methods of individual recognition, face verification, sex recognition, and age prediction in a non-human primate. The database also allows training Deep Learning models to automatically pre-process future data (automatic cropping, alignment, and labelling of face view and image quality).

MFD_metadata.csv file
This metadata file is a .csv file with 29,496 rows (one header row containing the column names, followed by one row per picture) and 9 columns (the attributes of each picture). The columns, listed in the order in which they appear in the file, contain the following information:
Photo_Name (type 'string'): this column indicates the name of the picture (see above for the syntax). If the date of shooting is unknown, the name contains "unknown" in place of the "YYYYMMDD" date.
Id (type 'integer'): this column provides the identifier of the individual depicted in the picture.
Sex (type 'categorical'): this column indicates the sex of the individual in the picture ("f" for female, "m" for male, and "unknown" if the sex is unknown).
dob (type 'date'): this column gives the date of birth of the individual in the picture (in "YYYYMMDD" format). If the date of birth is unknown, the cell contains "NaN".
dob_estimated (type 'boolean'): this column indicates whether the date of birth is known with certainty ("False") or has been estimated by the field assistants from observational data on the mother's ovulation cycle ("True"). If the date of birth is unknown, the cell contains "NaN".
error_dob (type 'integer'): if dob_estimated = "True", this column indicates the uncertainty (in days) around the date of birth. If dob_estimated = "False", the cell contains 0. If the date of birth is unknown, the cell contains "NaN".
FaceView (type 'integer'): this column indicates whether the mandrill's face in the picture is in frontal (1) or profile (0) view. The face is considered frontal when both eyes are visible, the face is fully frontal or in three-quarter view (rotated by less than approximately 30°), and any occlusion covers less than 50% of the face; otherwise, the face is considered to be in profile view. The database includes 26,846 frontal and 2,649 profile pictures.
FaceQual (type 'categorical'): this column indicates the quality of the picture, ranging from 0 to 3 (or -1 when the quality has not been evaluated because the individual is in profile view). 0: pictures of bad quality, for which experienced field assistants are unable to recognize the individual from the picture alone, without contextual information. 1: pictures of average quality, for which experienced field assistants are able to recognize the individual from the picture alone, with some difficulty but without any contextual information. 2: pictures of good quality, for which individual recognition is straightforward but the portrait does not meet the criteria of quality 3. 3: pictures of high quality, for which individual recognition is straightforward and the face has a neutral expression and is in perfect frontal view, with no shadow, bright spot, or partial occlusion ("id card-like" portraits). The majority of the portraits are of quality 2 and 3 (see Fig. 5).
Shootdate (type 'date'): this column gives the date of shooting (in "YYYYMMDD" format).
The database includes 191 females, 203 males, and 3 individuals of unknown sex (all infants less than one year old). It contains individuals from birth to 23 years old. Age is calculated as the difference between the shooting date ("Shootdate" column) and the date of birth ("dob" column). Fig. 3 provides the histogram of the age distribution of the portrayed individuals; infants (0-1 year) account for 20% of the total number of pictures.
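The age computation described above can be sketched in a few lines of Python. This is a minimal illustration, not code distributed with the database: the helper name `age_in_years`, the 365.25-day year, and the example dates are all assumptions made for this sketch. It also assumes that the "dob" and "Shootdate" values are read as strings rather than parsed as numbers.

```python
from datetime import datetime
from typing import Optional

DATE_FORMAT = "%Y%m%d"  # "YYYYMMDD", as used in the dob and Shootdate columns
MISSING = {"nan", "unknown", ""}  # markers treated as "no date available"


def age_in_years(shootdate: str, dob: str) -> Optional[float]:
    """Return the age at shooting in years, or None if either date is missing.

    Hypothetical helper: the name and the year length are assumptions.
    """
    shootdate, dob = shootdate.strip(), dob.strip()
    if shootdate.lower() in MISSING or dob.lower() in MISSING:
        return None
    shot = datetime.strptime(shootdate, DATE_FORMAT)
    born = datetime.strptime(dob, DATE_FORMAT)
    # Assumption: one year is taken as 365.25 days.
    return (shot - born).days / 365.25


# Invented example dates, for illustration only.
print(age_in_years("20200615", "20150615"))  # roughly five years
print(age_in_years("20200615", "NaN"))       # missing date of birth
```

When dob_estimated is "True", the value in error_dob (in days) could be divided by the same year length to express the uncertainty on the computed age.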
Fig. 4 shows the number of pictures collected per year (extracted from the "Shootdate" column). More than 87% of the pictures were taken from 2018 onward.
Finally, Fig. 5 shows the number of photos per quality score ("FaceQual" column). Most pictures (85%) are of quality 2 and 3.
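As an illustration of how the "FaceView" and "FaceQual" columns can be combined, the sketch below selects frontal portraits of good or high quality. It is a hedged example only: the rows are invented, the picture names (`photo_a`, etc.) are placeholders that do not follow the real naming syntax, and the function name `select_portraits` and the quality threshold are assumptions. Only the column names are taken from the description of MFD_metadata.csv.

```python
import csv
import io

# Invented rows for illustration only; the column names follow MFD_metadata.csv,
# but the picture names are placeholders, not real file names.
SAMPLE = """Photo_Name,Id,Sex,dob,dob_estimated,error_dob,FaceView,FaceQual,Shootdate
photo_a,1,f,20150615,False,0,1,3,20200615
photo_b,1,f,20150615,False,0,0,-1,20200616
photo_c,2,m,NaN,NaN,NaN,1,0,20190101
photo_d,3,unknown,20210301,True,10,1,2,20210501
"""


def select_portraits(rows, min_quality=2):
    """Keep frontal portraits (FaceView == 1) with FaceQual >= min_quality.

    Hypothetical helper: profile pictures carry FaceQual == -1 and are
    therefore excluded by either condition.
    """
    return [
        row for row in rows
        if int(row["FaceView"]) == 1 and int(row["FaceQual"]) >= min_quality
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
selected = select_portraits(rows)
print([row["Photo_Name"] for row in selected])  # prints ['photo_a', 'photo_d']
```

To run the same selection on the real table, the in-memory sample would be replaced by the opened metadata file.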

Experimental Design, Materials and Methods
The database includes 29,495 photographic portraits collected on 397 individuals from the only wild social group of mandrills habituated to human presence. This group, which roams in the Lékédi Park and its surroundings in southern Gabon (near the village of Bakoumba), is monitored daily by the Mandrillus Project [1] (www.projetmandrillus.com) for research in ecology and evolution (see, for example, [2,3]). The group was founded after the release of 65 semi-captive individuals (born and raised at CIRMF, Centre International de Recherches Médicales de Franceville, Gabon) in 2002 (36 individuals) and 2006 (29 individuals) [4]. Starting as early as 2003, wild males joined the group to reproduce with released females. In 2021, most of the individuals of the group (> 95%) were wild-born.
Photos were taken directly in the forest by field assistants while following the study group. Since the beginning of the project, field assistants have used different models of DSLR cameras and long focal length lenses (from 70 to 500 mm, depending on camera model and distance to the subject). Photos are regularly uploaded to a computer and renamed by the assistants using the syntax presented above. Co-authors of this article (BRT, MH, MJEC, LS), who know the identity of the studied mandrills, validated monthly the names of the individuals depicted on all pictures in the database. Pictures were then processed using Adobe Photoshop Lightroom version 10.1.1. Images were first rotated to align the pupils of the eyes horizontally, and then centered and cropped to keep only the face (removing the neck and the ears). No further processing was applied.
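The alignment step was performed manually in Lightroom. Purely to illustrate the underlying geometry, the sketch below computes the angle of the line joining the two pupils and rotates the pupil coordinates so that this line becomes horizontal. It is not the authors' procedure: the function names are hypothetical, and the pupil coordinates are assumed to be already available (for example, from manual annotation).

```python
import math


def eye_alignment_angle(left_pupil, right_pupil):
    """Angle (in degrees) of the line joining the pupils, relative to horizontal.

    Points are (x, y) pixel coordinates. Hypothetical helper for illustration.
    """
    dx = right_pupil[0] - left_pupil[0]
    dy = right_pupil[1] - left_pupil[1]
    return math.degrees(math.atan2(dy, dx))


def rotate_point(point, center, angle_deg):
    """Rotate a point about a center by angle_deg using a 2D rotation matrix."""
    t = math.radians(angle_deg)
    x, y = point[0] - center[0], point[1] - center[1]
    return (
        center[0] + math.cos(t) * x - math.sin(t) * y,
        center[1] + math.sin(t) * x + math.cos(t) * y,
    )


def level_pupils(left_pupil, right_pupil):
    """Rotate both pupils about their midpoint so that they share the same y."""
    angle = eye_alignment_angle(left_pupil, right_pupil)
    center = (
        (left_pupil[0] + right_pupil[0]) / 2,
        (left_pupil[1] + right_pupil[1]) / 2,
    )
    # Rotating by the opposite angle cancels the tilt of the pupil line.
    return (
        rotate_point(left_pupil, center, -angle),
        rotate_point(right_pupil, center, -angle),
    )


# Invented pupil coordinates, for illustration only.
left, right = level_pupils((80.0, 100.0), (140.0, 112.0))
print(left, right)
```

An image library would apply the same rotation to every pixel; the sketch applies it only to the two pupil coordinates, which is enough to check that the tilt is removed while the distance between the pupils is preserved.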

Ethics Statements
The Mandrillus Face Database is based on non-invasive methods (pictures taken during the daily routine of the animals). Pictures were taken from a distance and without flash. Photographers took pictures on the fly without any obvious disturbance to the study mandrills.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.