A validation dataset for Macaque brain MRI segmentation

Validation data for segmentation algorithms dedicated to preclinical images is severely lacking, especially compared with the large number of human brain image and segmentation databases available to the academic community. Such data is essential not only for validating methods but also for objectively comparing competing algorithms and identifying promising directions, as segmentation challenges have shown for clinical images. The dataset we present here is a first step in this direction. It comprises 10 T2-weighted MRIs of healthy adult macaque brains, acquired on a 7 T scanner, along with corresponding manual segmentations into 17 labelled anatomical brain regions spread over 5 hierarchical levels, based on a previously published macaque atlas (Calabrese et al., 2015) [1]. By making this unique dataset available, we hope to provide a reference needed by the non-human primate imaging community. This dataset was used in an article presenting a new primate brain morphology analysis pipeline, Primatologist (Balbastre et al., 2017) [2]. Data is available through a NITRC repository (https://www.nitrc.org/projects/mircen_macset).


Value of the data
Reference data for algorithms dedicated to the non-human primate brain is lacking. This is the first publicly available set of manual segmentations of Macaca fascicularis brain MRIs. Segmentation into 17 anatomical regions was performed on 15 relevant sections spanning all three orientations (axial, coronal and sagittal).
Data is shipped with BrainVISA, along with a process for computing section-wise Dice scores.

Data
MR images of the brains of 10 healthy young adult macaques were acquired on a 7 T scanner. In each volume, 15 relevant sections (7 coronal, 5 axial, 3 sagittal) spanning the whole brain and encompassing all the major anatomical regions were selected for manual segmentation. Raw MRIs and manual segmentations are available as NIfTI volumes in a NITRC repository (https://www.nitrc.org/projects/mircen_macset). They will also be available with the next release of BrainVISA (version 4.6, http://www.brainvisa.info), through the BrainVISA installer. This dataset is distributed under the CeCILL v2.1 license (http://www.cecill.info), a GPL-compatible license, and can be freely used for academic work, provided this paper is cited. This dataset was used to validate automated segmentations obtained with Primatologist, a pipeline dedicated to macaque brain morphology analysis [2]. The list of provided files is summarized in Table 1.

Animals and imaging
All animal studies were conducted according to European regulations (EU Directive 2010/63) and in compliance with the Standards for Humane Care and Use of Laboratory Animals of the Office of Laboratory Animal Welfare (OLAW, no. A5826-01), in a facility authorized by local authorities (authorization no. B92-032-02). All efforts were made to minimize animal suffering, and animal care was supervised by veterinarians and animal technicians skilled in the healthcare and housing of NHPs. All animals were housed under standard environmental conditions (12-h light-dark cycle, temperature: 22 ± 1 °C, humidity: 50%) with ad libitum access to food and water.
Ten male cynomolgus monkeys (Macaca fascicularis, supplied by Noveprim, Mauritius Island) aged 2 to 5 years (mean: 3.76 years, SD: 0.77) underwent baseline MR imaging. Animals weighed 3.8 to 6.3 kg (mean: 4.85 kg, SD: 0.73) at examination time (see Table 2 for detailed age and weight). Animals were anesthetized with ketamine (1 mg/kg) and xylazine (0.5 mg/kg); anesthesia was maintained with an intravenous infusion of propofol (1 ml/kg/h). Animals were placed in the magnet in a sphinx position, with the head fixed in a stereotaxic MRI-compatible frame (M2E, France). They were kept warm with a flow of hot air, and their temperature and respiration were monitored remotely.

Preprocessing
Data was converted with BrainVISA from the Varian FDF format produced by the scanner to gzipped NIfTI. Because Varian's on-disk storage is sequence-dependent and does not provide voxel-to-world transforms, volumes were flipped and the appropriate voxel-to-world transforms were stored in the NIfTI header using BrainVISA's Python tools (pyAIMS), so that the volumes can be used with any software able to handle NIfTI metadata. Offsets were set so that the world origin lies at the center of the volume. See https://nifti.nimh.nih.gov for further details on the NIfTI format and the way it handles transforms.
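The centering convention can be illustrated with a minimal pure-Python sketch. It assumes a diagonal voxel-to-world matrix (voxel sizes on the diagonal, no rotation) and takes the center of an axis of N voxels to be voxel coordinate (N - 1)/2; the function names and the per-axis `flip` flag are hypothetical and are not part of pyAIMS.

```python
def centered_affine(shape, voxel_size, flip=(False, False, False)):
    """Return a 4x4 voxel-to-world matrix (list of lists, in mm).

    shape:      number of voxels along each axis, e.g. (nx, ny, nz)
    voxel_size: voxel dimensions in mm along each axis
    flip:       hypothetical per-axis flags; a flipped axis gets a
                negative scale, reversing its direction in world space
    """
    affine = [[0.0] * 4 for _ in range(4)]
    affine[3][3] = 1.0
    for axis in range(3):
        scale = -voxel_size[axis] if flip[axis] else voxel_size[axis]
        center = (shape[axis] - 1) / 2.0  # center in voxel coordinates
        affine[axis][axis] = scale
        # translation chosen so that the center voxel maps to world 0
        affine[axis][3] = -scale * center
    return affine


def voxel_to_world(affine, ijk):
    """Apply the 4x4 matrix to a voxel index (i, j, k)."""
    return tuple(
        sum(affine[row][col] * ijk[col] for col in range(3)) + affine[row][3]
        for row in range(3)
    )
```

For an 81-voxel axis with 0.5 mm voxels, voxel 40 maps to 0 mm and voxel 0 to -20 mm. The resulting matrix has the layout of a NIfTI sform; with nibabel, for instance, it could be converted to a NumPy array and passed as the `affine` argument of `nibabel.Nifti1Image`.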

Regions definition
The choice of regions to segment was based on the rhesus macaque atlas published by the Center for in vivo Microscopy (CIVM) [1]. This atlas consists of the segmentation into 241 regions of a template (i.e., a mean image) built from MRIs of 10 post mortem brain specimens. These regions belong to an ontology, which we used to simplify the atlas to 17 major labels over 5 hierarchical levels. The resulting simplified ontology is shown in Fig. 1, along with the corresponding numeric labels.
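Collapsing fine atlas regions into the coarser levels of an ontology amounts to walking up a tree. The sketch below assumes the ontology is stored as a child-to-parent dictionary; the node names and fine region IDs are hypothetical placeholders, not the actual labels of Fig. 1.

```python
# Hypothetical ontology stored as child -> parent (None marks the root).
# The real simplified ontology (17 labels, 5 levels) is the one in Fig. 1;
# these node names are placeholders.
PARENT = {
    "brain": None,
    "cerebrum": "brain",
    "cerebellum": "brain",
    "cortex": "cerebrum",
    "white_matter": "cerebrum",
}

# Hypothetical mapping from fine atlas region IDs to ontology leaves.
FINE_TO_LEAF = {101: "cortex", 102: "cortex", 201: "white_matter", 301: "cerebellum"}


def ancestors(node):
    """Return [node, parent, grandparent, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def coarse_label(fine_id, level):
    """Map a fine region ID to its ancestor at a given depth (0 = root).

    A leaf shallower than `level` keeps its own label at deeper levels.
    """
    path = ancestors(FINE_TO_LEAF[fine_id])[::-1]  # root first
    return path[min(level, len(path) - 1)]
```

Keeping a shallow leaf unchanged at deeper levels is one design choice among several; it guarantees that every voxel has a label at every level of the hierarchy.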

Sections selection
Rather than manually segmenting all 80 coronal slices that make up an MR volume, we selected a subset of relevant sections in all three orientations. This choice was motivated by the wish to avoid any orientation-induced bias in the segmentation and to reduce the segmentation workload. As a result, 7 coronal, 5 axial and 3 sagittal sections were selected so that every anatomical class appears in each of the three orientations.
Coronal sections were selected based on the Paxinos Macaque Atlas [3], relative to the anterior commissure (AC) and posterior commissure (PC). These coordinates were then converted to pseudo-Talairach coordinates (without alignment of the AC-PC axis). We denote by AL the anterior limit of the brain and by PL its posterior limit. Selected coordinates are provided in Table 3, together with a rough correspondence with sections of the CIVM template.
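The exact conversion formula is not spelled out here. A common choice, assumed in this sketch, is a Talairach-like proportional grid: anteroposterior positions are rescaled linearly within each interval delimited by the landmarks PL, PC, AC and AL, so that an atlas coordinate can be transferred to a subject whose landmarks lie elsewhere. Only the anteroposterior axis is handled, and the function name is hypothetical.

```python
from bisect import bisect_right


def piecewise_map(y, src, dst):
    """Map an anteroposterior coordinate from one landmark set to another.

    src, dst: landmark positions (mm) in increasing order, here assumed to be
              (PL, PC, AC, AL) with y increasing towards the front.
    Within each interval the mapping is linear; outside [PL, AL] the first
    or last interval is extrapolated.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need matching landmark lists of length >= 2")
    # index of the interval [src[i], src[i + 1]] containing y
    i = bisect_right(src, y) - 1
    i = max(0, min(i, len(src) - 2))
    t = (y - src[i]) / (src[i + 1] - src[i])
    return dst[i] + t * (dst[i + 1] - dst[i])
```

For example, with made-up atlas landmarks (-30, -10, 0, 25) and subject landmarks (-60, -20, 0, 50), an atlas section halfway between PC and AC (y = -5) maps to the subject position halfway between its own PC and AC (y = -10).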

Manual segmentation
A single operator manually segmented the 17 anatomical regions of the simplified atlas on a Cintiq 24HD touchscreen (Wacom, Saitama, Japan), using the Anatomist software (NeuroSpin, CEA, France). Both the CIVM and the Paxinos atlases were used as references to delineate the structures. A representative example is shown in Fig. 1.
Segmentations were stored in Anatomist's ARG format in three separate files per animal (one per orientation), and then converted to NIfTI. The voxel-to-world transform of the raw MRI was also stored in the segmentation NIfTI files. A 3D segmentation volume was also created by fusing the three orientations; voxels where sections overlap were labelled by majority vote.
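A majority-vote fusion of the three per-orientation segmentations can be sketched as follows. Each orientation is assumed to be a dictionary from voxel index to label; how ties were broken in the original data is not stated, so letting the first-listed orientation win a tie is a hypothetical rule.

```python
from collections import Counter


def fuse_majority(label_maps):
    """Fuse per-orientation segmentations into one 3D label map.

    label_maps: list of dicts {(i, j, k): label}, one per orientation,
                containing only the voxels of the segmented sections.
    Voxels labelled in several maps (where sections intersect) get the
    most frequent label; ties go to the earliest map in the list
    (hypothetical rule).
    """
    votes = {}
    for lmap in label_maps:
        for voxel, label in lmap.items():
            votes.setdefault(voxel, []).append(label)
    fused = {}
    for voxel, labels in votes.items():
        counts = Counter(labels)
        best = max(counts.values())
        # first label, in map order, that reaches the maximal count
        fused[voxel] = next(lab for lab in labels if counts[lab] == best)
    return fused
```

Note that a voxel lying on only two sections receives two votes, so ties are possible there; a voxel on all three sections can only tie if all three labels differ.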

Classification scores
This dataset is provided with a set of Python functions that compute classification scores using our manual segmentations as ground truth. Because scores can only be computed on the manually segmented sections, the corresponding sections of the evaluated 3D segmentations can be extracted automatically.
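Extracting the manually segmented sections from an evaluated 3D volume amounts to slicing both volumes at the same indices. This sketch uses nested Python lists indexed as `volume[i][j][k]`; the mapping of axes 0, 1 and 2 to the sagittal, coronal and axial orientations and the `(axis, index)` section list are assumptions, not the format used by the shipped functions.

```python
def extract_section(volume, axis, index):
    """Return a 2D slice of a 3D nested-list volume[i][j][k].

    axis selects which voxel index is held fixed (0, 1 or 2); which
    anatomical orientation each axis corresponds to is an assumption.
    """
    if axis == 0:
        return [list(row) for row in volume[index]]
    if axis == 1:
        return [list(volume[i][index]) for i in range(len(volume))]
    if axis == 2:
        return [[volume[i][j][index] for j in range(len(volume[i]))]
                for i in range(len(volume))]
    raise ValueError("axis must be 0, 1 or 2")


def paired_sections(reference, evaluated, sections):
    """Yield (reference_slice, evaluated_slice) for each (axis, index)."""
    for axis, index in sections:
        yield (extract_section(reference, axis, index),
               extract_section(evaluated, axis, index))
```

Scores can then be computed on the pooled voxels of all extracted slice pairs, so that voxels outside the manual sections never enter the evaluation.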
Image segmentation can be seen as a classification problem, a domain where the $F_1$ score is a widely used metric. The $F_1$ score is exactly equivalent to the Dice coefficient [4], the more common designation in the field of image segmentation. For a given class, let $P$ be the set of voxels that belong to it and $F$ the set of voxels that do not. Then $|P| + |F| = n$, the number of voxels in the image. Let $C_P$ denote the set of voxels classified as belonging to the class and $C_F$ those classified as not belonging. True positives (TPs) are voxels correctly classified as belonging to the region ($C_P \cap P$), and false positives are those incorrectly classified as belonging to it ($C_P \cap F$). True negatives ($C_F \cap F$) and false negatives ($C_F \cap P$) are defined in the same way.
Precision is defined as the ratio between the number of TPs and the number of observations classified as positive, and can be seen as a measure of over-segmentation:
$$\mathrm{Precision} = \frac{|C_P \cap P|}{|C_P|}.$$
Precision varies between 0 and 1, a maximal score indicating no type I error, i.e., no over-detection. Recall is defined as the ratio between the number of TPs and the number of truly positive observations, and can be seen as a measure of under-segmentation:
$$\mathrm{Recall} = \frac{|C_P \cap P|}{|P|}.$$
Recall varies between 0 and 1, a maximal score indicating no type II error, i.e., no under-detection. The $F_1$ score is defined as the harmonic mean of precision and recall, and thus includes information on both over- and under-segmentation:
$$F_1 = 2\,\frac{\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}} = \frac{2\,|C_P \cap P|}{|C_P| + |P|}.$$
Consequently, the $F_1$ score also varies between 0 and 1, higher scores indicating better agreement between segmentations.
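These binary scores follow directly from the set definitions. In this sketch, volumes are flattened to sequences of labels and voxels are identified by their position; returning 1.0 when a denominator is empty (nothing to detect, or nothing detected) is an assumed convention, and the function name is hypothetical.

```python
def binary_scores(reference, segmentation, label):
    """Return (precision, recall, f1) for one label.

    reference, segmentation: flat sequences of voxel labels, same length.
    Empty denominators yield 1.0 (assumed convention).
    """
    if len(reference) != len(segmentation):
        raise ValueError("volumes must contain the same number of voxels")
    P = {v for v, lab in enumerate(reference) if lab == label}       # positives
    C_P = {v for v, lab in enumerate(segmentation) if lab == label}  # classified positive
    tp = len(C_P & P)                                                # true positives
    precision = tp / len(C_P) if C_P else 1.0
    recall = tp / len(P) if P else 1.0
    # closed form of the harmonic mean: 2|C_P & P| / (|C_P| + |P|)
    f1 = 2 * tp / (len(C_P) + len(P)) if (C_P or P) else 1.0
    return precision, recall, f1
```

Computing $F_1$ from the closed form rather than from precision and recall avoids a division by zero when there are no true positives.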
However, this score is only defined for binary classification, where observations are either positive or negative. For multi-label segmentation, it must be extended. We used the micro-averaged $F_1$, applying the conventional $F_1$ formula to multi-label definitions of the sets $P$ (positives) and $C_P$ (classified as positives). Let $R$ be the ground-truth volume and $S$ the evaluated segmentation, and let $R(v)$ and $S(v)$ denote the sets of labels assigned to voxel $v$ in each:
$$P = \{(v, l) : l \in R(v)\}, \qquad C_P = \{(v, l) : l \in S(v)\},$$
$$\text{micro-}F_1 = \frac{2\,|C_P \cap P|}{|C_P| + |P|}.$$
The provided set of functions computes the binary $F_1$ score for each node of the atlas hierarchy, as well as the micro-$F_1$ score. User-friendly processes will also be available in the next release of BrainVISA (4.6).
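Both the per-node binary $F_1$ and the micro-$F_1$ can be computed from (voxel, label) pairs. This sketch assumes a hypothetical child-to-parent table in which each voxel counts as positive for its own label and for every ancestor, which is one way to realise multi-label sets over a hierarchy; it is not the BrainVISA implementation, and all names are illustrative.

```python
# Hypothetical child -> parent table over numeric labels (None = top level).
PARENT = {0: None, 1: 10, 2: 10, 10: None}


def with_ancestors(label):
    """Labels a voxel counts as positive for: its own label and all ancestors."""
    out = set()
    while label is not None:
        out.add(label)
        label = PARENT[label]
    return out


def _f1(P, C_P):
    """Conventional F1 = 2|C_P & P| / (|C_P| + |P|); 1.0 if both are empty."""
    denom = len(P) + len(C_P)
    return 2 * len(P & C_P) / denom if denom else 1.0


def node_f1(reference, segmentation, node, labels_of=with_ancestors):
    """Binary F1 for one node of the hierarchy."""
    P = {v for v, lab in enumerate(reference) if node in labels_of(lab)}
    C_P = {v for v, lab in enumerate(segmentation) if node in labels_of(lab)}
    return _f1(P, C_P)


def micro_f1(reference, segmentation, labels_of=with_ancestors):
    """Micro-averaged F1 over (voxel, label) pairs."""
    P = {(v, l) for v, lab in enumerate(reference) for l in labels_of(lab)}
    C_P = {(v, l) for v, lab in enumerate(segmentation) for l in labels_of(lab)}
    return _f1(P, C_P)
```

With `labels_of=lambda l: {l}` and the background label included, each voxel contributes exactly one pair to $P$ and one to $C_P$, so the micro-$F_1$ reduces to voxel-wise accuracy; counting ancestors as well gives partial credit to confusions between sibling regions.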