Auto-kNN: Brain Tissue Segmentation using Automatically Trained k-Nearest-Neighbor Classification

Abstract. In this paper we applied one of our regularly used processing pipelines for fully automated brain tissue segmentation. Brain tissue was segmented into cerebrospinal fluid (CSF), gray matter (GM), and white matter (WM). Our algorithms for skull stripping, tissue segmentation, and white matter lesion (WML) detection were slightly adapted and applied to twelve data sets within the MRBrainS13 brain tissue segmentation challenge. Skull stripping is performed using non-rigid registration of five atlas masks. Our tissue segmentation is based on an automatically trained kNN classifier. Training samples were obtained by non-rigid registration of five manually labeled scans, followed by a pruning step in feature space to remove any residual erroneously sampled tissue voxels. The kNN classification incorporates voxel intensities from a T1-weighted scan and a FLAIR scan. The white matter lesion detection is based on an automatically determined threshold on the FLAIR scan. The application of the algorithms to the data from the MRBrainS13 challenge showed that our pipeline produces acceptable segmentations. Average resulting Dice scores were 77.86 (CSF), 81.22 (GM), 87.27 (WM), 93.78 (total parenchyma), and 96.26 (all intracranial structures). Total processing time was about 2 hours per subject.


Introduction
The segmentation of magnetic resonance (MR) images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) is one of the classic neuroimage analysis challenges. Brain tissue volume measurements are used in studies on ageing and neurodegenerative diseases [1,2]. These segmentations are also commonly employed as regions of interest for other neuroimage analyses, including cortical thickness measurements [3], voxel-based morphometry [4], and connectivity analyses [5].
Given the relevance of brain tissue segmentation, many different automated segmentation methods have been proposed over the years. Almost all of these methods rely on a supervised or unsupervised voxel classifier. Supervised methods use manually segmented training data to learn the typical distribution of intensity or appearance features for the tissue classes [6]. This has the advantage that the classifier explicitly follows the manual segmentation protocol, and it allows the use of large numbers of features. Supervised methods, however, also require that the training data resemble the unlabeled target scan. Since the manual segmentation of even a few MR images is very time- and labor-intensive, suitable training data is not always available.
Unsupervised methods, particularly those based on expectation maximization (EM), do not require training data and are therefore more widely used than supervised methods. EM-based methods start with an initial segmentation, which is often based on a probabilistic brain tissue atlas that is registered to the unlabeled target scan. From this initialization, class-specific Gaussian intensity distributions are estimated. This intensity model can then be used to update the segmentation, and this process is repeated until the segmentation converges.
The main reason for the popularity of EM-based methods is that several implementations are publicly available [7,8], as standalone applications or as part of a neuroimaging analysis package such as SPM [9,10] and FSL [11]. But the EM framework is also attractive from a methodological perspective. In particular, the segmentation procedure can easily be enhanced with features such as bias field estimation, Markov Random Field (MRF) regularization, and an iteratively updated atlas registration.
In this paper we validate an alternative fully automated and unsupervised segmentation method, originally introduced by Cocosco et al. [12] and later expanded by Vrooman et al. [13] and De Boer et al. [14]. This automatically trained k-Nearest-Neighbor segmentation method (auto-kNN) uses a similar strategy of initialization, estimation, and segmentation, but offers two advantages over the previously mentioned EM-based methods: it uses a non-parametric kNN classifier that can model more complex decision boundaries than a Gaussian intensity model, and it uses multi-atlas registration to estimate the intensity model, which is known to be more accurate and robust than single-atlas registration [15].

Methods
The core of the method used in this Grand Segmentation Challenge was previously described in De Boer et al. [14]. This section provides a summary of this technique. We made some minor changes in the pre- and post-processing, which are also outlined below.

Data
All experiments were performed on multi-sequence MR imaging data made available for this challenge. The scans were acquired at the UMC Utrecht (the Netherlands) in patients with diabetes and matched controls (with increased cardiovascular risk). In this work we used the following sequences: a T1-weighted scan (T1w), a T1-weighted inversion recovery scan (IR), and a fluid attenuated inversion recovery scan (FLAIR). All images had a voxel size of 0.958 mm x 0.958 mm x 3.0 mm and were corrected for MR bias field artifacts. The IR and FLAIR images were also coregistered to the T1w. Five manually segmented datasets were available for training and parameter tuning. Labels were provided for the background, cortical GM, basal ganglia, WM, white matter lesions (WML), cortical CSF, ventricular CSF, cerebellum, and the brain stem. Twelve datasets were supplied to test the proposed method. For these datasets the manual labels were held back for unbiased testing. Segmentations were evaluated on three tissue classes: GM (cortical GM and basal ganglia), WM (WM and WMLs), and CSF (cortical CSF, cistern CSF, and ventricles). Cerebellum and brainstem voxels were ignored during validation. All manual segmentations were performed by one of two trained observers on the T1w using a contouring tool.

Preprocessing
Preprocessing consisted of two steps: masking and intensity normalization. For the training images, masks were obtained by binarizing the manual segmentations. For the test images, masks were computed using a multi-atlas segmentation method. As atlases we used the T1w training images, both in the original and in a left-right-flipped version, and their associated brain masks. Each atlas image was registered to the unlabeled test images using NiftyReg [16]; the registration consisted of an affine transformation followed by a non-rigid deformation (using a 5 mm B-spline grid and normalized mutual information). A final mask was then computed using STEPS [17]. This method deforms both atlas images and labels, selects per voxel location the five most similar atlases (based on local normalized cross correlation), and fuses their labels using STAPLE [18]. The masking procedure differed from De Boer et al. [14], in which a single atlas was used to obtain the intracranial space. Intensity normalization was performed for all images by a linear mapping obtained by setting the lower and upper 4th percentile intensities to zero and one and interpolating the values in between.
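As a rough sketch of this normalization step (not the authors' actual code), the following Python function maps the lower and upper 4th percentile intensities inside a brain mask to zero and one; the function name, the use of a mask, and the absence of clipping are assumptions made for illustration.

```python
import numpy as np

def normalize_intensities(image, brain_mask):
    """Linearly map intensities so that the lower and upper 4th percentile
    values (computed inside the brain mask) become 0 and 1."""
    values = image[brain_mask > 0]
    lo, hi = np.percentile(values, [4.0, 96.0])
    return (image - lo) / (hi - lo)
```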

Tissue segmentation
The tissue segmentation is initialized by constructing GM, WM, and CSF tissue probability maps in the unlabeled target image coordinate frame using multi-atlas registration. These maps are then used as a mask to sample multi-modal intensities from the target image. To correct for residual misregistration, a pruning operation is performed using a clustering method. Finally, the resulting target-specific samples are used to train a kNN classifier that can be used to segment the target image.
For the challenge data, the tissue probability maps were constructed using the five manually labeled T1w training images. GM, WM, and CSF segmentations were created by fusing the eight available labels as described above. The atlases were first affinely registered to the target images, followed by a non-rigid B-spline registration. The deformations were computed with Elastix [19] using mutual information as similarity measure and a 5 mm B-spline grid. The tissue probability maps were then obtained by deforming the atlas labels and averaging them.
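A minimal sketch of how such probability maps could be computed from atlas labels that have already been deformed to the target (with Elastix, as described above) is given below; the tissue codes and the array-based representation are assumptions, not taken from the original implementation.

```python
import numpy as np

def tissue_probability_maps(deformed_label_volumes, tissue_codes=(1, 2, 3)):
    """Average per-tissue binary masks over all deformed atlas label volumes.

    deformed_label_volumes: list of integer label arrays warped to the target.
    tissue_codes: hypothetical codes for CSF, GM, and WM.
    Returns a dict mapping tissue code -> probability map with values in [0, 1].
    """
    return {
        code: np.mean([(labels == code).astype(np.float32)
                       for labels in deformed_label_volumes], axis=0)
        for code in tissue_codes
    }
```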
Multi-dimensional intensity brain tissue samples were then extracted from all images of each target subject by thresholding the GM, WM, and CSF probability maps at 0.7 and randomly choosing 7500 voxels per class. These settings were based on previously published parameter tuning experiments on different data [13,14]. This procedure allowed us to benefit from the well-documented ability of multi-atlas registration to compensate for individual registration errors. To remove any residual erroneously sampled tissue voxels, a pruning step was performed. This was done by mapping the samples into the multi-dimensional feature space defined by the intensities of all scans and computing a minimum spanning tree. The tree was then iteratively pruned by removing connections whose length exceeds a constant times the average length of the other connections of a sample. This process is repeated until three large clusters remain that each predominantly contain a single tissue class. All minority tissue samples in these three clusters were then removed, as well as any unconnected smaller clusters.
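The sketch below illustrates the idea of this pruning step in a simplified, single-pass form (the method described above prunes iteratively until three large clusters remain). The pruning factor, the dense distance matrix, and the cluster-selection details are assumptions, and the dense matrix would not scale to the full 22,500 samples used here; the code is illustrative only.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def prune_samples(features, labels, factor=2.0, n_clusters=3):
    """Remove erroneously sampled voxels via a minimum spanning tree in feature space.

    features: (n_samples, n_features) normalized intensities (e.g. T1w and FLAIR).
    labels:   (n_samples,) integer tissue class per sample (e.g. 0=CSF, 1=GM, 2=WM).
    Returns a boolean mask of samples to keep.
    """
    dist = squareform(pdist(features))                 # dense pairwise distances
    mst = minimum_spanning_tree(dist).toarray()
    mst = mst + mst.T                                  # symmetric edge-length matrix
    n = len(features)

    # Drop MST edges that are much longer than the other edges of either endpoint.
    for i, j in np.argwhere(np.triu(mst) > 0):
        others = np.concatenate([
            mst[i][(mst[i] > 0) & (np.arange(n) != j)],
            mst[j][(mst[j] > 0) & (np.arange(n) != i)],
        ])
        if others.size and mst[i, j] > factor * others.mean():
            mst[i, j] = mst[j, i] = 0

    # Keep the largest clusters and, within each, only the majority tissue class.
    _, comp = connected_components(mst > 0, directed=False)
    keep = np.zeros(n, dtype=bool)
    for c in np.argsort(np.bincount(comp))[::-1][:n_clusters]:
        idx = np.where(comp == c)[0]
        majority = np.bincount(labels[idx]).argmax()
        keep[idx[labels[idx] == majority]] = True
    return keep
```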
After this pruning step, a k-Nearest-Neighbor classifier with a k-value of 45 was trained in the same feature space as the minimum spanning tree. The value of k was again based on previous experiments [13]. Finally, a segmentation was obtained by applying this classifier to the target subject images.
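Assuming the pruned samples and the target voxel intensities are available as arrays, the classification step could look like the following sketch using scikit-learn's KNeighborsClassifier; the original implementation is not scikit-learn based, and the placeholder arrays and label coding are assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data standing in for the pruned training samples (T1w, FLAIR intensities)
# and the voxels of one target subject; in the pipeline these come from the steps above.
rng = np.random.default_rng(0)
train_features = rng.random((22500, 2))      # 3 classes x 7500 samples, 2 sequences
train_labels = np.repeat([0, 1, 2], 7500)    # 0=CSF, 1=GM, 2=WM (assumed coding)
target_features = rng.random((100000, 2))    # intensities of the target brain voxels

knn = KNeighborsClassifier(n_neighbors=45)   # k = 45, as described above
knn.fit(train_features, train_labels)
tissue_labels = knn.predict(target_features) # one tissue label per brain voxel
```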

White matter lesion segmentation and post-processing
The white matter lesion detection step was based on an automatically selected threshold on the FLAIR scan [14]. First, the tissue segmentation from the previous step was used to localize the GM voxels in the FLAIR scan. Assuming that the voxels with the highest intensities in the histogram of these voxels are WML candidates, the threshold was set at 2.3 standard deviations above the location of the top of the smoothed histogram. This parameter was set based on previous experiments. The WML segmentation was further refined using two minor morphological operations [14].
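A possible implementation of this threshold selection is sketched below; the histogram bin count, the smoothing width, and the use of the standard deviation of the GM FLAIR intensities are assumptions rather than parameters reported by the authors.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def wml_threshold(flair, gm_mask, n_std=2.3, bins=256, smooth_sigma=2.0):
    """Return a FLAIR threshold for WML candidates: the mode of the smoothed
    histogram of GM voxel intensities plus n_std standard deviations."""
    values = flair[gm_mask]
    hist, edges = np.histogram(values, bins=bins)
    smoothed = gaussian_filter1d(hist.astype(float), sigma=smooth_sigma)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mode = centers[np.argmax(smoothed)]
    return mode + n_std * values.std()
```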
Based on visual inspection of the test results, two ad-hoc morphological operations were included to refine the results. First, small local minima were relabeled as the surrounding class (MeVisLab®: itkGrayscaleFillholeImageFilter), mainly to fill small areas (< 3x3x3 voxels) in the CSF that were labeled as background voxels. Since the brain masks tended to overestimate the intracranial space, they included parts of the dura or bone marrow that were labeled as GM. Therefore, we applied a second post-processing step, in which GM voxels that directly bordered the background were relabeled as background. Neither of these operations was included in the work of De Boer et al. [14].
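The two post-processing operations could be sketched as follows, using SimpleITK's wrapper of the same ITK fill-hole filter and a one-voxel dilation of the background; the file names and label codes are assumptions, and the original processing was done in MeVisLab rather than Python.

```python
import SimpleITK as sitk
from scipy.ndimage import binary_dilation

BACKGROUND, GM = 0, 2                                  # assumed label codes

# 1) Fill small local minima (e.g. CSF areas labeled as background) with the
#    surrounding class, using the ITK filter mentioned above.
label_img = sitk.ReadImage("segmentation.nii.gz")      # hypothetical file name
filled = sitk.GrayscaleFillhole(label_img)

# 2) Relabel GM voxels that directly border the background as background, to
#    counteract brain masks that include dura or bone marrow.
seg = sitk.GetArrayFromImage(filled)
near_background = binary_dilation(seg == BACKGROUND)   # one-voxel neighborhood
seg[(seg == GM) & near_background] = BACKGROUND

out = sitk.GetImageFromArray(seg)
out.CopyInformation(filled)
sitk.WriteImage(out, "segmentation_postprocessed.nii.gz")
```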

Experiments and results

Parameter tuning
To tune our processing pipeline for the challenge data, we created brain masks for the five training subjects by binarizing the corresponding label images (CSF + GM + WM), as described in the Preprocessing section. The probability maps for background, CSF, GM, and WM were created using a non-rigid registration (as described in the Tissue segmentation section) of the label images in a leave-one-out procedure. For each of the five training subjects, the label images of the other four were used as atlases.
For the application of our automatically trained kNN classifier, we tested different combinations of input sequences. In Table 1, the resulting Dice scores for three combinations are shown: 1) T1w + IR; 2) T1w + FLAIR; and 3) T1w + FLAIR + IR. Since the combination of the T1w and FLAIR scans yielded the best scores, we used that combination to segment the test subjects. As a second tuning step, we varied a number of parameters of the kNN classifier. However, changing the number of samples, the k-value, or the threshold for the probability maps did not change the outcome scores in a significant way. We therefore decided to use the default values as described in the Tissue segmentation section. Also for the WML detection, further tuning of the parameters did not improve the training results. In our opinion, this shows that our WML detection and our automatically trained kNN classifier are reasonably robust when applied to different cohorts.

Challenge data segmentation
We applied the method to the twelve test images. Results were visually inspected and compared to the manual segmentations by the challenge organizers. The accuracy was evaluated using the following measures:
─ Dice coefficient (DC) [20]
─ Modified Hausdorff distance (MHD) [21]
─ Absolute volumetric difference (AVD) [22]
In Figure 1, the tissue segmentation for test subject 3 is shown, including the WML detection. Subject 3 has a large number of white matter lesions, and Figure 1 shows that our WML detection is able to detect these lesions in an acceptable way. For this challenge, only CSF, GM, and WM were required. Therefore, we relabeled the white matter lesions as WM to obtain the final segmentation results.
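For reference, the primary overlap measure, the Dice coefficient, can be computed per label as in the short sketch below (reported here in percent); this is illustrative and not the challenge's evaluation code.

```python
import numpy as np

def dice_percent(segmentation, reference, label):
    """Dice coefficient (%) between a segmentation and a reference for one label."""
    a = segmentation == label
    b = reference == label
    return 200.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
```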
Discussion

Manually labeled training data from the target cohort is not needed for our algorithm; the training samples for tissue classification are obtained from the subject data itself, after non-rigid registration of brain tissue atlases. The kNN classification uses voxel intensities from the T1w and FLAIR scans, and the WML detection is based on a threshold on the FLAIR scan. Therefore, our segmentation pipeline works robustly and accurately on data obtained from different scanners and with different scan protocols. Our in-house tissue atlases can be used for several different populations. We have noticed, however, that using atlases that better match the population gives slightly better results.
In our opinion, we obtained reasonable segmentation results on the challenge data with a slightly adapted pipeline. The segmentation results are sometimes slightly noisier than segmentation results published elsewhere, since we do not use, for example, a Markov Random Field or morphological post-processing steps (such as smoothing of the kNN posteriors) to clean up or smooth the segmentation result. We are planning to incorporate such a step in the future, although we expect volume measures not to be influenced significantly by it.
Another issue is that our algorithm sometimes does not detect all of the GM in the globus pallidus and thalamus, as can clearly be seen in Figure 2. This is a well-known problem in brain tissue segmentation. The globus pallidus derives its name from its pallid appearance in fresh unstained specimens. It is traversed by numerous bundles of heavily myelinated fibers, which give it a relatively high intensity on a T1w scan and distinguish it from the darker appearance of, for example, the striatal structures (the putamen and the caudate nucleus), which are generally segmented better. We have seen in previous cases that other types of scans, such as a T2-weighted scan, can improve the segmentation of the basal ganglia. Another possibility is the registration of atlases containing manual segmentations of specific brain structures.
Looking at the final evaluation scores delivered by the challenge board, it is obvious that most errors were made on the CSF. In our opinion, this is mainly due to the fact that the created brain mask sometimes crossed the intracranial border because of slight misregistration of the mask atlases. In previous applications of our segmentation pipeline, only the cerebrum was involved. Since for this challenge the CSF around the cerebellum and brain stem also had to be segmented, we needed to create and register new brain masks. Skull stripping is, in our opinion, still one of the most difficult steps in segmentation pipelines in general, leading to subsequent errors in the classification process. The applied post-processing steps could improve our results but not completely eliminate the errors. Our evaluation scores for intracranial volume are reasonable, for example compared to a publication by Iglesias [23]. Since the CSF, however, is a relatively small set of voxels within the mask, the influence of mask errors on the CSF segmentation is substantial. We are planning to improve the skull stripping step in the near future to deal with this overestimation of CSF.
In this challenge, a set of five training data sets (including labels) was made available. We would like to mention that if no manual segmentations are at hand, our processing pipeline can also be used with other atlases obtained from other institutions or built in-house. In the last few years we have been applying our brain segmentation pipeline to different cohorts (scanned on several types of scanners and with varying scan protocols). In most cases we achieve acceptable results using our in-house set of brain atlases.
The total runtime of our algorithm was about 2 hours per subject. Most of this time was needed for the 15 non-rigid registrations involved. We are currently working on a parallelization of the Elastix code to reduce processing times. Another possibility to increase computation speed is to integrate the registrations needed for skull stripping and for the creation of the tissue probability maps. Speeding up the processing pipeline is especially relevant for the translation of our pipeline to a radiological workstation in the clinic. For large population studies, increasing the speed of the pipeline is less urgent, since in that case we process large cohorts of patients or healthy subjects in parallel on a multi-core computing cluster.
We believe that if the limitations mentioned above are further investigated and solved, our fully automated brain tissue segmentation pipeline can be applied to a diverse set of cohorts with an accuracy and reproducibility comparable to those of human raters.

Fig. 2. Segmentation result for one of the test subjects.

Table 1. The resulting Dice scores (%) for the segmentation of CSF, GM, and WM in the training subjects, using different combinations of input scans (T1w + IR, T1w + FLAIR, and T1w + IR + FLAIR) for the kNN classifier. The second combination (T1w + FLAIR) yielded the highest Dice scores during tuning.

In Table 2, the mean and standard deviation (Std) of the final scores for the twelve test subjects, as computed by the challenge organizers, are shown. Five tissue combinations were evaluated: GM, WM, CSF, brain (WM + GM), and all intracranial structures (WM + GM + CSF). Of the three brain tissue classes, WM scored best; overall, CSF had the lowest evaluation scores.

Table 2. The resulting scores for the twelve challenge test subjects, as calculated by the MRBrainS13 Challenge Board.