Automatic segmentation of the hippocampus and the amygdala driven by hybrid constraints: Method and validation

.


Introduction
Volumetric analyses of brain structures have become increasingly common, for diagnostic purposes and for identifying disease progression.In this context, the hippocampus (Hc) and the amygdala (Am) are of major importance, due to their implication in epilepsy and Alzheimer's disease.Nevertheless, volume measurement of these structures remains mainly manual, making the study of large cohorts difficult.Fully automatic extraction of Hc and Am is challenging, due to the poor definition of some of their boundaries on Magnetic Resonance Imaging (MRI) scans and consequently prior knowledge (statistical shape information, atlas templates, probabilistic region distributions or anatomical descriptions) has to be taken into account in order to define their boundaries.
Most registration based segmentation methods (Haller et al., 1997or Dawant et al., 1999), are based on a single subject-template.The subject may be more or less representative of the population, resulting in possible bias (Carmichael et al., 2005, Park et al., 2005), and making intensity refinements necessary (Firbank et al., 2007, Barnes et al., 2008).A more thorough and robust approach (Svarer et al., 2005) would be to use multiple templates, either for building probabilistic information in standard space (Fischl et al., 2002 and2004;Mega et al., 2005;Pohl et al., 2007;Gouttard et al., 2007), or for information fusion in native (subject) space following parallel registration (Wang et al., 2005or Heckemann et al., 2006).More complex positional relationship information can also be derived between structures (Fischl et al., 2002), but some of the relationships thus inferred may break down in data from patients, being inferred from specific training samples; to address this specific issue, anatomical knowledge, derived only from anatomical characterisation, has been specifically designed to be stable for controls and patients (Bloch et al., 2005;Barra and Boire, 2001).
Fast fully automatic robust segmentation of healthy and pathological hippocampi and amygdalae suitable for routine use has yet to be demonstrated.On the one hand, segmentations based on global atlas information require high dimensional deformations to be computed, in order to achieve precise extraction (Heckemann et al., 2006).On the other hand, methods based on local anatomical and image information can be fast, but require good initialisation to be efficient and robust with respect to spurious edges (Shen et al., 2002).We have proposed a semiautomatic method using simultaneous region deformation constrained by anatomical landmarks automatically retrieved during the process.Initialisation of the deformation was performed manually by defining a bounding box and placing two seeds, one in the head of the hippocampus and the other in the middle of the amygdala (Chupin et al., 2007).
In this work, we propose an important upgrade of the algorithm that makes it fully automatic and more robust, by incorporating global spatial knowledge in the form of probabilistic atlases.This is based on a probabilistic atlas built in MNI space using MRI data from sixteen healthy subjects and a state-of-the-art registration method (Ashburner and Friston, 2005).Global information can be inferred, which allows to automatically initialise the deformation (bounding boxes and initial objects).The probabilistic information is also used as a new prior in the energy functional, minimised to drive the deformation.
The remainder of the article is organised as follows.First, the algorithm is described, focusing on its novel aspects; second, we evaluate the method by comparing its results with those of manual segmentation, on 1.5 T data from healthy subjects and patients with epilepsy, and on 3 T data from healthy subjects.

Overview of the segmentation algorithm
We present below a global overview of the upgraded operational steps of the method.Briefly, two objects iteratively deform through the reclassification of their border voxels, according to the minimisation of a global energy functional, through an Iterated Conditional Modes (ICM) algorithm.A competitive scheme is used to allow the identification of the partly visible Hc-Am interface.The segmentation is driven by the constraint of anatomical priors, derived from eleven sets of neuroanatomical landmarks.It has been thoroughly described in Chupin et al (2007).
In the new method, global probabilistic information is used to automatically initialise the deformation, namely determine bounding boxes and initial objects.Probabilistic prior knowledge is modelled by low and high likelihood zones, derived from isoprobability regions in the probabilistic atlases, and consisting of voxels with low or high probability of belonging to Hc or Am.This information is then used to constrain the segmentation as described in the section Constraint during the deformations.Furthermore, we have addressed a common problem faced by atlas-based segmentation methods, namely a possible local mismatch of the atlas, which is more frequent in patients with non standard appearance of a structure (here, only Hc).Below is an overview of the whole segmentation process.Only the upgraded steps are described in detail in the following sections (in brackets):

Atlas construction
Construction of the Am and Hc probabilistic atlases with healthy controls' datasets) (performed only once) • Two probabilistic atlases from manually segmented Hc and Am in 16 young healthy subjects.

Initialisation
Initialisation of the deformations: • Registration of the probabilistic atlases for Hc and Am to the subject's native space.• Automatic creation of two initial objects from the maximal probability level for Hc and Am.
Alternating iterative deformation (as in Chupin et al (2007) Constraint during the deformations • Setting of anatomical landmark set to NULL.

Homotopic deformation of Hc voxel front
• Selection of re-classification 'candidates' in the neighbourhood of the voxels of the Hc front; • Detection of interface voxels, landmarks and likelihood zones, ICM initialisation; • Voxel re-classification (ICM energy optimisation): at each iteration, for each voxel candidate, re-classification in the object leading to the smaller local energy: • For non-interface voxels, re-classification is restricted to either Hc or the background; • For interface voxels, re-classification is restricted to either Hc or Am.

Homotopic deformation of Am voxel front
• Selection of 'candidates' to re-classification in the neighbourhood of the voxels of the Am front; • Detection of interface voxels, landmarks and likelihood zones, ICM initialisation; • Voxel re-classification (ICM energy optimisation): • For non-interface voxels, re-classification is restricted to either Am or the background; • For interface voxels, re-classification is restricted to either Am or Hc.

Convergence criterion checking and termination
Construction of the Am and Hc probabilistic atlases with healthy controls' datasets N healthy young adult controls (here N = 16, S1-S16, age b 35 years), scanned on a 1.5 T Signa scanner (GE Medical Systems, Milwaukee, WI, USA) (see Table 1 for IR-FSPGR sequence parameters), were used to build the probabilistic atlas.The images were manually segmented by an expert following a 3D voxel-based protocol described in Chupin et al (2007).This step resulted in 2N = 32 binary-labelled datasets, N with right and left Hc {L Hc i , i =1...N} and N with right and left Am {L Am i , i = 1... N}.The transformations from native space to MNI standard space, {T i , i = 1...N}, were computed using the unified segmentation module available in SPM5 (Ashburner and Friston, 2005, http://www.fil.ion.ucl.ac.uk/spm/software/spm5/), which allows to iteratively optimise simultaneously registration parameters (linear combination of cosine transformation bases), tissue classification, and intensity non-uniformity correction.The registration step was necessary to create a probabilistic atlas, but did not aim at achieving the segmentation only by propagating labels.The default parameters of SPM5 were used.The binary-labelled datasets were registered to the MNI standard space following T i (resolution: 0.9375 × 0.9375× 1.3 in a 256 × 256 × 124 matrix).We obtain the two probabilistic atlases PA Hc and PA Am : where v is a voxel in the MRI set Ω. PA Hc (v), respectively PA Am (v), is the probability that v belongs to Hc, respectively Am, according to the prior derived from the young controls.

Initialisation of the deformations
Individual probabilistic atlases, IPA Hc and IPA Am , were created by back-registering the two probabilistic atlases, PA Hc and PA Am , to the subject's space, using the inverse transformation T − 1 given by the unified segmentation.If the test dataset was one of the atlas datasets, the atlas was computed with a leave-one-out strategy.Information inferred from the probabilistic atlases was used to automatically determine the limits of two bounding boxes (right, left) and create initial objects for Hc and Am as follows.

Bounding boxes
Bounding box BB HcAm extraction mainly aimed at reducing memory load; it should fully contain Hc and Am, but could be larger, as it was not used to geometrically constrain the segmentation.Each bounding box was thus defined as the smallest parallelepiped subvolumes in Ω around the non-null probability Hc-Am object HcAm min , with an additional layer of one voxel, as illustrated in Fig. 1a.

Initial object
The initial object Hc init (respectively Am init ), was created from the maximum probability object Hc max (respectively Am max ).Hc max (respectively Am max ) was defined as the 1-level of the probability map IPA Hc (respectively IPA Am ), which was built by keeping the voxels for which the probability equals one, while regularising to prevent holes (IPA Hc (v)b1 but v is "inside" Hc max ) and wires (IPA Hc (v) = 1 but v "spikes" from Hc max ) to appear.More precisely, let

N 6N
O v ð Þ be the number of 6-neighbours of v labelled in an object O.
O v ð Þ is smaller than 1, including v in O will result in a wire.Combining regularity rule and probability threshold iteratively, we obtain: with equivalent equations for Am max .The iterations proceeded until Hc max (respectively Am max ) remained unchanged.Hc max (respectively Am max ) was then eroded with a 1 mm-structuring element, and the largest connected component was kept to create the initial objects Hc init (respectively Am init ).The regularised initial object is illustrated in Fig. 1b.

Automatic detection of atlas mismatch and correction strategies
Atlas-based segmentation method can result in errors due to gross mismatches of the registered atlas, most of all in data from patients.Initial tests of the new method showed that some atlas mismatch could occur occasionally.Therefore, a strategy was implemented, based on successive tests comparing intensity characteristics for the grey matter (GM) estimated on the bounding box BB HcAm and on the initial object Hc init or the 0.5-level probability object Hc 0.5 , defined as: The test values were compared with the average and standard deviations for data S1-S16 obtained when using the S1-S16 atlas.Note that the same values were used for all subjects, as no atlas mismatch was detected for subjects S1-S16.The mismatch of the atlas was detected and corrected according to a global to local sequence: detection and correction of a global misplacement of the registered atlas, detection and correction of mis-estimation of atrophy as recovered by the registered atlas, detection and correction of a misplacement of the computed initial objects and, finally, detection of a remaining global misplacement of the atlas.Details on each test are given in Appendix A.

Constraint during the deformations
The local energy E(v) was made of five terms for each structure (Chupin et al., 2007): global data attachment (E G ), local data attachment (E L ), and context terms dedicated to non-stationary anisotropic Markovian regularisation (E I ), volume (E V ) and surface (E S ) control.
The regularisation term E I of the energy functional was modified to take into account the prior probability of the voxel v to belong to the deforming object O (Hc or Am), derived from IPA Hc and IPA Am .E I was locally expressed as the comparison of the number of O-labelled neighbours of v, N O (v), and a standard number of neighbours, Ñ (13, in 26-connectivity), with a tolerance σ I around Ñ: As described in Chupin et al (2007), the α T parameter modelled an anisotropic non-stationary behaviour of the regularisation towards the tail.The γ parameters influenced the classification according to prior probabilities of v belonging to O; for γ values superior to 1, N O (v) was artificially increased, thus decreasing the global energy, and vice versa for values inferior to 1.The γ O AZ parameter modelled the constraint introduced in anatomical zones AZ defined by the anatomical landmarks already described in Chupin et al (2007).
The new γ O PZ parameter modelled the constraint introduced in four probability zones PZ inferred from probability levels in IPA Hc and IPA Am .Four levels of constraints (C1-C4), from one PZ to four PZ, were compared, as detailed in Appendix B.

Segmentation performance evaluation
Segmentation performance was evaluated based on qualitative and quantitative comparisons between automatic (without and with atlas constraint), semi-automatic (with manual initialisation, as in Chupin et al., 2007) and manual segmentations (Chupin et al., 2007), together with comparison to the 0.5-level object derived from the registered atlas in each subject's space (Hc 0.5 and Am 0.5 ), as a basic atlas-based segmentation.
All datasets were manually segmented according to the same protocol as that used to create the probabilistic atlas, by the same investigator.For 3 T data, non-uniformity correction was performed with the unified segmentation (Ashburner and Friston, 2005).All automated segmentations were evaluated by comparison with the reference manual segmentation.A global qualitative evaluation was followed by a quantitative evaluation of the performances, which was based on the quantitative indices described in Chupin et al (2007): relative error in volume RV; Dice overlap, K, false positives and negatives ratios, FP and FN; misclassified interface voxels, MIV; and two measures based on the symmetric Haussdorf distance: mean value over the surface, Dm and maximal value, DM.Statistical significance of the differences was evaluated using a paired-samples Student t-test with SPSS.

Implementation issues
Most of the settings described in Chupin et al (2007) for the semiautomatic segmentation were used in this work.Two mechanisms were improved, resulting in different parameters, for intensity estimation and anisotropic regularisation.
Grey matter intensity characteristics are now estimated by fitting three Gaussian curves to the intensity histogram on the ROI (one for the grey matter, one for the white matter and one for the mixture).Average intensity parameters for the global and local data attachment terms described in Chupin et al (2007) are derived from average intensity of grey matter, with predefined ratios as previously (1 for Hc and 0.9 (1.5 T) or 0.95 (3 T) for Am).Tolerances are derived from standard deviation of grey matter intensity, with predefined ratios (1.8 for Hc and 1.1 for Am).The influence of this setting on the segmentation results is discussed in Appendix D.
The anisotropic regularisation in itself is described in Chupin et al (2007), but it is now introduced in a non-stationary way: its weight increases following an anterior-posterior direction in the bounding box BB HcAm , and the mechanism influences the segmentation more strongly in regions where the tail is anatomically more likely to be present.
New parameters were defined for the atlas constraint, namely the weighting parameter γ O PZ , which was empirically chosen, while The whole automatic segmentation procedure, including registration, requires less than 15 min for both left and right Hc and Am and is integrated as the SACHA module (Segmentation Automatique Compétitive de l'Hippocampe et de l'Amygdale) in BrainVISA (Cointepas et al., 2001, www.brainvisa.info).

Validation on the data used for atlas construction (group 1)
The segmentation results were first analysed qualitatively (Fig. 2 for the best and worst results determined as in Chupin et al (2007)).Global visual observation showed that the results were in general more accurate for S1-S16 with the constrained automatic segmentation.
The results were also compared against manual segmentation with the quantitative indices, as shown in Table 2. Overall, the automatic segmentation with the optimal atlas constraint (see Appendix B; "constrained automatic" in Table 2) and the mismatch correction strategy gave accurate results for both Hc and Am.This method performed better than semi-automatic segmentation (Chupin et al., 2007)  strategy had a negligible effect on the results, due to correct registration.Finally, the automatic segmentation outperformed the atlas-based segmentation given by the 0.5-level object, even if the improvement is not as marked for Am.The last column of the table shows that the anatomical priors are still necessary to ensure accurate segmentation, most of all for the most problematic cases.
Fig. 2. Segmentation results for group 1: 3D-renderings of automatic and manual segmentations, and overlap between segmentations (manual segmentations in shades of grey) and probabilistic atlases on a sagittal slice for the best and worst results (cases S3R and S16R respectively).
registration was locally inconsistent: atlas mismatch was detected in 6 out of 9 sclerotic Hc and in one normal Hc of a patient with TLE and both Hc of a patient with TLE with grey matter heterotopia in the Hc region.
Note the improvement of the segmentation brought by atlas mismatch correction, illustrated for the worst case in Fig. 3.
Quantitative comparisons between automated and manual segmentations are summarised in Table 3. Overall, the automatic segmentation with atlas constraint and the mismatch correction strategy gave correct results for Hc and promising results for Am.This method performed largely better than the semi-automatic segmentation (Chupin et al., 2007)  ; the atlas mismatch correction strategy slightly improved the average results, but was highly useful for sclerotic patients, as described in the following paragraph.Finally, the automatic segmentation outperformed the atlas-based segmentation given by the 0.5-level object, even if the improvement is not obvious for Am; note that the potentially pathological structure was only Hc.As for controls, the anatomical priors are still necessary to ensure accurate segmentation, most of all for the most problematic cases.
Results for the fully automatic segmentation with atlas constraint and mismatch correction were RV = 14%, K = 77% and DM= 5 mm for the 9 sclerotic Hc.The greater shape and signal variations in these Hc is again shown by the overall inadequacy of the 0.5-level object (RV =39%, K = 59% and DM= 6.4 mm for the 9 sclerotic Hc).The semi-automatic segmentation was unable to cope with the loss of image quality in the hippocampal region in some cases (RV = 31%, K = 73% and DM= 10.3 mm for the 9 sclerotic Hc).Finally, the effect of the mismatch correction on the quantitative values for Hc was large for some subjects, leading to an improvement of 19 percentage points for the maximal RV value and 11 percentage points for the minimal K value overall sclerotic Hc, indicating that the correction was efficient for cases for which the segmentation failed otherwise.Note that, after correction, the maximal value for RV (35%) was obtained for HS3L, for which the manually estimated volume was the smallest in the group studied, namely 0.7 cm 3 ; the error corresponds to 0.3 cm 3 or 10% of the average manual volume of NC1-NC8.
Evaluation on the 3T control cohort (group 3) The segmentation was qualitatively correct for all the subjects, as illustrated in Fig. 4 top rows.For one subject (NC3T7) the left Hc appeared mis-rotated, resulting in a mismatch of the registered atlas, correctly detected and corrected by the automatic strategy, as illustrated in Fig. 4 bottom row.
The results of the quantitative analysis are given in Table 4. Overall, the automatic segmentation with the atlas constraint and the mismatch correction strategy gave accurate results for Hc and promising results for Am.This method performed better than the semi-automatic segmentation (Chupin et al., 2007) (for Hc and Am, 3.6 ± 0.9 (2-6.9)3.6 ± 0.9 (2-6.9)3.6 ± 0.9 (2-6.0) Values = average ± standard-deviation (minimummaximum).Mismatch was detected only for the left Hc of NC3 T7 and the correction resulted in an improvement for K from 66 to 78%, and for DM from 11.5 mm to 4.8 mm.Finally, the automatic segmentation outperformed the atlas-based segmentation given by the 0.5-level object for Hc, but the tendency was reversed for Am; this may be due to decrease in contrast in these datasets and lower variability in Am than in Hc.As for controls, the anatomical priors are still necessary to ensure accurate segmentation, most of all for the most problematic cases.

Discussion
We have demonstrated that the introduction of hybrid knowledge, in the form of local anatomical priors and global and local spatial probabilistic priors, allows robust accurate, fast fully automatic segmentation of the hippocampus and the amygdala.In addition to being fully automatic, the new method is more robust than the semi-automatic one to variation in image quality and image acquisition parameters.In patients with hippocampal sclerosis, the segmentation was made more difficult by the wide variety of sclerosis patterns and poorer image quality.Nevertheless, the full method which incorporates a simple strategy for automatic detection and correction of local atlas mismatch resulted in consistent segmentations even for patients with severe sclerosis.
A key point of the method is that both control and patient datasets are segmented with the same parameters and the same atlas, built with datasets from young healthy controls.Robustness with respect to acquisition parameters is greatly increased in the new method compared to the previous, semi-automatic, version (Chupin et al., 2007).We have also shown that both the atlas and anatomical constraints benefit the segmentation, as accuracy decreases when the anatomical constraint is removed.The segmentation performance is stable in relation to relatively important variations in parameter settings, introduced in data attachment energy terms and initialisation, as shown in Appendix D, and related to atlas constraint, as shown in Appendix B and C. It has also been tested on other datasets, as detailed in Chupin et al (2008).
Regarding atlas registration methods, exact registration is not necessary, as the segmentation is not directly derived from the atlas.On the one hand, semi-deformable registration was preferred to a rigid transformation, as it gave the means to encompass larger variability and improve the probabilistic constraint, while remaining fast.SPM5's registration method was chosen due to its availability and Table 4 Quantitative indices for comparison against manual segmentation for group 3 (average value ± standard deviation (minimummaximum)).

Index
Semi-automatic 0.5-Level object Automatic Constrained automatic Corrected automatic Corrected automatic no anatomical prior Fig. 4. Segmentation results for group 3: 3D-renderings of automatic and manual segmentations, and overlap between segmentations (manual segmentations in shades of grey) and probabilistic atlases on a sagittal slice for the best and worst results (cases NC3T2L and NC3T4L respectively) and NC3T7L, for which atlas mismatch is detected.
rapidity, and considering that more deformable methods, such as the one described in Heckemann et al (2006), would not necessarily have resulted in a perfect match, as demonstrated in Hammers et al (2007) where a correction of the registered atlases by the SPM5 grey matter map was used for patients with epilepsy.On the other hand, it was not necessary to use a fully-deformable registration method, which would have been more time consuming, as for example the LDDMM method used in Khan et al (2008), which needs around 90 min for each structure on a subvolume, with parallel computing.
For the hippocampus, the performance of the fully automatic method was superior to that of both semi-automatic and atlas-based (derived from the 0.5-level object) versions.The preliminary results of the automatic strategy for detection and correction of initial mismatch allowed more robust segmentation of extreme cases.More extensive tests in other populations are required to ensure that it is robust enough for a large range of acquisition sequences and pathological hippocampi.Nevertheless, most of the mismatches which are detected are located at the interface with the ventricle, which makes the detection more robust to variations in the image acquisition parameters than if they were between white and grey matter.The mismatch correction strategy remains very simple and more complex strategies, for example including rotations and a stochastic search in the neighbouring positions, will be evaluated.
For the amygdala, the comparison of the performances of the different methods is not so straightforward: except for the young controls S1-S16, the results of the atlas-based segmentation are comparable or better than the results of the automatic segmentation.This can be explained by a lower variability for the amygdala compared to the hippocampus in our samples, no amygdala-specific pathology being considered, and also by less well-defined borders on MRI for the amygdala, particularly for the anterior border, which is not clearly visible on MR images and makes the refinement step less robust.

Comparison with other approaches
Published results for segmentation methods are difficult to compare because of different subject samples, image quality and evaluation strategies.Nevertheless, a comparison of values reported in the literature gives a rough estimate of the relative performance of our segmentation method.For healthy controls, the segmentation results (a K overlap value of 87% for Hc and 84% for Am) compare favourably with the published studies for which a quantitative evaluation is available.Furthermore, few methods are evaluated on both 1.5 T and 3 T data.Other segmentation methods have been proposed in the literature, using a first step based on the registration of a probabilistic atlas and a intensity refinement step (Firbank et al., 2007, andBarnes et al., 2008); nevertheless, these methods do not include any mechanism to ensure 3D coherence and register single subject templates.Most importantly, our segmentation is less biased, because the atlas-derived constraint is not binary, and its effect is balanced by the anatomical priors and the data attachment terms; this is demonstrated by the behaviour of the segmentation results while varying the atlas definition.Results for the learning curve and using an atlas built with a very different manual segmentation protocol are given in Appendix C. Furthermore, the segmentation is more robust than the above mentioned methods, because our homotopic and regularised segmentation mechanism ensures 3D coherence of the segmented object.
Amongst single atlas-based methods, K is 84% and RV is 6% for Hc (Hogan et al., 2000) after the manual placement of 28 landmarks, with the HDBM method described by Haller et al (1997), registering a single subject atlas; the method is also initialised by the FreeSurfer segmentation in Khan et al (2008), to make it fully automatic, but the K values are only 75% for Hc in young healthy subjects.Recently, a method based on the registration and segmentation module of SPM5, using a Hc template drawn in MNI space (Firbank et al., 2007) was evaluated on 9 elderly controls with an RV of 5% and a K of 74%.
Amongst multiple or probabilistic atlases approaches, a high dimensional deformation of a probabilistic atlas followed by a thresholding of the probability maps (Gouttard et al., 2007), gives K value of around 70% for both Hc and Am; note that there may be some differences in the protocol used to build the atlas and that used for the test subjects.Heckemann et al (2006) registered independently presegmented subjects to a target, with a high-dimensional deformation method followed by determining the most probable label, and report K values of 82% for Hc and 81% for Am.For subjects with medial temporal lobe epilepsy, the same method combined with grey matter thresholding based on SPM5 yielded K values of 83% for non-sclerotic Hc (Hammers et al., 2007).Another method, based on finding the best match amongst a library of Hc templates, with a refinement step based on intensity thresholds and conditional dilation, was evaluated on 19 elderly controls, with a K of 84% (Barnes et al., 2008).
Some hybrid methods were also reported, using atlas-based approaches and classification, as in the approach we propose.The method developed by Fischl et al. (2002) for FreeSurfer, based on a Markovian classification using priors derived for a probabilistic atlas, achieved 80%/65% for Hc / Am.In a more recent study (Han and Fischl, 2007), the algorithm was made less sensitive to scanner platform, by renormalizing atlas intensity distributions; it was tested on two samples acquired on two different scanners, with K values for Hc/Am of 79% / 69%, on the new scanner and 86% / 81%, on the "training" scanner, respectively.Pohl et al (2007) reported the introduction of a probabilistic atlas prior in a hierarchical framework, and K values were 81% for Hc and 86% for Am.

Conclusion
In conclusion, the method we proposed proved fast, accurate, robust to changes in acquisition parameters and field strength, and quite robust to pathology.The method has been validated on data from different institutions and patients with epilepsy with known hippocampal sclerosis.One of its main advantages is to require only one single probabilistic atlas for segmenting even highly pathological structures, as the method uses the information derived from probabilistic atlases in combination with anatomical priors.Results of the automatic strategy for detection and correction of mismatches of the registered atlas proved efficient in cases when it was most necessary, namely patients with focal sclerosis.

Appendix A. Atlas mismatch detection and correction
The four steps of the mismatch detection and correction strategy are all related.The order in which they were applied was chosen to be from the more global mismatch to the more local one, ending with a last check on the global atlas once all the corrections are done.

Detection and correction of global misplacement
The first test detected whether global atlas misalignment had taken place in the region of interest.It was based on the assumption that Hc 0.5 misalignment would correspond to a wider intensity range.σ Hc 0.5 , the standard deviation of the intensity on Hc 0.5 , was therefore compared to σ Hc 0.5E , the standard deviation on Hc 0.5E , created by eroding Hc 0.5 with a 1-voxel element.For large misplacements of Hc 0.5 , Hc 0.5E would also include comparable proportions of extrahippocampal tissues, and the ratio R global = σ 0.5E /(σ 0.5 − σ 0.5E ) would be above the threshold, defined as 4 standard deviations above the average.If misplacement was detected, it was corrected by iteratively minimising R global through a search amongst the objects Hc 0.5 translated by one voxel in the 6 main orientations (x ± 1, y ± 1, z ± 1).

Detection and correction of mis-estimated atrophy
The second mismatch test was aimed at detecting cases in which the deformed atlas failed to match the Hc atrophy.˜i0:5 , the average intensity on Hc 0.5 , was compared to ˜iGM , the average intensity of GM on BB HcAm ; the assumption was that an object Hc 0.5 which was overestimated compared to the actual Hc would include too many cerebrospinal fluid (dark) voxels, and the ratio R atrophy = ˜i0:5 = ˜iGM would be too low; mismatch was detected when R atrophy was under the threshold, defined as 4 standard deviations under the average.The correction relied on shrinking IPA Hc by erosion, to create IPA E Hc .To do so, the atlas was divided into probability level objects, PL, as follows: The PL objects were eroded with a 1-voxel element to create PL Ei Hc .The eroded atlas IPA E Hc was defined as follows: Hc then replaced IPA Hc .The correction was taken into account only if the initial object created from IPA E Hc was not below 5% of the average size.

Detection and correction of initial object misplacement
The third mismatch test detected a misplacement of Hc init .It compared σ init , the standard deviation of the intensity on Hc init , to σ GM , the standard deviation of the intensity of GM estimated on BB HcAm .The assumption was that, if Hc init was not entirely included in the target Hc, it would include a wider range of intensities, and the ratio R init = σ init /σ GM would be too high, the threshold being defined as 4 standard deviations above the average.Initial misplacement was corrected as global misplacement, the whole atlas IPA Hc being then translated according to the displacement of the initial object.

Final detection of global misplacement
It could happen that Hc was in fact not misplaced but mis-rotated, and would not be corrected by the previous quick strategy.This test was the same as in the section Detection and correction of global misplacement.In the case that no better position had been found during the correction process, misplacement was detected again and the probabilistic constraint was removed for Hc, thus resulting in a segmentation not wrongly biased by the probabilistic atlas.

Illustration of the measures of mismatch
In order to estimate the actual misplacement of the registered atlas and evaluate how well it was detected by our strategy, we computed quantitative indices for comparing atlas-derived objects and manual segmentation in our three test groups.Note that the misplacement detection does not require manual segmentations; these are used to facilitate visualisation.This figure also shows how the threshold of 4 standard deviations divides between the subjects with and without actual mismatch, and that the atlas is correctly registered in most subjects.The threshold was chosen at 4 instead of 2 standard deviations, in order to detect only large mismatches.
For the global misplacement of the registered atlas, we compute the average symmetric distance between the 0.5-level object and the manual segmentation; it is displayed as a function of the test value in Fig. 5a.Five global misplacements are correctly detected: three for sclerotic hippocampi, one for a TLE patient with grey matter heterotopia in the hippocampal region, and a normal control in whom the left hippocampus was obviously and largely rotated compared to the standard position.
For the mis-estimation of Hc atrophy by the registration procedure, we compute the relative error on volume (the absolute value is not taken here, as only over-estimations are of interest) between the 0.5-level object and the manual segmentation; it is displayed as a function of the test value in Fig. 5b.A large misestimated atrophy is detected for one single sclerotic hippocampus.The registration was correct regarding atrophy estimation in the other sclerotic hippocampi.
Finally, in order to detect the actual initial object misplacement, we can compute the false positive ratio between the initial object and the manual segmentation, thus indicating the proportion of misclassified voxels in the initial object; it is displayed as a function of the test value in Fig. 5c.The largest initial object misplacement was not detected, because the initial Hc was partly in Am, which intensity is nearly similar as Hc's intensity.

Appendix B. Atlas constraints
To introduce the atlas constraint was to consider four discrete probability zones PZ: T being here equal to 0.25, PZ 0 and PZ 0.25 corresponded to low likelihood zones; PZ 1 and PZ 0.75 corresponded to high likelihood zones.In order to evaluate the usefulness of each region, four constraints C1, C2, C3 and C4 were built including more and more regions, C1 using the information in PZ 0 , C2 in PZ 0 and PZ 1 , C3 in PZ 0 , PZ 1 and PZ 0.75 , and C4 in the four PZs.
The choice of the four probability thresholds defining the probability zones PZ was motivated by the "U" shape of the probability histogram and the true (TP) and false positives (FP) rates as a function of IPA O (v) for the leave-one-out atlases (Figs.6a and b).The choice of the probabilities 0 and 1 is natural; the intermediate levels reflect the plateaus observed in the true positive rate below the 0.75-probability level, respectively in the false positive rate above the 0.25-probability level, indicating that the number of voxels correctly (respectively incorrectly) labelled in the atlas increases in both structure dramatically above (respectively below) the indicated probability threshold.
The γ O PZ values were empirically chosen, with respect to the γ O AZ values and after a preliminary test on S1-S16, and were taken as follows (for probability zones not taken into account here, γ O PZ = 1): The PZ and γ O PZ values are illustrated in Fig. 6c and d for the 4 levels of constraint evaluated here.
The effect of the constraint can also be characterised by comparing the results obtained when adding the PZ regions sequentially, for the atlas construction cohort.The results are shown in Table 5 for a subset of four quantitative indices.They show that the refinement of the constraint tends to improve the segmentation accuracy.The difference between C3 and C4 is slight, C4 being mostly better for Am, and thus considered as the optimal constraint.
The influence on the segmentation results when varying the threshold T between 0.1 and 0.5 is very low.When T varies between 0.1 and 0.3, RV and K vary less than 1 percentage point for Hc and Am, apart from RV for Am which varies of ∼ 3 percentage points for C3 and 1 percentage point for C4.When T varies between 0.1 and 0.5, the variation is still less than 1 percentage point for K, for Hc and Am; it becomes larger for RV for Am with C3 (∼ 6 percentage points), but remains low enough for Hc (∼ 2 percentage points) and for both structures with C4 (b1 percentage point for Hc and ∼ 2 percentage points for Am).The variation in the values of the quantitative indices is much larger when switching between C1 and C4 than when varying T.
The effect of varying γ O PZ values has also been studied.For each PZ, we have run the segmentation with the constraint in which this PZ is introduced (for example C2 for PZ 1 ) while letting the value of γ O PZ vary of 30% of the parameter value, while ensuring the constraint that unlikely zones will correspond to values lower than 1 and vice-versa.
Results for RV and K values for Hc and Am are presented in Fig. 6e.
-For PZ 0 , γ O PZ varies between 0.5 and 1, using C1; RV varies of 3 percentage points for Hc and 2 percentage points for Am and K varies of 1 percentage point for Hc and 5 percentage points for Am.
-For PZ 1 , γ O PZ varies between 1.4 and 2.6, using C2; RV and K for Hc and Am vary less than 1 percentage point.
-For PZ 0.75 , γ O PZ varies between 1 and 2 using C3; RV for Hc and K for Hc and Am vary between 1 and 2 percentage points; RV for Am varies of about 3 percentage points.
-For PZ 0.25 , γ O PZ varies between 0.6 and 1, using C4; K varies of about 3 percentage points for Hc and 1 percentage point for Am, and RV varies of 12 percentage points for Hc and 7 percentage points for Am.Nevertheless, as γ O PZ is set to 0.75, and as the confidence in PZ 0.25 is lower than the confidence in PZ 0 , the only real range which has a meaning for γ O PZ is 0.75 to 1; in this range, K varies of about 1 percentage point for Hc and Am, and RV varies of 6 percentage points for Hc and 3 percentage points for Am.
These variations are thus very small compared to the variation range of the parameters; only γ O PZ associated with PZ 0.25 causes a larger variation of RV for both structures.First, the variation of the indices is to be compared to the variation between with the values of the indices with and without the atlas constraint: 2 percentage points for RV for Hc, 4 percentage points for K for Hc, 6 percentage points for RV for Am, 8 percentage points for K for Am.This variation is explained by adding the constraints in the 4 PZs at the same time; the values reported in Table 5 when adding the PZs one by one are consistent with the variation range reported above, apart from PZ 0.25 , for which the improvement (between C3 and C4) is less obvious.

Appendix C. Influence of the probabilistic atlas on the segmentation results
The influence of the probabilistic atlas introduced in the segmentation was tested in two ways, for S1-S16.First, the influence of the number of subjects used to build the atlas was evaluated, through a "learning curve" procedure, by comparing the results of the segmentation with an atlas built from 4, 8, 12 subjects, in addition to the leave-one-out procedure and the full atlas built from S1-S16.Second, the influence of the age range and the manual segmentation protocol on the final result was established by using an atlas with the 10 adults from the 18 subjects IBSR database (the MR brain data sets and their manual segmentations were provided by the Center for Morphometric Analysis at Massachusetts General Hospital and are available at http://www.cma.mgh.harvard.edu/ibsr/),using the manual segmentations supplied.All tests were performed without change in the segmentation parameters.
with atlas constraint (a), and the 0.5-probability object (b).For RV and K, the effect on the mean values is noticeable for the 0.5-level object but far more reduced for atlas-constrained automatic segmentation.
For DM, the effect is weak for both the segmentation and the 0.5-level object, on both mean and maximal results.Finally, it can be noted that the differences between the leave-one-out atlas and the full atlas are very small.Therefore, for the automatic segmentation, a sample of scans from 15 healthy subjects is likely to be sufficient to take into account the variability of the young healthy subjects within our framework.

Fig. 1 .
Fig. 1.Automatic initialisation illustrated on subject S1.(a) Bounding box extraction, on sagittal, coronal and axial sections.(b) Probabilistic atlas, maximal probability zone (IPA O (v) = 1) obtained by thresholding and regularised thresholding and initial object, for Hc and Am, on two representative axial slices (one per row).
keeping it consistent with the γ O AZ values (γ O AZ = 2 for O-likely zones and 0.5 for O-unlikely zones) (cf Appendix B).
(for Hc and Am, respectively: improvements of 2 percentage points (p = 0.13) and 5 percentage points (p = 0.005) for RV, 3 percentage points (p b 0.001) and 4 percentage points (p b 0.001) for K and 1 mm (p b 0.001) and 0.7 mm (p b 0.001) for DM).The atlas constraint improved systematically the results (for Hc and Am, respectively: improvements of 2 percentage points (p = 0.046) and 6 percentage points (p b 0.001) for RV, 4 percentage points (p b 0.001) and 8 percentage points (p b 0.001) for K and 1.2 mm (p b 0.001) and 1.3 mm (p b 0.001) for DM) whereas adding the mismatch correction (for Hc and Am, respectively: improvements of 6 percentage points (p = 0.018) and 11 percentage points (p b 0.001) for RV, 3 percentage points (p = 0.003) and 3 percentage points (p b 0.001) for K and 2.7 mm (p b 0.001) and 1.1 mm (p b 0.001) for DM).The atlas constraint systematically improved the results (for Hc and Am, respectively: improvements of 7 percentage points (p = 0.009) and 18% (p = 0.007) for RV, 8 percentage points (p b 0.001) and 14% (p b 0.001) for K and 3 mm (p b 0.001) and 2.7 mm (p b 0.001) for DM)

Fig. 3 .
Fig. 3. Segmentation results for the patients with Hc sclerosis in group 2: 3D-renderings of automatic and manual segmentations, and overlap between segmentations (manual segmentations in shades of grey) and probabilistic atlases on a sagittal slice for the best and worst results (cases HS8R and HS5L respectively).

Fig. 5 .
Fig. 5. Measure of actual atlas mismatch (Y-axis) as a function of detection tests (X-axis: z-score of the test value with respect to S1-S16): (a) global mismatch (measure = mean symmetric distance between the 0.5-level object and the manual segmentation); (b) atrophy mismatch (measure = relative error on volumes between the 0.5-level object and the manual segmentation); (c) initial object mismatch (measure = false positives ratio between the initial object and the manual segmentation).The solid vertical line indicates the threshold value and the dashed line indicates the median between the average value and the threshold.All the points in the green area are those for which mismatch is detected.

Fig. 6 .
Fig. 6.Definition of the probabilistic atlas constraints C1, C2, C3, and C4.(a and b) Histogram of the area corresponding to voxels which are inside (true positives, TP) and outside (false positives, FP) the manual segmentation as a function of IPA O (v), for the leave-one-out atlases, for S1-S16 (a. for Hc and b. for Am).(c and d) probability zones (PZ) used in the 4 levels of probabilistic atlas constraint (C1-C4) evaluated for the method, on an axial slice (c. for Am and d. for Hc).(e) Sensitivity to γ PZ parameters (RV and K for Hc (in red) and Am (in green) when varying γ PZ ) (see text for details).
• Automatic definition of a bounding box delimiting the region of interest (ROI, Hc and Am); • Automatic estimation of intensity characteristics from the ROI intensity histogram; • Atlas mismatch detection and correction;

Table 1
Acquisition parameters for the MRI datasets used to build the atlas and in the evaluation process.

Table 2
Quantitative indices for comparison against manual segmentation for group 1 (average value ± standard deviation (minimummaximum)).

Table 3
Quantitative indices for comparison against manual segmentation for the group 2 (average value ± standard deviation (minimummaximum)).

Table 5
Quantitative indices for the segmentation run with the 4 levels of probabilistic atlas constraint for S1-S16

Table 6
Quantitative indices for atlases derived from various numbers of subjects and from another set of subjects and manual segmentations (IBSR) for a. the atlas-constrained automatic segmentation and b. the 0.5-level object for S1-S16.