Systematic validation of an automated thalamic parcellation technique using anatomical data at 3T.

The thalamus is a brain region formed from functionally distinct nuclei, which contribute in important ways to various cognitive processes. Yet, much of the human neuroscience literature treats the thalamus as one ho-mogeneous region, and consequently the unique contribution of speciﬁc nuclei to behaviour remains under- appreciated. This is likely due in part to the technical challenge of dissociating nuclei using conventional structural imaging approaches. Yet, multiple algorithms exist in the neuroimaging literature for the automated segmentation of thalamic nuclei. One recent approach developed by Iglesias and colleagues (2018) generates segmentations by applying a probabilistic atlas to subject-space anatomical images using the FreeSurfer software. Here, we systematically validate the eﬃcacy of this segmentation approach in delineating thalamic nuclei using Human Connectome Project data. We provide several metrics quantifying the quality of segmentations relative to the Morel stereotaxic atlas, a widely accepted anatomical atlas based on cyto- and myeloarchitecture. The automated segmentation approach generated boundaries between the anterior, lateral, posterior, and medial divisions of the thalamus. Segmentation eﬃcacy, as measured by metrics of dissimilarity (Average Hausdorﬀ Distance) and overlap (DICE coeﬃcient) within groups was mixed. Regions were better delineated in anterior, lateral and me- dial thalamus than the posterior thalamus, however all the volumes for all segmented nuclei were signiﬁcantly diﬀerent to the corresponding region of the Morel atlas. These mixed results suggest users should exercise care when using this approach to study the structural or functional relevance of a given thalamic nucleus.


Introduction
The study of how the brain contributes to cognitive processes in human neuroimaging research often focuses on studying the cerebrum, while subcortical processes -and in particular the role of the thalamus -are typically understudied. However, an increased appreciation for the contribution of the thalamus in cognition is emerging in the neuroscientific literature ( Wolff & Vann, 2019 ), echoing historic lesion based work (e.g. Bogousslavsky et al., 1988 ;Cheek & Taveras, 1966 ;Partlow et al., 1992 ;Schmahmann (2003) ; Van der Werf et al., 2000, 2003, and more recent functional work (e.g. Liebermann et al., 2013 ;Monchi et al., 2001 ;Schepers et al., 2017 ;Yang et al., 2020 ) that details the influence of the thalamus on cognition, and challenges the notion that the thalamus acts simply as a sensory-motor relay. Nevertheless, part of the challenge in studying the activity of the thalamus is due to the small size of its nuclei, relative to the spatial resolution typical of even high-field (3 Tesla) functional magnetic resonance imag- Recent studies have used MNI space parcellations ( Krauth et al., 2010 ) to study the functional roles of thalamic nuclei. For instance, Geier et al., (2020) describe how the mediodorsal and anterior thalamic nuclei contribute in distinct ways to associative memory processes, while Huang et al, (2019) show how different cognitive demands produce dissociable modulatory responses in the mediodorsal nucleus and its functional connectivity with the prefrontal cortex, while also demonstrating that these dissociations are aberrant in schizophrenia. In both studies, MNI space thalamic parcellations created by Krauth et al, (2010) are derived from the Morel stereotaxic atlas ( Morel, 2007 ;Morel et al., 1997 ). The Morel stereotaxic atlas is a widely used atlas of the thalamus derived from cytoarchitecture, myeloarchitecture, and functional relevance -using immunohistochemical staining of the calcium binding proteins parvalbumin, calbindin D-28k, and calretinin -to identify anatomical boundaries from five postmortem brain specimens ( Figure 1 ). The Morel atlas thus provides a description of the anatomical structure of the thalamus that is not possible to acquire in vivo using magnetic resonance. Yet, the existence of this atlas in MNI space does not allow neuroscientists to infer subject-specific architecture, as it is based on the average of multiple individuals.
Numerous methods are described in the neuroimaging literature for defining the anatomical structure of thalamic nuclei using diffusion weighted, resting state, and anatomical data. Automated segmentation techniques first appear just after the turn of the century and include initial reports relying on anatomical ( Amini et al., 2004 ;Deoni et al., 2007 ), diffusion data ( Behrens et al., 2003 ;Johansen-Berg et al., 2005 ;Wiegell et al., 2003 ), and resting state functional data ( Kim et al., 2013 ;Zhang et al., 2008 ). Parcellation methods have used prior knowledge of distinct cortico-thalamic connectivity to estimate structural boundaries within the thalamus using tractography ( Behrens et al., 2003 ;Johansen-Berg et al., 2005 ), functional connectivity ( Zhang et al., 2008 ), and machine learning approaches such as clustering algorithms ( Amini et al., 2004 ;Deoni et al., 2007 ;Wiegell et al., 2003 ) or independent component analysis ( Kim et al., 2013 ;Kumar et al., 2017 ). More recent ap-proaches using anatomical, diffusion, and resting state data have been reviewed by Iglehart et al. (2020) .
A recent, and alternative approach for using anatomical data was developed by Iglesias et al. (2018) , and uses FreeSurfer to provide automated segmentations of thalamic nuclei. This approach involves the application of a probabilistic atlas to the anatomical image in subject space, using Bayesian inference to refine predicted locations: In this modelling framework prior knowledge of brain anatomy is combined with the likelihood of a given voxel intensity conditioned on the probability assignment to a given label. This model is used to estimate labels, which are progressively refined until a stable segmentation is obtained. This probabilistic atlas is in the form of a tetrahedral mesh that can adapt to changes in subject-specific differences in anatomy. The mesh was formed from a combination of ex-vivo MR images, histological staining, blockface photography, and in-vivo MRI. This approach may be preferable relative to techniques using resting state and diffusion data because it is able to delineate more nuclei with the thalamus, and is easier to apply. Additionally, this approach does not require the acquisition of data using specialist sequences, such as WMn MP-RAGE, meaning it is applicable to a wide range of existing and future datasets.
Having good subject-specific parcellation techniques for thalamic nuclei is important for both clinical practice and empirical research. For instance, these techniques could be used to guide surgeons when implanting deep brain stimulation electrodes to treat Parkinson's disease ( Whiting et al., 2018 ). These techniques can also be used by researchers who may be interested in the role of different thalamic nuclei in functional processes, or the role of anatomical differences in health, development, and disease. Nevertheless, the effectiveness of segmentation approaches must first be validated before inferences can be made based on their output. In their original publication, Iglesias et al. (2018) provide volumetric comparisons between six segmented nuclei and their corresponding regions in the Morel atlas for 66 participants, yet these volumes are not statistically compared to each other, nor is the spatial specificity of segmentations quantified. Here, we aim to systematically compare the segmentation approach described by Iglesias et al. (2018) with regions in the Morel atlas. We provide volumetric, overlap, and isometric comparisons between all segmented regions and portions of the Morel atlas. These insights will allow clinicians and researchers to make informed decisions about the applicability of this segmentation approach for identifying specific thalamic nuclei in individuals.

Data acquisition
Anatomical data were sourced from the publicly available dataset of the Human Connectome Project (HCP) 1200 Subjects Data Release under the HCP Open Access Data Use terms ( https://www. humanconnectome.org/study/hcp-young-adult/document/1200subjects-data-release ; Van Essen et al., 2013 ). The following description of data acquisition and preprocessing steps from the HCP project follow reporting guidelines from Horien et al. (2021) on the use of secondary neuroimaging datasets.

Probabilistic atlas generation
The probabilistic atlas of Iglesias et al. (2018) was generated using data from 6 older adults post mortem who had no history of disease affecting brain morphology, and in vivo data from 39 younger adults. Post mortem brains were fixed by perfusing tissue in situ with 4% paraformaldehyde in phosphate buffer (pH 7.4); brains were then extracted and stored in 4% paraformaldehyde until ex vivo MRI scanning. Whole brain ex vivo images were acquired using a spatial resolution of 0.25mm isotropic using a 3D multi-slab balanced steady-state free precession sequence ( McNab et al., 2009 ;see Iglesias et al., 2018 for all acquisition parameters). Following ex vivo imaging, post mortem tissue for the left and right hemispheres were separately sectioned into blocks, and blocks containing thalamic tissue were immersed in cryoprotectant. Tissue was cryosectioned into 50 m thick slices using a sliding microtome and blockface photos were taken for each section. Every tenth slice was Nissl stained for histological analysis, with individual nuclei delineated following the approach of Jones (2012) .
Blockface photos were co-registered to oneanother, segmented from the block holder and microtome, and stacked to form a 3D volume. The 3D stack of blockface photos were used as an intermediary for registering ex vivo MRI and histologial data. Nuclear segmentations identified using Nissl staining were transformed into MRI space and refined to account for gaps between blocks. The probablistic atlas, based on a tetrahedral mesh, was built following the approach described by Iglesias et al. (2015) . The generated tetrahedral mesh is locally adaptive to the shape of the thalamus, with each vertex of the mesh having an associated vector of probabilities for thalamic nuclei. The topology and resolution of the mesh varies based on anatomy, with finer detail in less uniform regions.
Segmentation using the probabilistic atlas is posed as a Bayesian inference problem, and is modelled with the following assumptions 1. The atlas is spatially warped following a deformation model 2. The segmentation of each voxel is drawn independently, based on the categorical distribution of the deformed atlas at that location 3. each nucleus in the atlas has a set of Gaussian parameters describing the image intensity of its voxels, which are drawn independently for each voxel. To segment the thalamus using the atlas, point estimates for Gaussian parameters and the atlas deformation model are computed. Point estimates are then updated using a coordinate ascent strategy, where Gaussian parameters and the atlas deformation model are updated alternately while the other parameter is fixed. The Gaussian parameters are estimated with the Expectation Maximization algorithm, while the atlas deformation model is updated with a conjugate gradient algorithm. The deformation model is initialised by fitting the atlas to the automated segmentation of the whole thalamus generated by recon-all in FreeSurfer. The posterior probability of a segmentation is calculated by the Expectation Maximization algorithm.

Preprocessing
Data were preprocessed using the HCP minimal preprocessing pipelines Smith et al., 2013 ;Sotiropoulos et al., 2013 ). Firstly, T1w images were corrected for gradient distortions using a customised version of gradient_nonlin_unwarp in FreeSurfer, then each subject's two T1w scans were aligned using FSL FLIRT and averaged. The averaged T1w image was then registered to MNI space using a 12 DOF affine registration with FLIRT, and a subset of 6 DOF transforms were used to align the anterior commissure, the anterior commissureposterior commissure line, and the inter-hemispheric plane, while preserving the size and shape of the brain in native space. The skull was removed by inverting linear (FLIRT) and non-linear (FNIRT) warps from anatomical to MNI space, applying the warp to the MNI space brain mask, and then applying the mask to the averaged T1w image. Finally, the image was corrected for readout distortion and biases in B 1 and B 1 + fields.

Participants
One hundred participants were pseudo-randomly selected for inclusion in analysis. First, participants were filtered on the HCP database to include only subjects with complete anatomical, diffusion, and resting state acquisitions. Then subject IDs were sorted in ascending order and shuffled in MATLAB (2017b) using the seed 03112020 (the date of shuffling). The first one hundred subject IDs were chosen from this list for inclusion in analysis (Supplementary material 1).

Analysis
Data were processed following the method previously described by Iglesias et al., 2018 . Data processing was run using a Nipype pipeline integrating FSL (version 6.0.4) and FreeSurfer (version 7.1.1) on Ubuntu 18.04.2. Output files from Nipype nodes were used as analytic checkpoints to confirm each step in the analysis ran as expected. Anatomical T1 images were first processed and parcellated using recon-all in FreeSurfer; the output of recon-all was used to initialise the parcellation of thalamic nuclei for anatomical data using the algorithm described by Iglesias et al. (2018) . The parcellated thalamus was converted from FreeSurfer space to native anatomical space and changed from mgz to Nifti file format using mri_label2vol and mri_convert in FreeSurfer, respectively. Volumetric calculations for nuclei from the Morel atlas, and for segmented nuclei in subject space were performed using fslstats in FSL.

Comparison to Morel thalamic atlas
Linear rigid and affine transformations and non-linear warps between native anatomical space and MNI space were generated using the Advanced Normalization Tools (ANTs) package (Version 2.3.5, Ecphorella) script antsRegistrationSyN.sh ( Avants et al., 2008 ). The ANTs package was selected due to its superior performance in registering skull-stripped images compared to other algorithms ( Ou et al., 2014 ). Linear transformations and non-linear warps were then used to convert subject-level parcellations into MNI space. Parcellations in MNI space were re-binarised using a threshold of > 0.5.
Group-level probabilistic atlases were created by calculating a mean probability map for each parcellation in MNI space (described previously by Najdenovska et al. (2018) ). Group-level segmentations were then compared to the Morel probabilistic atlas, which was used as ground truth ( Krauth et al., 2010 ). Parcellations of thalamic nuclei in subject space generated by FreeSurfer, group-level mean probability maps in MNI space, plus linear transformations and non-linear warps from subject space to MNI_T1_1mm_brain space are available publicly at https://doi.org/10.17864/1947.000339 .
Each parcellation was compared separately to individual nuclei in the Morel atlas using the EvaluateSegmentation toolbox ( Taha & Hanbury, 2015 ), a threshold of > 0.25 was set for specificity metrics that do not accept non-binary input. The following metrics from the Evaluate-Segmentation toolbox were used to assess each parcellation approach.
The DICE coefficient was used as a measure of overlap between segmentations and ground truth; it is a widely used measure in imaging processing for assessing the overlap between segmentation approaches. The DICE coefficient is defined as: between true positives, false positives, and false negatives, but not the number of true negatives. Efficacious segmentations would have high overlap with their reference space of interest, yet overlap is not sufficient for determining the suitability of a segmentation approach as it cannot account for isometry.
The Average Hausdorff Distance is used as a measure of dissimilarity and can account for differences in isometry. Distance based metrics are advantageous -relative to overlap metrics in situations where segmentations are small -because overlap-based metrics disproportionately penalise errors in smaller than larger segmentations ( Taha & Hanbury, 2015 ) as is the case with thalamic nuclei. Importantly, the Average Hausdorff Distance is sensitive to the position of false positives in a segmentation relative to ground truth, and therefore is a useful metric when considering the boundary of a segmentation, accounting for isometry. In relation to image segmentation, the Hausdorff Distance can be defined as the minimum number of voxels between a point in segmentation X and a point in segmentation Y. Therefore, the Average Hausdorff Distance is the average minimum distance between all points in segmentation A and segmentation B in voxels. The Average Hausdorff Distance is defined as: is the average minimum distance ( min − ) from voxels in the ground truth ( ) to the segmentation ( ), and ( , ) is the average minimum distance ( min − ) from voxels in the segmentation ( ) to the ground truth ( ). The Average Hausdorff Distance is then the maximum of either of these two average distance measures in voxels.

Segmentation overlap
DICE coefficients between the Morel atlas and the group-level segmentations of the left and right thalamus are summarised in Figure 2 and Figure 3 , respectively. On the x-axis are the thalamic nuclei from the Morel atlas; the segmented regions are displayed along the y-axis. A list of abbreviations used to refer to nuclei in text and in Figures 2 , 3 , 5 , and 6 is presented in Table 1 ; the dendrograms presented above and adjacent to each heatmap show the hierarchical structure of the thalamus as defined by ( Morel et al., 1997 ) and the relative position of each nucleus within that hierarchy, as presented in Table 1 . Cells where groups of nuclei are equivalent between the Morel atlas and the segmentation are bound by a box. Here, when interpreting DICE coefficients, we use the following terminology, DICE = 0 no agreement, 0 < DICE < 0.2 slight agreement, 0.2 ≤ DICE < 0.4 fair agreement, 0.4 ≤ DICE < 0.6 moderate agreement, 0.6 ≤ DICE < 0.8 substantial agreement, 0.8 ≤ DICE ≤ 1 almost perfect agreement; these values have been used previously by ( Pajula et al., 2012 ) for comparing Nifti images, and are a widely used convention for interpretating DICE coefficients.
DICE coefficients were similar between the left and right thalamus (Wilcoxon Signed-Ranks Test, Z = 9536, p = 0.531). Figure 2 and Figure 3 show that for the anterior nuclei, the AV had higher overlap than the LD. Overlap for anterior, lateral, medial, and posterolateral divisions of the lateral thalamus generally showed moderate to substantial overlap with nuclei from the same subregion, and mostly only slight overlap between subregions. Posterior thalamic nuclei had overlap values that were generally higher between subregions of the posterior thalamus than for other nuclear groups and were the only set of nuclei to show greater than slight agreement with nuclei from other nuclear groups. Of the midline nuclei in the medial thalamus, only the CeM was segmented, showing fair and moderate overlap in left and right hemispheres, respectively. For the intralaminar nuclei (CL, CM, Pf) the CM and Pf had higher overlap than the CL, which showed overlap with anterior and posterior at the upper end of slight agreement. Subregions of the mediodorsal thalamus (MDm and MDl) showed slight to moderate overlap with corresponding regions in the Morel atlas, and fair overlap with the CL in the left hemisphere.

Segmentation volume
Although nuclei volumes in the segmentation and the atlas were correlated ( t (42) = 5.869, r = 0.671, p < 0.001; Figure 4 ), one sample ttests showed that the volumes were significantly different for all nuclei ( Table 2 ). Most volumes were smaller in the segmentation, except LGN, MDm, PuA, PuI, VLa, and VPL which bilaterally were larger in the segmentation than the equivalent volumes in the Morel atlas.
Bilaterally, the Average Hausdorff Distance (measured in voxels) between segmentations and the Morel atlas was lower along the diagonal (part of the same nuclear subgroup), and larger for nuclei off the diagonal, particularly for nuclei that were part of different nuclear groups ( Figure 5 , Figure 6 ). The lowest Average Hausdorff Distance between each segmented nuclei and a nucleus from the Morel atlas are summarised in Table 3 ; the lowest Average Hausdorff Distance between each nuclei in the Morel atlas and a segmented nucleus are summarised in Table 4 .
Lastly, we calculated the Spearman's correlation coefficient between the volume of nuclei in the Morel atlas and the Average Hausdorff Distance of the group-level segmentation to determine whether nucleus size affected dissimilarity and found they were significantly anticorrelated t (42) = -2.396, r = -0.347, p = 0.021 ( Figure 7 ). This trend was also significant when we correlated Average Hausdorff Distance of the grouplevel segmentation with the average segmentation volume ( t (42) = -3.458, r = -0.471, p = 0.001). Visually, this trend looked like it was driven by segmentations with volumes at the tail end of the plot. These nuclei may have had particularly small volumes because they were not properly segmented by the algorithm. Therefore, we performed an exploratory analysis to test what effect removing these values had on the relationship. To do this we removed the five smallest nuclei bilaterally (CL, LD, L-Sg, VAmc, VM) which all had volumes lower than 21mm 3 , and used this as a threshold to re-run the correlation between the Average Hausdorff Distance of the group-level segmentation and the average segmentation volume. Removing these values reduced the strength of the relationship, and meant the correlation was non-significant t (32) = -1.602, r = -0.273, p = 0.119.

Discussion
The thalamus is known to be important for a wide range of cognitive processes, yet delineating and studying the roles of specific nuclei in aspects of cognition is a non-trivial problem in human neuroimaging. Here, we systematically compare the segmentation approach developed by Iglesias et al. (2018) to the Morel atlas of the thalamus ( Krauth et al., 2010 ;Morel et al., 1997 ) using data from the Human Connectome project. We use DICE coefficients to measure volume overlap, and the Average Hausdorff Distance to compare the dissimilarity and isometry of segmentations relative to nuclei in the Morel atlas. Firstly, we found mixed levels of overlap across anterior, lateral, posterior, and medial portions of the thalamus, and at the group level we found the volumes for all nuclei were significantly different to the defined regions within the Morel atlas. Nuclei had lower Average Hausdorff Distances within groups than across groups, showing segmentations clearly discriminated between anterior, lateral, medial, and posterior portions of the thalamus. We assessed whether there was a systematic effect of volume on the isometry of segmented nuclei relative to the atlas. We found a significant negative correlation between the volume of nuclei in the Morel atlas and the Average Hausdorff Distance of segmentations, and between the size of the segmented volume and its Average Hausdorff Distance; the latter correlation was no longer significant after the five smallest nuclei (all with a volume < 21mm 3 ) were removed from the analysis, suggesting this relationship was driven by an inability to segment these nuclei with smaller values, rather than changes in isometry across nuclei size. In particular, the segmentation of smaller nuclei for individual subjects may be particularly sensitive to participant-related motion artefacts. These include head motion and physiology-related processes such as pulse, respiration, and swallowing; these motion sources can disturb the homogeneity of the magnetic field and the spatial localisation of signal, creating artefacts in the data ( Williams & Lindner, 2020 ). Motion is a major issue during data acquisition, and the most effective way to minimise its effects is to correct for motion during data acquisition. For instance, head motion can be corrected online using techniques such as prospective motion correction ( Callaghan et al., 2015 ), while correction for physiological processes could involve time locking data acquisition with specific phases of the cardiac or respiratory cycles . Implementing either of these motion correction methods could improve the segmentation efficacy of all nuclei, and particularly smaller nuclei, by improving signal to noise ratio in data.
The segmentation approach developed by Iglesias et al. (2018) showed substantial overlap as indexed using the DICE coefficient for one nucleus within each of the anterior, lateral, posterior and medial thalamus (AV, VA, PuM, and CM respectively). Conversely, slight overlap between segmentations and their corresponding region in the Morel atlas was seen for a single segmentation in the anterior thalamus (VAmc), and four regions in the posterior thalamus (LGN, MGN, L-Sg, and LP). We hypothesise that segmentation of nuclei within the posterior thalamus may present difficulties that are not present for the anterior, lateral, and medial thalamus for the following reason. Adjacent to the pulvinar and protruding from the exterior surface of posterior thalamus, the geniculate nuclei may be misidentified as non-thalamic by the recon-all segmentation in FreeSurfer due to poor contrast relative to surrounding white matter on T1-weighted images ( Fujita et al., 2001 ). This whole thalamus segmentation from recon-all is then used as a spatial prior for thalamic parcellation. If this spatial prior were under specified as it did not include the full extent of the geniculate nuclei, then the algorithm could have difficulty when segmenting these regions. Indeed, we see comparable DICE coefficients, particularly for the LGN, between the segmented LGN region and the inferior and lateral pulvinar regions as defined by the Morel atlas. Therefore, this misspecification may have a knock-on effect on the segmentation of the posterior thalamus as a whole. The posterior region was also the only portion of the thalamus to show greater than slight overlap with nuclei in other regions of the thalamus; both the LP and PuA showed fair bilateral overlap with the LD in the anterior thalamus, and the CL in the medial thalamus. Therefore, this segmentation approach formed clear anatomical boundaries between the anterior, lateral, and medial thalamus, but not the posterior thalamus. Additionally, the delineation of nuclei within each sub-region of the anterior, lateral, and medial thalamus appeared to be more distinct than in the posterior thalamus, as indicated by an increased number of nuclei with only slight overlap and being the only group of nuclei showing greater than slight overlap with nuclei from other groups. Nevertheless, we recommend that investigators studying the thalamus verify the efficacy of this segmentation approach for regions of interest using data from their own hardware, in addition to the results presented here.

Figure 2. DICE coefficient between group-level segmented (vertical axis) regions and volumes in the Morel atlas (horizontal axis) in the left thalamus. Higher DICE coefficients show there is greater overlap between segmented regions and volumes in the Morel atlas.
Dendrograms show the hierarchical structure of nuclei within the thalamus. DICE coefficients within BOLD boxes along the diagonal are regions that are part of the same nuclear group or sub-group; nuclei from the Morel atlas and segmented volumes are not always equivalent (see Table 1 for an overview of abbreviations).
There were significant volumetric differences for all nuclei between the volume as segmented by the automated algorithm and the Morel atlas. One potential explanation for this is that the Morel atlas, which was converted into MNI space ( Krauth et al., 2010 ) was not corrected for the effects of fixation during the preparation of the tissue. Fixation with solutions such as formalin or paraformaldehyde is an important step when working with biological specimens as it limits the decomposition of biological tissue; yet fixation also causes tissue to shrink. The atlas derived by Morel from cyto-and myeloarchitecture was not corrected for shrinkage due to fixation ( Bender et al., 2011 ), therefore, the atlas produced by Krauth et al, (2010) may include slight distortions of the morphometry of thalamic nuclei in-vivo. Another possible reason for our observed volumetric differences between the segmentation and the atlas is that the initial template used to begin segmentation is the whole thalamus volume estimated by recon-all in FreeSurfer. Recon-all has previously been shown to over-estimate the volume of the thalamus relative to manually derived segmentations of the whole thalamus, and in particular for participants with volumes at the lower end of the distribution ( Makowski et al., 2018 ). Nevertheless, these explanations cannot fully describe why we observed volumetric differences for most nuclei as segmentations were generally smaller than those produced by the Morel atlas, with the exceptions being the LGN, MDm, PuA, PuI, and VLa. One reason may be that certain nuclei are overestimated and others are underestimated, while the general trend for nuclei volume remains consistent between segmentations and the Morel atlas. Our correlational results between segmented and atlas volumes suggest this may be the case, as we see a moderate positive correlation between the two sets of volumes. Improvements in volumetric estimation may be driven by combining diffusion and anatomical data as proposed by ( Iglesias et al., 2019 ), yet there is currently no publicly available implementation for the approach they describe.
In their original paper describing their segmentation algorithm, Iglesias and colleagues do provide volumetric comparisons for six regions within the thalamus, namely the AV, LP, CM, MD (composite of MDl and MDm), VL (composite of VLa and VLp), and the pulvinar (PuA, PuM, PuL, and PuI combined) in 66 subjects. However, their approach involves the transformation of the equivalent Morel nuclei into subject space before plotting the distributions for each nucleus. Though this provides a visual overview of the relationship between segmented volumes and the Morel atlas, no inferential statistics are used to characterise this relationship. Additionally, though the authors state these regions were chosen based on their functional and structural connectivity, it is unclear why other nuclei were not also selected. Based on our metrics we see these regions appear well segmented at the group level, except for the LP. Here, we systematically show that, although there is a positive relationship between segmented volumes and volumes in the Morel atlas, these volumes are significantly different across all regions. Furthermore, we also extend upon the work of Iglesias et al. (2018) by describing differences in the overlap and isometry between segmented nuclei and regions within the Morel atlas and show that not all regions can be delineated as clearly as those selected in the original paper for comparison. These insights call for caution regarding the use of this segmentation approach to make inferences about the role these nuclei play in cognitive processes, or how aberrant changes in the thalamus occur in disease ( Elvsåshagen et al., 2021 ;Park et al., 2020 ).
Accurate and reliable segmentation is especially important within the context of disease. Firstly, pathology may impair automated segmentations methods and increase variability relative to manual delineations in comparison to normative cases. At the volumetric level, automated FreeSurfer segmentations appear to delineate the whole thalamus at a level in-line with multiple manual human raters in cases of thalamic atrophy due to juvenile myoclonic epilepsy ( Keller et al., 2012 ).  Table 1 for an overview of abbreviations).  Table 1 for an overview of abbreviations). The shaded region denotes 95% confidence intervals (bootstrapped, 1000 iterations).
However, automated segmentations of the whole thalamus appear to be less efficacious when compared with manual approaches. In multiple sclerosis, for instance, white matter lesion load is inversely correlated with the overlap between automated and manual thalamic delineations, and the difference in overlap between automated and manual thalamic delineations is significantly worse for multiple sclerosis cases than healthy controls ( de Sitter et al., 2020 ). These effects are especially pertinent in multiple sclerosis when severe thalamic atrophy and cognitive impairment are observed ( Burggraaff et al., 2021 ); in these cases, although FreeSurfer segmentations show good overlap with manual delineations, it is unclear how appropriate the spatial prior for the automated segmentation of individual thalamic nuclei would be. None of the participants used in the generation of the probabilistic atlas of Iglesias et al. (2018) had disorders that affected brain morphology, and therefore the atlas may sometimes deform innaccurately. This would then impact the identification of specific nuclei in space. Nevertheless, Iglesias et al. (2018) do demonstrate that the segmentation approach for individual thalamic nuclei can be used to classify individuals with Alzheimer's disease with greater accuracy than when the whole thalamus is used, suggesting there is some utility of this approach in cases of neurodegeneration.
Future developments to improve segmentation efficacy of Iglesias et al.'s (2018) tool could include the continued use of multimodal imaging for refining nuclear-level segmetations ( Iglesias et al., 2019 ). As previously mentioned, the lateral geniculate nucleus is not well defined by standard T1-weighted contrast; however, proton densitiy imaging improves contrast between the lateral geniculate and surrounding white matter tissue ( Fujita et al., 2001 ). Additionally, Proton density has been used for identifying the medial geniculate ( Devlin et al., 2006 ) and centromedian ( Kanowski et al., 2010 ) nuclei, while magnetisation transfer imaging also improves thalamic contrast ( Gringel et al., 2009 ). Therefore, the inclusion of proton density images alongside T1-weighted and diffusion images could enhance the algorithm's performance, particuarly within the posterior thalamus, since additional imaging contrasts could provide unique anatomical information. Furthermore, proton density and magnetisation transfer are advantageous for improving imaging contrast over other approaches Table 2 Descriptive statistics for segmented thalamic volumes and regions within the Morel atlas. Mean, standard deviation and range are provided for each segmentation bilaterally. Additionally, the number of participants where this nuclei was delineated is specified. The following regions within the Morel atlas are combined for the purpose of statistical analysis (LGN = (LGNmc + LGNpc), VA = (VAmc + VApc), VLp = (VLpd + VLpv), VPL = (VPLa + VPLp) (see Table 1 for an overview of abbreviations). such as WMn MP-RAGE ( Tourdias et al., 2014 ), since sequences for Fast low angle shot MRI are readily available as a vendor product. While it is important to have segmentations that are faithful to their underlying neuroanatomy, it is also worth being pragmatic about the limitations of automated segmentations and how we use them. At the implementation level it is plausible that heterogeneity in segmentation efficacy will be observed across and within participants. This is despite Iglesias et al. (2018) demonstrating high test-retest reliability for most nuclei volumes. For example, these differences could be driven by changes in relative signal, intensity, and noise across multiple acquisitions within the same individual ( Kiar et al., 2020 ). Other implementation-based sources of segmentation heterogeneity could include differences across operating systems and software versions. For example, estimates of segmentation volumes are inconsistent across versions of FreeSurfer ( Gronenschild et al., 2012 ), and differences in floating point arithmetic across operating systems and FreeSurfer versions also influences analyses ( Glatard et al., 2015 ). Practical solutions, such as the use of Docker and Singularity containers, enable the standardisation of computational resource, yet they cannot address inherent differences in image acquisition which could be alleviated by using multi-modal imaging contrasts.

Nucleus Mean volume (mm 3 ) Standard deviation Minimum (mm 3 ) Maximum (mm 3 ) t -statistic p-value N Morel volume (mm 3 )
More broadly, the biological plausibility of measures should also be considered when applying automated segmentation processes to study structure and function. Differences, such as those that are clinically relevant or that change across developmental trajectories, should be observed at a level that is greater than noise that could be due to withinindividual variance or computational set-up. These differences, even if small, should be shown to be at least qualitatively reproducible. Additional considerations regarding biological plausibility include the application of automatic registration tools. As an example, the original pipeline in this study involved the use of FLIRT and FNIRT in FSL to convert images from subject space to MNI space. Yet, we found that these algorithms were unsuitable in terms of their accuracy for providing transformations within the thalamus. We therefore amended our pipeline to use ANTs, rather than FSL for normalisation. Additionally, it is worth considering the utility of applying these methods to studying functional changes using fMRI. For instance, the use of these segmentation techniques in studies with typical 3 tesla acquisition and analysis parameters (3mm 3 voxels, 8mm smoothing kernel) would mean that it is likely that functionally relevant signal is not localised to specific thalamic nuclei. Therefore, we suggest that those wishing to study the relevance of thalamic nuclei to functional processes carefully consider  Table 1 for an overview of abbreviations).

Table 3
Lowest Average Hausdorff Distance between each group-level segmentation and its corresponding volume within the Morel atlas. Higher Average Hausdorff Distances shows there is greater dissimilarity and that segmented regions and volumes in the Morel atlas are less isometric (see Table 1 for an overview of abbreviations).   Table 1 for an overview of abbreviations).  Table 1 for an overview of abbreviations). The shaded region denotes 95% confidence intervals (bootstrapped, 1000 iterations).
acquisition and processing pipelines with respect to the metrics provided here to restrict the integration of signal across regions of the thalamus. For instance, investigators may find it beneficial to combine segmentations from regions with high overlap and functional homogeneity, such as the subnuclei of the mediodorsal thalamus, to improve the anatomi-cal definition of their region of interest. Furthermore, consideration of the size of segmented nuclei in relation to acquisition and processing parameters are important to determine whether it could be considered feasible to acquire clean signal from a given region.
In conclusion, this paper aimed to systematically validate the thalamic nuclei segmentation approach developed by Iglesias et al. (2018) using Human Connectome Project data and the Morel thalamic atlas derived from histological staining post-mortem ( Krauth et al., 2010 ). Using volumetric, overlap, and isometry measures we show that the automated segmentation approach delineates between the anterior, lateral, posterior, and medial portions of the thalamus, and that there is mixed segmentation efficacy for individual nuclei within these groups. Importantly, we find that the volumes for segmented nuclei are significantly different from those defined in the Morel atlas, calling for caution when adopting this segmentation method. We also suggest important considerations related to the functional relevance of these segmentations, describe potential pitfalls researchers may face when using these segmentations within their own research, and recommend investigators undertake further validation work in addition to the results presented here when using this approach.
Parcellations of thalamic nuclei in subject space generated by FreeSurfer, group-level mean probability maps in MNI space, plus linear transformations and non-linear warps from subject space to MNI_T1_1mm_brain space are available publicly at https://doi.org/10.17864/1947.000339 .

Funding
Brendan Williams was funded by the Magdalen Vernon PhD Studentship of the School of Psychology and Clinical Language Sciences, Table 4 Lowest Average Hausdorff Distance between volumes within the Morel atlas and its corresponding group-level segmentation. Higher Average Hausdorff Distances shows there is greater dissimilarity and that segmented regions and volumes in the Morel atlas are less isometric (see Table 1 for an overview of abbreviations). University of Reading. Funding for article processing charges was provided by the University of Reading.

Declaration of Competing Interest
None Credit authorship contribution statement