Methodological challenges of measuring brain volumes and cortical thickness in idiopathic normal pressure hydrocephalus with a surface-based approach

Identifying disease-specific imaging features of idiopathic Normal Pressure Hydrocephalus (iNPH) is crucial for developing accurate diagnoses, although the abnormal brain anatomy of patients with iNPH creates challenges in neuroimaging analysis. We quantified cortical thickness and volume using FreeSurfer 7.3.2 in 19 patients with iNPH, 28 patients with Alzheimer's disease (AD), and 30 healthy controls (HC). We noted the frequent need for manual correction of the automated segmentation in iNPH and examined the effect of correction on the results. We identified a statistically significantly higher proportion of volume changes associated with manual edits in individuals with iNPH compared to both HC and patients with AD. Changes in cortical thickness and volume related to manual correction were also partly correlated with the severity of radiological features of iNPH. We highlight the challenges posed by the abnormal anatomy in iNPH when conducting neuroimaging analysis and emphasise the importance of quality checking and correction in this clinical population.


Introduction
Idiopathic Normal Pressure Hydrocephalus (iNPH) is a neurological condition that affects approximately 0.3-3% of individuals aged 60 and above (Jaraj et al., 2014). It is characterized by alterations in cerebrospinal fluid dynamics, leading to the enlargement of the ventricles to maintain a stable intracranial pressure (Carswell, 2022). A triad of symptoms (gait apraxia, urinary incontinence, and cognitive deficits) results from this compensatory ventricular expansion, which stretches and distorts the surrounding parenchyma (Carswell, 2022). Therapeutic redirection of cerebrospinal fluid to an area of lower pressure (i.e., shunting) can dramatically improve symptoms (Carswell, 2022).
iNPH occurs in the elderly population, in which traditional neurodegenerative diseases are common (Jaraj et al., 2014), and identifying iNPH-specific clinical and imaging features is paramount for distinguishing these disorders. The anatomical features of iNPH introduce methodological challenges in neuroimaging analysis. Reduced callosal angle, ventriculomegaly, and disproportionately enlarged subarachnoid space hydrocephalus (DESH) are among the distinctive features of iNPH seen on brain imaging (Hashimoto et al., 2010). Here, we address potential limitations associated with the use of FreeSurfer (see Footnote 1), a software package for the analysis and visualization of brain imaging data, in this specific patient group.
One notable advantage of FreeSurfer is its fully automated pipeline, which segments the brain into regions of interest. It is freely available and widely used, and there is extensive experience within the field in implementing it within analysis pipelines, which aids reproducibility. FreeSurfer registers the volume with the MNI305 atlas. It then runs a surface-based stream, which reconstructs the cortex by classifying voxels as either white or non-white matter based on voxel intensity and neighbour constraints, and a volume-based stream, which labels each point (voxel) of the brain mask (Dale et al., 1999; Fischl et al., 2002). It derives the white matter surface as the interface between the white and grey matter, and the pial surface as the boundary between the grey matter and the cerebrospinal fluid (CSF). Cortical thickness and volumes can then be quantified in 34 different regions derived from the Desikan-Killiany atlas. This automated process is considerably less laborious and less prone to bias than manual segmentation of regions of interest.
Quality control and manual editing can be performed to rectify errors related to skull stripping, grey-white matter segmentation, and intensity normalization (see Footnote 2). Several studies have compared the outputs of the FreeSurfer pipeline with and without manual edits in groups of healthy adults, individuals with genetic disorders, and individuals with severe head injuries, and found mixed results (McCarthy et al., 2015; Guenette et al., 2018; Waters et al., 2019). There is also limited research investigating the significance of the manual editing step in clinical populations with extremely abnormal brain morphology, which can impact the registration and segmentation analysis stages.

Methods
We evaluated the importance of manually correcting the segmentation output produced by FreeSurfer 7.3.2 (see Footnote 3) on the MRI scans of 19 patients with iNPH, 28 patients with Alzheimer's disease, and 30 healthy controls (HC). To improve the readability of the results and reduce multiple comparisons, the 34 regions segmented by FreeSurfer were clustered to derive cortical thickness and volumes for the frontal, temporal, parietal, occipital and cingulate lobes (see Footnote 1). Between-group differences in age and gender were analysed using the Kruskal-Wallis test and the Chi-Square test, respectively. All scans were visually checked to ensure their quality met appropriate research standards. We then ran the FreeSurfer recon-all command using the -bigventricles flag. Of the 19 iNPH patients, 12 were classified as probable, 4 as possible and 3 as asymptomatic iNPH, as defined by international criteria (Relkin et al., 2005). Among the 16 symptomatic iNPH patients, 15 received a lumbar puncture and had their CSF samples analysed to determine the presence of comorbid AD pathology. Amyloid deposition was detected in two patients. Radiological features of iNPH were assessed and calculated by a neuroradiologist. Participants were scanned on a 3 T Siemens scanner as part of a wider ongoing study run by the UK Dementia Research Institute, Care Research & Technology Centre, focused on using sensor technology to monitor behaviours of people living with dementia. We visually inspected each output and performed manual editing when necessary (Figure 1). Wilcoxon signed-rank tests were used to compare FreeSurfer's measurements (volumes and cortical thickness) before and after manual edits while accounting for non-normally distributed data. Between-group differences in these changes were assessed via repeated measures ANOVA, followed by two-tailed t-tests with FDR correction for post-hoc comparisons. Finally, exploratory Spearman correlations were conducted between changes in cortical thickness/volumes pre and post manual correction and radiological features of iNPH (i.e., Radscale score, Evans' index, callosal angle and DESH score).
To assess the potential for rectifying FreeSurfer's inaccuracies through alternative pre-processing software, we conducted two additional evaluations. First, we integrated the HD-BET tool for skull stripping before executing the FreeSurfer recon-all command. Notably, HD-BET has exhibited superior performance compared to various widely used brain extraction algorithms, even in the presence of brain pathology (Isensee et al., 2019). Additionally, we experimented with running the FreeSurfer recon-all command using a combination of T1 and FLAIR scans (see Footnote 4).
This study was approved by the Health Research Authority's London-Surrey Borders Research Ethics Committee (19/LO/0102) and the Health Research Authority's London-Central Research Ethics Committee (18/LO/0249). All participants gave written and/or electronic consent.
1 http://surfer.nmr.mgh.harvard.edu/
2 https://surfer.nmr.mgh.harvard.edu/fswiki/FsTutorial/TroubleshootingData
3 https://surfer.nmr.mgh.harvard.edu/fswiki/CorticalParcellation
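As an illustration of the recon-all runs described in the Methods, the following minimal Python sketch assembles the command line for one subject without executing it. The -bigventricles flag is the one named in the text. The -FLAIR and -FLAIRpial options are an assumption about how the combined T1 and FLAIR run might have been invoked, as the exact options are not stated. The function name, subject ID and file names are hypothetical.

```python
from typing import List, Optional


def build_recon_all_command(
    subject_id: str,
    t1_path: str,
    flair_path: Optional[str] = None,
    big_ventricles: bool = True,
) -> List[str]:
    """Assemble (but do not run) a recon-all argument list for one subject."""
    cmd = ["recon-all", "-s", subject_id, "-i", t1_path, "-all"]
    if big_ventricles:
        # Flag named in the Methods; intended for scans with enlarged ventricles.
        cmd.append("-bigventricles")
    if flair_path is not None:
        # Assumed options for the T1 + FLAIR evaluation: -FLAIR supplies the
        # image and -FLAIRpial asks recon-all to use it to refine the pial surface.
        cmd += ["-FLAIR", flair_path, "-FLAIRpial"]
    return cmd


if __name__ == "__main__":
    # Hypothetical subject ID and file names, for illustration only.
    print(" ".join(build_recon_all_command("sub-01", "sub-01_T1w.nii.gz")))
    print(" ".join(build_recon_all_command(
        "sub-01", "sub-01_T1w.nii.gz", flair_path="sub-01_FLAIR.nii.gz")))
```

Returning an argument list rather than a single string means the command could be passed to a process launcher without shell quoting issues, and it keeps the sketch testable on a machine where FreeSurfer is not installed.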

Results
HC (14 females, mean age = 75.58 years, SD = 6.07), AD patients (12 females, mean age = 75.25 years, SD = 7.64 years) and iNPH patients (7 females, mean age = 71.58 years, SD = 5.92 years) did not differ significantly in terms of gender. No significant age difference was found between HC and AD patients. Conversely, iNPH participants were significantly younger than AD and HC (p = 0.01). The iNPH patients had a mean Evans' index of 0.38 (SD = 0.04), mean callosal angle of 75.7 (SD = 15.83), mean Radscale score of 9.3 (SD = 1.51) and mean DESH score of 7.06 (SD = 1.77).
Out of the 19 scans of patients with iNPH, 3 failed the segmentation step (Figure 2) and 15 required extensive manual corrections (Figure 1). Of the 3 patients whose FreeSurfer segmentation failed, 2 were asymptomatic. Of the 28 patients with Alzheimer's disease, one failed the segmentation and 4 required manual corrections. In the HC group, only 2 participants needed manual editing of the segmentation output. No corrections of the white matter surface were required in any study group. In the iNPH group, manual edits aimed to improve the removal of skull and rectify inaccuracies in defining the pial surface, which had extended into the dura and skull.
Following manual correction, the parietal, frontal and temporal regions exhibited the most substantial differences, with volume and cortical thickness measures decreasing bilaterally across the group (Table 1). Wilcoxon signed-rank tests comparing these measurements before and after manual edits did not reach significance, although we may have been underpowered by the small number of participants. Repeated measures ANOVA indicated an effect of group on the delta values of the volumes (F(19, 630) = 2.84, p < 0.001), but not of cortical thickness (p > 0.05), which suggests potentially higher reliability of this measure relative to volumes. Between-group differences were observed for the delta values of the frontal, parietal, temporal and cingulate volumes (Table 1). In Supplementary Table S1, we also report the differences in cortical thickness and volumes before and after manual correction for all 34 individual regions segmented by FreeSurfer and the between-group comparisons of the delta values. Spearman correlations showed that, in the left and right temporal lobes, cortical thickness changes significantly correlated with the Radscale score (rho = 0.61/0.60, p = 0.01), and the change in left temporal lobe volume also correlated with the callosal angle in isolation (rho = −0.53, p = 0.03). We also found a significant correlation between DESH scores and the change in volume of the right occipital lobe (rho = −0.51, p = 0.04).
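The Spearman correlations reported above relate per-subject changes after manual correction to radiological scores. As a minimal sketch of how such a coefficient is obtained, the code below computes rho as the Pearson correlation of tie-averaged ranks, using only the Python standard library. It does not compute p-values, the function names are hypothetical, and the example values are invented for illustration rather than taken from the study.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented values: thickness change (mm) and Radscale score per subject.
    thickness_change = [-0.02, -0.10, -0.05, -0.15, -0.01]
    radscale_score = [7, 10, 9, 11, 8]
    print(spearman_rho(thickness_change, radscale_score))
```

Because the coefficient depends only on ranks, it is insensitive to the skewed distributions that motivated the non-parametric tests in the Methods.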
The additional evaluations of FreeSurfer's accuracy using HD-BET in the pre-processing step revealed 7 segmentation failures and the necessity of multiple manual edits in 11 scans. Similarly, employing a combination of FLAIR and T1 images also led to 7 segmentation failures and required manual corrections in 5 outputs.

Discussion
The higher proportion of scans requiring correction within our sample of iNPH patients relative to the AD and HC groups underlines the importance of conducting and reporting this quality check in this group, which is not consistently done (Cogswell et al., 2021).
Whilst we acknowledge that the overall effect of the correction in these data is minor, it is important to note that this is a relatively small sample and that we employed a conservative manual correction approach to mitigate the risk of bias associated with human judgement; the effect of this correction process might become substantial enough to influence results significantly in larger studies. Our findings also reveal a statistically significantly higher proportion of volume alterations attributed to manual edits in individuals with iNPH compared to both healthy controls and patients with AD. Changes in cortical thickness were in part correlated with the severity of radiological features of iNPH, which underlines the importance of exercising caution when using FreeSurfer with severe hydrocephalus. One significant limitation of this study is the subjectivity of the visual inspections and manual corrections, which are prone to human error. However, we have followed the methodology and guidelines provided by the developers to mitigate bias and maximise consistency in our approach (see Footnote 5).
The challenge for the field lies in establishing brain biomarkers that can differentiate between iNPH and other dementia types with overlapping clinical presentations and radiological features, such as ventriculomegaly, in order to identify patients to target with therapeutic shunting. Previous studies have demonstrated abnormal cortical thickening in the parietal lobe, and in the high convexity of the frontal, parietal, and occipital lobes, in iNPH patients compared to healthy individuals and patients with Alzheimer's disease (Moore et al., 2012; Kang et al., 2020; Bianco et al., 2022). Studies have suggested that cortical thickening may be characteristic of iNPH and related to the ventricular expansion, which compresses and stretches the brain tissue and may in turn reduce the cerebrospinal fluid space in the high-convexity regions (Kang et al., 2020; Han et al., 2022). We cautiously suggest that increased cortical thickness and tightness of the high-convexity space increase the likelihood of FreeSurfer failing to delineate the pia from the dura and hence erroneously classifying extra voxels as grey matter. If not corrected, these inaccuracies may provide further, exaggerated evidence of increased cortical thickness and volumes in these areas. Interestingly, segmentation errors did not affect the white matter surface. FreeSurfer's failures seem to specifically impact the delineation of the pial surface. Since this is measured as the interface between the grey matter and the CSF, these inaccuracies could arise from the reduced CSF space and the tight high-convexity regions resulting from ventricular expansion.
In light of the challenges discussed above, we propose that researchers consider the likely lengthy process of manual correction that is required when using FreeSurfer in this clinical group, and we encourage reporting the completion of this step so that readers can have confidence in any associated results. However, there is a need for further, large-scale iNPH studies to reliably identify disease-specific biomarkers. In such studies, conducting laborious manual corrections, which can take several hours per subject (Lotan et al., 2022), may be unfeasible and may introduce bias, especially because such apparent structural abnormalities make it difficult to blind raters to the clinical group each scan comes from.
With this in mind, alternative automated software and analysis techniques with superior accuracy have been developed and may be preferable (Carass et al., 2017; Shao et al., 2019; Billot et al., 2023). Nevertheless, as noted above, FreeSurfer is still widely used in current studies. This may be due to some limitations of these alternative tools: they do not always provide segmentation of the individual compartments of the ventricles or are validated only in small subsamples of iNPH patients (Shiee et al., 2011; Roy et al., 2015; Shao et al., 2019), do not improve the required processing time relative to FreeSurfer (Ellingsen et al., 2016), are not always freely available (Shao et al., 2019) or as easily accessible as FreeSurfer (Ellingsen et al., 2016), or need manual delineation of new atlases when employed with new scanners (Roy et al., 2015).

FIGURE 1
Left: Output of FreeSurfer's recon-all command before the manual editing step for one subject. DESH features (i.e., enlarged ventricles, widened sylvian fissure and tight high convexity) are marked in red. Right: Output of the manual editing step for the same subject showing reduced cortical thickness.

FIGURE 2
Example of failed segmentation for one iNPH patient. Due to the presence of oedema, the pial and white matter surfaces are wrongly estimated around the ventricles and extend into the CSF space.

TABLE 1
Values of cortical thickness and volumes before and after manual correction for the 5 main lobes (left and right), and between-group comparisons of the delta values (FDR corrected). Delta values for each group were calculated by averaging, across subjects, the difference between pre- and post-correction cortical thickness and volume values. *p ≤ 0.05, **p ≤ 0.01, ***p ≤ 0.0001.
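The quantities summarised in Table 1 can be sketched in a few lines of Python. The sketch below rests on two assumptions that the caption does not settle: the delta is taken as post-correction minus pre-correction, and the FDR correction is the Benjamini-Hochberg procedure. The function names are hypothetical and any values passed to them would be illustrative, not study data.

```python
from typing import List, Sequence


def mean_delta(pre: Sequence[float], post: Sequence[float]) -> float:
    """Average per-subject change; the sign (post minus pre) is an assumption."""
    if len(pre) != len(post) or not pre:
        raise ValueError("pre and post must be non-empty and the same length")
    return sum(b - a for a, b in zip(pre, post)) / len(pre)


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def stars(p: float) -> str:
    """Map a p-value to the asterisk notation given in the Table 1 caption."""
    if p <= 0.0001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return ""
```

A typical use would compute `mean_delta` per lobe and group, collect the post-hoc p-values for all lobes, pass them through `benjamini_hochberg` once, and then label each adjusted value with `stars`.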