Identifying and reverting the adverse effects of white matter hyperintensities on cortical surface analyses

The Human Connectome Project (HCP)-style surface-based brain MRI analysis is a powerful technique that allows precise mapping of the cerebral cortex. However, the strength of its surface-based analysis has not yet been tested in the older population that often presents with white matter hyperintensities (WMHs) on T2-weighted (T2w) MRI (hypointensities on T1w MRI). We investigated T1-weighted (T1w) and T2w structural MRI in 43 healthy middle-aged to old participants. Juxtacortical WMHs were often misclassified by the default HCP pipeline as parts of the gray matter in T1w MRI, leading to incorrect estimation of the cortical surfaces and cortical metrics. To revert the adverse effects of juxtacortical WMHs, we incorporated the Brain Intensity Ab-Normality Classification Algorithm into the HCP pipeline (proposed pipeline). Blinded radiologists performed stereological quality control (QC) and found a decrease in the estimation errors in the proposed pipeline. The superior performance of the proposed pipeline was confirmed using an originally-developed automated surface QC based on a large database. Here we showed the detrimental effects of juxtacortical WMHs for estimating cortical surfaces and related metrics and proposed a possible solution for this problem. The present knowledge and methodology should help researchers identify adequate cortical surface biomarkers for aging and age-related neuropsychiatric disorders.


Introduction
Surface-based analysis of brain MRI can more accurately delineate complicated cortical ribbons and more precisely map the functional neuroanatomy of the brain than volume-based analysis (Anticevic et al., 2008; Fischl et al., 2008; Frost and Goebel, 2012; Tucholka et al., 2012; Van Essen et al., 2012; Glasser et al., 2016; Coalson et al., 2018). Among surface-based analysis methods, the Human Connectome Project (HCP) pipeline is a widely used workflow of advanced surface-based analyses of multimodal brain MRIs (Glasser et al., 2013). The usefulness of the HCP-style approach has been shown with the young adult HCP dataset (YA-HCP) and the subsequent developing and aging connectome datasets (Elam et al., 2021). A human brain atlas was proposed based on YA-HCP multimodal MRI data, including myelin and thickness maps, and functional connectivity (Glasser et al., 2016). Taking advantage of its registration accuracy, the HCP-style approach has now been adopted in cross-scanner harmonization projects worldwide, including the UK Biobank (Williams et al., 2023) and the Brain/MINDS-beyond (Koike et al., 2021). Next, the application of HCP-style analysis to the older population is warranted to gain new insights into the pathophysiological mechanisms of age-related neurodegenerative diseases, including dementia and Parkinson's disease (Bookheimer et al., 2019; Li et al., 2021; Wakasugi and Hanakawa, 2021).
However, caution must be exercised when applying surface-based analysis to MRI data from older populations, since they often present with age-related structural changes in the brain, including periventricular and deep white matter (WM) hyperintensities (WMHs) on T2-weighted (T2w) MRIs. WMHs are caused mainly by small vessel vasculopathy and resultant ischemic changes (Barkhof and Scheltens, 2002). WMHs are observed not only in neurological conditions such as vascular dementia and parkinsonism (Gootjes et al., 2004; Bohnen and Albin, 2011) but also to a variable degree in healthy seniors (Scott et al., 2015; Phuah et al., 2022). The prevalence of WMHs increases with age; at least 80-90% of the population over the age of 60 have WMHs (Launer, 2003; Kruit et al., 2004; Liao et al., 1997; de Leeuw et al., 2001; Caunca et al., 2019; Atwood et al., 2004; Silbert et al., 2008). Furthermore, age-related WM abnormalities can be visualized not only as WMHs on T2w MRI but also as low-intensity areas on T1-weighted (T1w) MRI, where their signal intensity becomes similar to that of the gray matter (GM). This can be problematic when segmenting the GM and WM based on the contrast of signal intensity in T1w MRIs. A recent study showed that an automated segmentation algorithm may misclassify parts of WMHs as GM, resulting in an incorrect estimation of GM volumes (Dadar et al., 2021).
Thus, the existence of WMHs may also degrade surface estimation in HCP-style analysis. In the HCP pipeline, the FreeSurfer software suite 6.0 (Fischl, 2012; https://surfer.nmr.mgh.harvard.edu/) segregates the cortical GM and WM based on the contrast of the signal intensity of T1w MRI and estimates the boundary between the WM and GM (the WM surface hereafter) along the cortical ribbon. Notably, because the intensity of WMHs in T1w MRI is close to that of the GM, voxels in juxtacortical WMHs may be misclassified as parts of the cerebral cortex, resulting in an incorrect estimation of the WM surface. As the WM surface is also used as an input to the pipeline when the cortical outer surface (pial surface hereafter) is subsequently estimated, the error in the WM surface estimation can easily propagate to the pial surface estimation. In contrast to commonly used surface analysis methods that use only T1w MRI, the HCP pipeline uses information from T2w MRI as well as T1w MRI for tuned estimation of the pial surfaces to reduce the adverse effects of the dura mater signals (Glasser et al., 2013). The possibility of surface estimation errors due to WMHs has not yet been thoroughly examined. Thus, it is worth investigating the likelihood of surface estimation errors due to WMHs and the propagation of tissue misclassification into the inaccurate computation of thickness and myelin contrast, particularly when analyzing MRIs with WMHs.
One solution to the surface estimation errors caused by WMHs is to correct the WM segmentation by manually editing the misclassified voxel labels in the WMHs (https://surfer.nmr.mgh.harvard.edu/fswiki/FsTutorial/WhiteMatterEdits_freeview). Rerunning FreeSurfer after manual editing should revert the erroneous WM surface estimation, followed by better estimation of the pial surfaces, thickness, and myelin contrast in the HCP pipeline. However, manual correction relies on time-consuming hand-editing operations by human experts. These laborious processes may make manual correction impractical for application to large population studies. Another possible solution is to take advantage of recently developed machine learning (ML) algorithms for the automated detection of WMHs (Griffanti et al., 2016) and integrate the algorithms as a module into the HCP pipeline.
Herein, we aimed to elucidate the potentially detrimental effects of juxtacortical WMHs on the estimation of cortical surfaces and related metrics and to implement a module for correcting surface estimation errors in the HCP pipeline. To automatically detect WMH labels, we used an ML-based Brain Intensity AbNormality Classification Algorithm (BIANCA) (Griffanti et al., 2016). Furthermore, we incorporated BIANCA-derived WMH masks into FreeSurfer to update the WM segments of the FreeSurfer pipeline, followed by re-estimation of the pial surface, thickness, and myelin in the HCP pipeline. The applied pipeline was validated by comparing the results with those obtained using manually edited WMHs in the same dataset. Using the outputs of the cortical surface analysis (WM and pial surfaces, cortical thickness, and myelin contrast), we compared the surface estimation errors of the automated ML WMH-adapted HCP pipeline (ML pipeline) with those of the manually delineated WMH-adapted HCP pipeline (manual pipeline) and the default HCP pipeline without considering WMHs (default pipeline). To evaluate the effect of the proposed pipeline, we developed two quality control (QC) methods for identifying surface reconstruction errors because past QC methods have limitations in terms of inter-rater variability and difficulty of quantification (Backhausen et al., 2016; Monereo-Sánchez et al., 2021). A stereological QC method was developed for semi-quantitative visual estimation of the pial and WM surface errors. We also developed a fully automated QC algorithm for detecting extreme outliers of surface metrics. We found these QC methods accurate and reliable for verifying cortical surface reconstruction by the proposed preprocessing pipelines.

Participants
We used MRI data registered in the MRI database of the Integrative Brain Imaging Center of the National Center of Neurology and Psychiatry (NCNP). This MRI database was originally created to serve as control data for neurodegenerative disorders including Parkinson's disease (Togo et al., 2023) and spinocerebellar degeneration (Bando et al., 2019). The research protocol was approved by the Institutional Review Board of NCNP (A2018-086) and was performed in accordance with the Declaration of Helsinki. For the present study, we retrieved MRI data from participants with both 3D T1 and FLAIR images. As a result, the present data were derived from 43 individuals who did not report any previous neuropsychiatric disorders (mean age 64.4 years [SD 11.1], age range 41-83 years, 29 males). The exclusion criteria were as follows: a Mini-Mental State Examination (MMSE) score < 24 or local brain lesions (e.g., brain tumor or cerebral infarction) incidentally identified on MRIs.

Preprocessing of MRI data
All structural MRI data were converted from Digital Imaging and Communications in Medicine files to Neuroimaging Informatics Technology Initiative (NIfTI) files and then preprocessed using the HCP pipeline implemented with Connectome Workbench ver. 1.5.0, FMRIB Software Library (FSL) 6.0.4 (Smith et al., 2004) and FreeSurfer 5.3-HCP (Fischl, 2012). The HCP pipeline consists of three steps: PreFreeSurferPipeline, FreeSurferPipeline, and PostFreeSurferPipeline (Fig. 1a). The initial PreFreeSurferPipeline step corrected the T1w and T2w structural MRIs for image distortions related to the gradient nonlinearity inherent to the scanner type, registered MRIs into the anterior-posterior commissural coordinate space of the Montreal Neurological Institute (MNI) templates using the Linear Image Registration Tool (FLIRT) of the Functional Magnetic Resonance Imaging of the Brain (FMRIB) group, and resampled the data at a 0.7-mm isovoxel. The pipeline then performed brain extraction, fine-tuned registration of T1w and T2w with boundary-based registration (Greve and Fischl, 2009), bias-field correction of T1w and T2w (Glasser and Van Essen, 2011), and nonlinear registration to the MNI template. The nonlinear registration was conducted using FMRIB's Nonlinear Image Registration Tool (FNIRT), and both T1w and T2w volumes were resampled using spline interpolation at 0.7-mm and 2-mm isovoxels in the MNI space.

Cortical surface estimation and calculation of T1w/T2w myelin contrast
Subsequently, the FreeSurferPipeline performed cortical surface reconstruction using both T1w and T2w volumes. We used T2-weighted images to minimize the impact of the skull or dura signal on the surface reconstruction. In brief, the process included skull stripping of the T1w volume using the standard brain mask template and the deformation field calculated in the PreFreeSurferPipeline (Glasser et al., 2013), classification of brain voxels into subcortical and cortical GM and WM segments (Fischl et al., 2002), intensity normalization (Sled et al., 1998), and extraction of the WM segment of the cerebrum. Next, the WM segment was used for the initial estimation of the tentative GM and WM boundary, forming a 'WM surface' in each hemisphere, followed by fine-tuning of the WM surface placement using the 0.7-mm isovoxel T1w volume (Glasser et al., 2013), the estimation of the GM and cerebrospinal fluid (CSF) boundary forming a 'pial surface', fine-tuned registration of T2w to T1w with a boundary-based registration (Greve and Fischl, 2009), and fine-tuning of the pial surface using the high-resolution (0.7-mm isovoxel) T2w volume (Glasser et al., 2013). We found that juxtacortical WMHs were often mislabeled as parts of cortical GM because WMHs have relatively low and high intensities in T1w and T2w MRIs, respectively, resulting in errors in the white and pial surface estimation. These phenomena are detailed in Section 3.2 and the correction methods in Section 2.3.5. Cortical thickness was calculated as the distance between the white and pial surfaces at each vertex of the cortex (Fischl and Dale, 2000). In the last step, the PostFreeSurferPipeline performed surface registration using a multimodal surface matching (MSM) program (Robinson et al., 2018) based on a metric of the folding pattern, 'sulc', generated by FreeSurfer (MSMsulc). The white and pial surfaces were then resampled into standardized mesh surfaces using 164k and 32k vertices after symmetrization between the left and right hemispheres (Van Essen et al., 2012). Next, the pipeline created a volume with a ratio of the T1w and T2w signals (myelin mapping) and mapped the values of the voxels between the inner and outer cortical surfaces onto the mid-thickness surface in the participants' native space. The surface metrics including the T1w/T2w myelin contrast and cortical thickness were resampled onto the 164k and 32k mesh surfaces (Glasser and Van Essen, 2011). The result of this analysis was based on the default setting of the HCP pipeline, which we call the "default pipeline".

Manual delineation of WMH masks
In this study, we created manually defined WMH masks on the 2-mm isovoxel image in the MNI space in each participant using FSLeyes in FSL (McCarthy, 2022). Manual delineation of WMHs was achieved through a consensus between two board-certified neurologists (Y.O. and M.H.) who had access to both 0.7-mm and 2-mm versions of the T1w and T2w-FLAIR MRIs. The manually defined WMH masks were used as the training data for BIANCA and were directly fed into the HCP pipeline for surface re-estimation (see Section 2.3.5). The total volume of the WMHs was calculated for each participant after the WMH masks were registered back into the participant's native space using the inverse warp field.

Machine learning prediction of WMHs
Automated supervised ML-based segmentation of WMHs was performed using BIANCA (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/BIANCA). BIANCA is based on a k-nearest neighbor algorithm and computes the probability of each voxel being a WMH according to the voxel intensity and its spatial features. Furthermore, BIANCA has the flexibility to achieve accurate WMH segmentation in different MRI acquisition protocols by adapting to data.
To run BIANCA, bias-corrected T1w and T2w-FLAIR MRIs, which are the interim products of the default HCP pipeline, were used as features, and a manually delineated binary WMH mask as a label feature. The leave-one-out cross-validation scheme was used for training the model and predicting WMHs for each participant. The bias-corrected T1w and T2w-FLAIR volumes in the MNI nonlinear space were resampled to a 2-mm isovoxel. The WMH masks, brain mask, and T2w-FLAIR volume were fed into BIANCA.

Fig. 1. Overview of processing pipelines for cortical surface analysis in the current study.

To run BIANCA, the following options were applied: spatial weighting = 1; no patch; selection of the non-lesion points = no border (excluding three voxels close to the lesion's edge); the number of lesion points to use = 2000; the number of non-lesion points to use = 10,000. BIANCA outputs a volume file in which the intensity of each voxel represents the probability of being a WMH, ranging from zero to one (WMH probability map). To make a binary WMH mask from the WMH probability map, an appropriate threshold was chosen by using a region-by-region threshold optimization technique, LOCally Adaptive Threshold Estimation (LOCATE) (Sundaresan et al., 2019). LOCATE refines the estimation accuracy in three steps. First, the lesion probability map was divided into subregions based on Voronoi tessellation. Second, local features within these subregions were extracted. Finally, based on the extracted features, the optimal local threshold was estimated through a supervised learning method using the manually delineated WMH masks as the training data and the leave-one-out cross-validation scheme. Using the threshold optimized by LOCATE, BIANCA yielded a binary mask for each individual (ML-predicted WMH mask). The ML-predicted WMH mask was validated using the Dice similarity index (SI) with reference to the manual WMH masks. The SI was calculated as 2 × |manually defined WMHs ∩ ML-predicted WMH mask| / (|manually defined WMHs| + |ML-predicted WMH mask|).
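The SI defined above can be illustrated with a minimal sketch. The function name `dice_similarity` and the toy masks are hypothetical; real masks would be 3D volumes, which are represented here as flat sequences of 0/1 voxel labels for simplicity.

```python
def dice_similarity(manual_mask, predicted_mask):
    """Dice similarity index between two binary masks (flat 0/1 sequences)."""
    if len(manual_mask) != len(predicted_mask):
        raise ValueError("masks must contain the same number of voxels")
    # |manual ∩ predicted|: voxels labeled as WMH in both masks
    overlap = sum(1 for m, p in zip(manual_mask, predicted_mask) if m and p)
    # |manual| + |predicted|
    total = sum(1 for m in manual_mask if m) + sum(1 for p in predicted_mask if p)
    if total == 0:
        return 1.0  # assumption: two empty masks are treated as full agreement
    return 2.0 * overlap / total


# Toy example: 4 manual voxels, 4 predicted voxels, 3 shared
manual = [0, 1, 1, 1, 0, 0, 1, 0]
predicted = [0, 1, 1, 0, 0, 1, 1, 0]
si = dice_similarity(manual, predicted)
print(f"SI = {si:.2f}")
```

In this toy case the SI is 2 × 3 / (4 + 4) = 0.75.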

Cortical surface re-estimation
The manually defined and ML-predicted WMH masks were both resampled into 0.7-mm isovoxel resolution using FLIRT with spline interpolation and binarized again by thresholding at 0.5, for each participant. The cortical surfaces were re-calculated by running a customized FreeSurferPipeline to rerun the surface reconstruction program ('recon-all' of FreeSurfer), according to the 'Manual-Intervention Workflow for WM edit'. In brief, the user-edited, binarized WMH masks (derived from either the manual or the ML pipeline) were labeled with a value of "255", following the WM editing procedure of FreeSurfer (https://surfer.nmr.mgh.harvard.edu/fswiki/FsTutorial/WhiteMatterEdits_freeview). The labels were transferred to the corresponding voxels of the WM segmentation file (wm.mgz) so that these voxels were treated as WM in the subsequent white surface re-estimation. Next, the 'recon-all' command was rerun by specifying the workflow to re-estimate the cortical surface (with the flags -autorecon2-wm and -autorecon3). The products of the cortical surface re-estimation, such as the pial and WM surfaces, thickness, and 'sulc', were used to rerun the PostFreeSurferPipeline, as described above. With the new surface information, the pipeline re-calculated the symmetrization, resampled the surfaces to 164k and 32k meshes, re-registered the surfaces with MSMsulc, and re-calculated the myelin mapping and thickness.
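The WM editing step described above can be sketched as follows. This is an illustration under stated assumptions, not the customized FreeSurferPipeline itself: the function names are hypothetical, volumes are represented as flat lists rather than wm.mgz files, reading and writing of image files is omitted, and the use of "greater than or equal to" at the 0.5 threshold is an assumption. The value 255 and the flags -autorecon2-wm and -autorecon3 are taken from the text; `-s` is the usual recon-all subject option.

```python
WM_EDIT_VALUE = 255  # label value for user-edited WM voxels, as quoted in the text


def binarize(resampled_mask, threshold=0.5):
    """Re-binarize a spline-resampled mask (assumption: >= threshold counts as WMH)."""
    return [1 if v >= threshold else 0 for v in resampled_mask]


def apply_wmh_edits(wm_voxels, wmh_mask, edit_value=WM_EDIT_VALUE):
    """Return a copy of the WM segmentation with WMH voxels relabeled as edited WM."""
    if len(wm_voxels) != len(wmh_mask):
        raise ValueError("segmentation and mask must have the same number of voxels")
    return [edit_value if m else v for v, m in zip(wm_voxels, wmh_mask)]


def recon_all_command(subject_id):
    """Build (but do not run) the recon-all call that re-estimates the surfaces."""
    return ["recon-all", "-s", subject_id, "-autorecon2-wm", "-autorecon3"]


# Toy example: two voxels inside a WMH were not labeled as WM (values 5 and 0)
resampled = [0.0, 0.2, 0.7, 1.0]
wm_before = [0, 110, 5, 0]
wm_after = apply_wmh_edits(wm_before, binarize(resampled))
print(wm_after)
print(" ".join(recon_all_command("sub-01")))  # "sub-01" is a placeholder ID
```

Returning the command as a list keeps the sketch free of side effects; in practice it would be passed to a process launcher once the edited segmentation has been written back to disk.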
Cortical surfaces are reconstructed using three pipelines: (1) the original HCP pipeline (default pipeline), (2) the white matter (WM) hyperintensity (WMH)-adapted pipeline with manually drawn WMHs (manual pipeline), and (3) the WMH-adapted pipeline with the machine learning (ML) algorithm (ML pipeline). T1w and T2w-fluid-attenuated inversion recovery (FLAIR) images are used as inputs of the HCP pipeline consisting of PreFreeSurferPipeline, FreeSurferPipeline, and PostFreeSurferPipeline in the default pipeline (the upper row). In the manual pipeline, manually drawn WMHs are used for editing WM, and FreeSurfer is run with -autorecon2 and -wm options to re-estimate the cortical surface (middle row). The ML pipeline (lower row) uses an ML algorithm for WMH estimation (FSL BIANCA), trained using manually made WMH masks, followed by optimized thresholding (LOCATE) and the same edits of WM for re-estimation. The WM surface, pial surface, and bias-corrected T1w/T2w myelin and thickness maps are used to evaluate the validity of the customized pipelines.
2.3.6. Quality control (QC) of cortical surface

2.3.6.1. Stereological QC by experts
Two expert radiologists (T.O. and T.A.) were asked to independently evaluate the quality of the pial and WM surfaces of the three pipelines and to count the number of surface errors while blinded to the pipeline. For the semi-quantification of errors, we applied a stereological method, which is commonly used in quantitative neuroanatomy, for example, to count the number of neurons with a microscope (Saper, 1996; Zhao and van Praag, 2020). The coronal sections of T1w images and the pial and WM surfaces in the MNI nonlinear standard space were displayed in a 4 × 7 matrix (a total of 28 sections at an interval of 7 mm in the y-direction) with grid lines overlaid at 7-mm intervals (Fig. 2). The images were imported into Microsoft Excel (Microsoft Corporation, 2019) as background images behind the grid. The radiologists were asked to mark the surface errors in each grid cell according to the type of surface error: pial surface error only, WM surface error only, and both errors. The radiologists were asked to exclude medial temporal regions (including the hippocampus and amygdala) from their evaluation, based on anatomical location on structural MRIs. This policy was set because, in the default pipeline, the cortical surfaces were not necessarily well estimated in these regions in some cases, regardless of the WMHs. The number of each category was computed cell by cell across the 43 datasets separately for the manual, ML, and default pipelines, yielding two sets of each category data for each pipeline.
2.3.6.2. QC using surface defect score
Automation or quantification of surface QC has not yet been achieved. To date, QC of cortical surface estimation has commonly been performed by visual inspection of the cortical surface boundaries overlain over the orthogonal sections of T1w images (e.g., https://surfer.nmr.mgh.harvard.edu/fswiki/FsTutorial/OutputData_freeview).
In this study, we developed an original QC method for the cortical surfaces. We assume that if there are cortical surface defects, then the values of both T1w/T2w myelin and thickness at a surface vertex deviate from the standard values. One might expect cortical surface defects to cause an error exclusively in thickness, but this is unlikely because T1w/T2w values in a misplaced ribbon should be much lower (CSF) or higher (WM) than those in the true cortical ribbon. The deviation from the standard values can be estimated using robust Z-scores (RZ) based on the standard values. The HCP pipeline can internally define a "medial wall" (representing the inner/medial surface of the 3rd ventricle, hippocampus, amygdala and thalamus, and the midsection of the corpus callosum) so as to automatically exclude the medial temporal regions. With the use of the medial wall, therefore, the defect scores were not calculated in the medial temporal regions, consistent with the policy for the stereological QC.
We defined a vertex-wise surface defect when absolute RZ values for both T1w/T2w myelin (RZm) and thickness (RZt) took outlier values relative to the standard. Therefore, the surface defect at surface vertex i can be expressed as follows:

$$\mathrm{Defect}(i) = \begin{cases} 1, & |RZ_m(i)| > Z_0 \ \text{and} \ |RZ_t(i)| > Z_0 \\ 0, & \text{otherwise} \end{cases}$$

where $Z_0$ is the Z-threshold for outliers. RZ can be estimated using the following equation:

$$RZ(i) = \frac{X(i) - Q_2(i)}{\mathrm{NIQR}(i)}$$

where X(i) is the measured T1w/T2w myelin (i.e., MyelinMap_BC) or thickness at a vertex, i, in 164k Connectivity Informatics Technology Initiative (CIFTI) dscalar format in a participant, $Q_2$ is the median of the standard values at vertex i, and NIQR is the normalized interquartile range at vertex i:

$$\mathrm{NIQR}(i) = \frac{Q_3(i) - Q_1(i)}{F(0.75) - F(0.25)}$$

where $Q_3$ and $Q_1$ are the 75th and 25th percentiles of the standard values, respectively, and F(0.75) and F(0.25) are the values of the quantile function (inverse cumulative distribution function) of the standard normal distribution at probabilities of 0.75 and 0.25, respectively, so that F(0.75) − F(0.25) ≈ 1.349.
For the Q1, Q2, and Q3 of the standard values, we used 164k MyelinMap_BC and thickness data in a CIFTI dscalar format, which were pre-generated from the YA-HCP data S1200 (n=1096) and released in the BALSA database (https://balsa.wustl.edu/reference/pkXDZ). Next, the vertices with the surface defect were mapped onto the surface in each individual, which we called the 'surface defect map' (SDM). For the group-wise evaluation, we created a surface defect frequency map. The total number of vertices with surface defects at Z0 = 5 was termed the 'Surface Defect Score' (SDS5) and was used for the subsequent analysis. That is, to define outliers, we used a robust Z-score threshold of 5, which corresponds to extreme outliers in the literature (Tukey, 1977). We furthermore checked the validity of this threshold based on preliminary analyses in which Z-score thresholds between 2 and 5 were applied to small data subsets sampled randomly from the entire dataset. Note, however, that optimization of the threshold for robust Z-scoring is recommended for future studies since an optimal threshold depends on the distribution and size of the data. To calculate SDM and SDS5, we used a quality check and reporting tool, hcppipe_qc (https://github.com/RIKEN-BCIL/bcil/blob/master/bin/hcppipe_qc), which can be applied to the data analyzed using the HCP or non-human primate-HCP pipeline. Note that the current SDM and SDS5 are based on the standard values of the surface metric distribution of young adults (YA-HCP), as well as on a threshold corresponding to extreme outliers. These operational parameters may be optimized in the future depending on the research questions, protocols of data acquisition, and data quality (see Discussion). We tested whether the SDS5 values were correlated with the WMH volume in each pipeline. In addition, we computed the degree to which SDS5 was improved by the proposed pipeline, thereby defining the cases in which the proposed pipelines improved the surface analyses above chance. We defined the improved cases as participants whose differences in SDS5 (between the default pipeline and ML pipeline) were outliers (above Q3+1.5*IQR) in the distribution of the SDS5 changes.
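The robust Z-score and the surface defect score described above can be sketched as follows. This is not the hcppipe_qc implementation; it is a minimal illustration that assumes per-vertex reference quartiles are already available as (Q1, Q2, Q3) tuples, uses a strict "greater than" comparison against Z0, and uses hypothetical function names and toy values.

```python
from statistics import NormalDist

# F(0.75) - F(0.25) for the standard normal quantile function (about 1.349)
NORMAL_IQR = NormalDist().inv_cdf(0.75) - NormalDist().inv_cdf(0.25)


def robust_z(x, q1, q2, q3):
    """Robust Z-score of x against reference quartiles at one vertex."""
    niqr = (q3 - q1) / NORMAL_IQR  # normalized interquartile range
    return (x - q2) / niqr


def surface_defect_map(myelin, thickness, myelin_ref, thickness_ref, z0=5.0):
    """Flag a vertex (1) only when BOTH metrics are outliers beyond z0."""
    defects = []
    for m, t, m_q, t_q in zip(myelin, thickness, myelin_ref, thickness_ref):
        rz_m = robust_z(m, *m_q)
        rz_t = robust_z(t, *t_q)
        defects.append(1 if abs(rz_m) > z0 and abs(rz_t) > z0 else 0)
    return defects


def surface_defect_score(defects):
    """Total number of defective vertices (SDS5 when z0 = 5)."""
    return sum(defects)


# Toy example with three vertices sharing the same reference quartiles
myelin_ref = [(1.2, 1.3, 1.4)] * 3      # hypothetical (Q1, Q2, Q3) for T1w/T2w
thickness_ref = [(2.2, 2.5, 2.8)] * 3   # hypothetical (Q1, Q2, Q3) in mm
myelin = [1.3, 3.0, 3.0]
thickness = [2.5, 6.0, 2.6]

sdm = surface_defect_map(myelin, thickness, myelin_ref, thickness_ref)
sds5 = surface_defect_score(sdm)
print(sdm, sds5)
```

Only the second vertex is flagged: the third vertex has an extreme myelin value but a normal thickness, so it does not satisfy the joint criterion.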

Statistical analysis
To estimate how the number of surface errors was affected by the raters and pipelines, a three-way analysis of variance (ANOVA) was applied to the number of errors, with factors of the rater (raters 1 and 2), the pipelines (default, manual, and ML pipelines), and surfaces (pial and WM). For the comparison between the pipelines, a post-hoc comparison was performed using the Wilcoxon matched-pairs signed-rank test, with p-values adjusted by Bonferroni correction for multiple comparisons. To assess inter-rater agreement, error detection in each grid was analyzed between the two raters using Cohen's kappa. To confirm the effect of age on WMH volume reported in previous studies (Sachdev et al., 2007; Ylikoski et al., 1995; Atwood et al., 2004; Silbert et al., 2008), the correlation between the logarithm of WMH volume and age was tested using Spearman's rank correlation. We also analyzed the correlation between WMH volume and SDS5 in the default, manual, and ML pipelines using Spearman's rank correlation.
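As an illustration of the rank correlation used above, the following minimal sketch computes Spearman's ρ as the Pearson correlation of average ranks. The function names and the age and WMH volume values are hypothetical toy data, not values from this study, and no p-value is computed.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (hypothetical): age in years and WMH volume in cm^3
age = [45, 52, 60, 67, 71, 80]
wmh_volume = [0.4, 0.3, 1.2, 2.5, 2.0, 9.8]
log_volume = [math.log(v) for v in wmh_volume]

rho = spearman_rho(age, log_volume)
print(f"rho = {rho:.3f}")
```

Because the logarithm is monotonic, the ranks of the raw and log-transformed volumes are identical, so Spearman's ρ is the same for both.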
Furthermore, we adopted a data-driven approach to define the improvement in the surface analysis of WMH-adapted pipelines. A substantial portion of the SDS5 changes after the application of WMH-adapted pipelines was distributed around zero, indicating that many MRI data with low WMH load did not show changes with the pipeline. The SDS5 changes were, however, asymmetric: some data showed considerable decreases, while no data showed comparable increases. These interpretations were supported by the correlation analysis between the WMH volume and SDS5, as well as by the visual inspection of representative cases.
This information was used to define cases in which cortical analyses were improved by the WMH-adapted pipelines; the upper inner fence (Q3+1.5*IQR of the data) was adopted as the criterion for a meaningful SDS5 decrease (thus, the improvement of cortical analysis). We then performed a receiver-operating characteristic (ROC) curve analysis to determine the optimal cutoff value of WMH volume, above which the cortical estimation procedure would likely benefit from the implementation of the proposed procedure. With the determined cutoff WMH volume, the sensitivity, specificity, area under the curve (AUC), and number of cases with improvement were computed.
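The two steps above (labeling improved cases with the upper inner fence, then searching for a WMH volume cutoff with ROC analysis) can be sketched as follows. Several choices here are assumptions rather than details given in the text: quartiles use linear interpolation (the "inclusive" method), the optimal cutoff is taken as the point maximizing Youden's J (sensitivity + specificity − 1), and all names and data are hypothetical.

```python
from statistics import quantiles


def upper_inner_fence(values):
    """Q3 + 1.5 * IQR (assumption: inclusive, linearly interpolated quartiles)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def roc_points(scores, labels):
    """(cutoff, sensitivity, specificity) for the rule: score >= cutoff is positive."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l)
        tn = sum(1 for s, l in zip(scores, labels) if s < cutoff and not l)
        points.append((cutoff, tp / n_pos, tn / n_neg))
    return points


def best_cutoff(points):
    """Cutoff maximizing Youden's J (an assumed optimality criterion)."""
    return max(points, key=lambda p: p[1] + p[2] - 1)


def auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): SDS5 decrease (default minus ML) and WMH volume in cm^3
sds5_decrease = [0, 1, -2, 3, 0, 2, 60, 1, 80, 0]
wmh_volume = [0.5, 1.0, 2.0, 1.5, 3.0, 0.8, 9.0, 10.0, 12.0, 2.5]

fence = upper_inner_fence(sds5_decrease)
improved = [1 if d > fence else 0 for d in sds5_decrease]
cutoff, sensitivity, specificity = best_cutoff(roc_points(wmh_volume, improved))
print(fence, cutoff, sensitivity, specificity, auc(wmh_volume, improved))
```

With these toy values the fence is 6.875, two cases count as improved, and the selected cutoff is 9.0 cm³ with a sensitivity of 1.0 and a specificity of 0.875.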
To gain insight into the potential effects of WMHs on the group comparison, the myelin and thickness maps were parcellated into 180 parcels using an HCP multimodal parcellation atlas (HCP-MMP, Version 1.0) (Glasser et al., 2016), upon completion of the cortical surface re-estimation process. We compared the myelin maps and cortical thickness maps of the default and ML pipelines. A paired t-test was applied as implemented in the Permutation Analysis of Linear Models (PALM) tool, which provided a non-parametric family-wise error (FWE) correction over multiple parcellations (Winkler et al., 2014, 2016).

WMHs in the dataset
During the manual delineation procedure, WMHs were observed in all 43 participants. In the group-level frequency map of the WMHs, they were distributed most densely in the periventricular zones but also in the deep WM just beneath the frontal, temporal and parietal cortices, corresponding to the juxtacortical WMHs (Supplementary Fig. S1-a). The number of WMHs (combining all the periventricular WMH and deep WMH) ranged from 7 to 268 (median = 80.2), with a mean volume of 6.1 ± 8.4 cm³. Hereafter, unless otherwise noted, the term "WMH volume" refers to the volume of WMH including both periventricular WMH and deep WMH. The age of the participants linearly correlated with the volume of manually defined WMHs after logarithmic transformation (r = 0.63, p < 0.001; Supplementary Fig. S1-b). The ML-predicted WMH volume showed a strong correlation with the manually defined WMH volume (Spearman's ρ = 0.95, p < 0.001). The ML-predicted WMHs had a sensitivity of 0.63, specificity of 0.999, and SI of 0.51 ± 0.16, using the manually defined WMHs as the standard. Overall, from visual inspection, the ML-predicted WMH agreed well with the manually defined WMH. A mismatch between the manually defined WMH and the ML-predicted WMH was present at the periphery of WMHs in the deep or periventricular WM; however, the mismatch was not evident in the juxtacortical area.

Fig. 2. A stereological method used in visual quality control (QC) of cortical surfaces. A) The coronal sections of T1w images (in gray color) as well as the pial (blue line) and WM surfaces (lime-colored line) in the MNI nonlinear standard space are displayed in a 4 × 7 matrix (a total of 28 sections at an interval of 7 mm in the y direction) with grid lines overlain at 7-mm intervals (in the x and z directions). B) A zoomed panel showing grids over the medial frontal area. Two qualified radiologists identified and color-coded the surface errors in each grid: the pial surface error only (red), WM surface error only (light blue), and both errors (yellow).

Characteristics of surface estimation errors
We observed an extreme case in which a small WMH caused extensive cortical surface errors when analyzed using the default pipeline (Fig. 3). In such cases, WMHs were mislabeled as parts of the GM because of the relatively low and high intensities of WMHs in T1w and T2w MRIs, respectively, and the WM surface was incorrectly estimated. The WM surfaces often invaded the WMHs, which was accompanied by pial surface errors (Fig. 3A, B). Across the participants, the WM surface errors often led to overestimation and sometimes underestimation (Fig. 3C) of the cortical thickness. Consequently, the bias-corrected myelin maps, which were calculated based on an inadequate definition of the cortical ribbon, often yielded a mosaic of exceptionally high/low myelin signals (Fig. 3D) in the vertices near the WMHs (Fig. 3B). SDM clearly revealed these cortical surface abnormalities (Fig. 3E).
The ML-predicted WMH masks reasonably overlapped with the WMHs (yellow outline in Fig. 3B, lower panel). After re-estimating the surfaces using either the manual or ML pipeline, the pial and WM surfaces were corrected (Fig. 3A and B, lower panels). Subsequently, the abnormalities in the myelin and thickness maps, as well as the SDM surface defect, disappeared (Fig. 3C-E, lower panels). Overall, at the individual level, the existence of WMHs increased errors in the estimation of both WM and pial surfaces in some cases, resulting in abnormalities in the myelin and thickness maps and SDM. Both the manual and ML pipelines appeared to reduce estimation errors.

Effects of WMH-adapted pipelines in surface QC
Two raters (T.O. and T.A.), blinded to the pipeline, performed visual QC for the cortical surfaces across the three pipelines using the stereological method (Figs. 2 and 4). We assessed the inter-rater reliability of stereological QC by independently treating each grid-wise rating. The result showed a fair agreement between the two raters; Cohen's kappa was 0.27 and 0.34 for the WM and pial surfaces, respectively. When the error count was pooled in a grid across the error types, the agreement reached a Cohen's kappa of 0.57.
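For reference, grid-wise Cohen's kappa between two raters can be computed as in the following minimal sketch. The function name and the toy ratings (1 = error marked in a grid cell, 0 = no error) are hypothetical and do not reproduce the values reported above.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must rate the same number of grid cells")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed proportion of cells on which the raters agree
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    # Agreement expected by chance from each rater's marginal frequencies
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # assumption: identical constant ratings count as full agreement
    return (observed - expected) / (1.0 - expected)


# Toy example: ten grid cells rated by two raters
rater_1 = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
rater_2 = [1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 8 of 10 cells (observed agreement 0.80), chance agreement is 0.58, and kappa is therefore about 0.52.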
The SDS5 analysis indicated that both the manual and ML pipelines significantly reduced outliers compared with the default pipeline (Fig. 5). The median (IQR) SDS5 values were 11 (44.5), 4 (14), and 4 (15.5) in the default, manual, and ML pipelines, respectively (Fig. 5A). For reference, the median (IQR) of SDS5 computed from the 1113 participants of the YA-HCP dataset was 0 (3). The ANOVA results showed significant differences in SDS5 across the pipelines (p = 0.001). Wilcoxon signed-rank tests with Bonferroni correction revealed a significant reduction in SDS5 (p < 0.001) in both the manual and ML pipelines compared to the default pipeline.
The difference in SDS5 between the default and ML pipelines was not normally distributed (see Methods 2.3.7); hence, we used a value of 42.5, corresponding to the upper inner fence (Q3 + 1.5 × IQR of the data), as the criterion for improvement with the proposed pipeline. With this criterion, eight individuals showed improvement. In the ROC curve analysis, the detection of the improved cases was maximal with a cutoff WMH volume of 5.6 cm³, with an AUC of 0.85 (Supplementary Fig. S2). We identified twelve individuals with WMH volumes greater than this cutoff, among which seven showed a reduction in SDS5 with the ML pipeline.
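The two steps described above (an outlier-based improvement criterion, then a volume cutoff that best separates improved from non-improved cases) can be sketched as follows. This is only an illustration under stated assumptions: the quartile convention (`method="inclusive"`), the use of Youden's J to pick the cutoff, the function names, and all numbers are hypothetical and may differ from what was used in the study.

```python
import statistics


def upper_inner_fence(values):
    """Tukey's upper inner fence, Q3 + 1.5 * IQR (quartile convention assumed)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def best_cutoff(volumes, improved):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A case is predicted as 'improved' when its volume exceeds the cutoff.
    """
    positives = sum(bool(y) for y in improved)
    negatives = len(improved) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one improved and one non-improved case")
    best = None
    for cutoff in sorted(set(volumes)):
        tp = sum(v > cutoff and bool(y) for v, y in zip(volumes, improved))
        fp = sum(v > cutoff and not y for v, y in zip(volumes, improved))
        sensitivity = tp / positives
        specificity = 1.0 - fp / negatives
        j = sensitivity + specificity - 1.0
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best[1:]


# Hypothetical per-participant values (not study data).
sds_drop = [0, 2, 3, 5, 8, 10, 12, 60, 75]               # default minus ML score
wmh_cm3 = [0.5, 1.0, 1.2, 2.0, 3.1, 4.0, 6.5, 7.2, 9.8]  # WMH volume in cm^3

fence = upper_inner_fence(sds_drop)
improved = [d > fence for d in sds_drop]
cutoff, sens, spec = best_cutoff(wmh_cm3, improved)
print(fence, cutoff, sens, spec)
```

Different software packages compute quartiles slightly differently, so the fence value can shift by a small amount depending on the convention chosen.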
When we compared the group-wise surface defect frequency map between the default and ML pipelines, the application of the proposed pipeline substantially reduced the outliers likely caused by WMHs (Supplementary Fig. S3a). However, other types of surface errors, which were not reduced even after the application of our proposed pipeline, appeared to remain in the medial surface of the brain (cingulate areas) and the pre- and post-central gyri. Although such non-WMH causes of surface errors were outside the scope of the present study, we examined individual cases with apparently non-WMH causes to explore the future application of the SDS method. The surface errors in the pre- and post-central gyri were potentially due to the residual B1 bias (Supplementary Fig. S3b), as suggested by the hemispheric asymmetry (Glasser et al., 2022). To summarize, the SDS seemed useful for detecting non-WMH-related surface errors as well, but its potential should be addressed in future studies.
The impact of WMHs on surface estimation can extend even to the HCP-style group-level parcellation analysis that is widely performed in the field. We compared cortical thickness and myelin maps before and after consideration of WMHs. In the paired t-test of the parcellation-wise thickness map, the differences between the default and ML pipelines reached statistical significance after the correction for multiple comparisons (PALM, p < 0.05, family-wise error corrected) in several surface areas (R_6a, R_46, R_9-46d, R_9a, R_8BL, R_9m, R_p32, R_p32pr, R_v2, R_PIT, R_IFSa, R_a47r, R_a10p, R_11l, R_10pp, R_OFC, R_FOP4, L_10r, L_VMV1; Supplementary Fig. S4-a). As for the myelin BC map, there was no statistically significant difference between the default and ML pipelines.

Discussion
Herein, we report that cortical surface estimation errors are algorithmically caused by age-related WMHs and propose automated correction and QC methods for cortical surface errors due to WMHs. Surface estimation errors occurred because WMHs were mislabeled as parts of the GM because of the relatively low and high intensities of WMHs in T1w and T2w MRIs, respectively. We used previously proposed ML algorithms (BIANCA and LOCATE) to automatically generate WMH masks to correct mislabeled WM segments and re-estimate cortical surfaces. The correlation between age and the logarithm of WMH volume is consistent with previous reports in which the WMH volume increased monotonically as a function of age (Atwood et al., 2004; Silbert et al., 2008). Two blinded raters (T.O. and T.A.) confirmed a decrease in the surface estimation errors in the proposed pipeline. Furthermore, SDS5, a novel surface QC metric, sensitively detected a reduction in outliers in the proposed pipeline compared with the default pipeline. Finally, the adverse effects of WMHs on the surface estimation accuracy can be observed even in HCP-style group-level parcellation analysis, which is widely performed in the field. The proposed pipeline improves the reliability of surface-based MRI analyses in middle-aged to older people with WMHs and may contribute to disentangling the effects of WMHs and cortical abnormalities following aging and pathological processes. The present methodology may help researchers find reliable imaging markers for aging and age-related neuropsychiatric disorders.
The hunt for imaging markers has become increasingly important in the clinical application of neuroimaging for neurodegenerative and psychiatric disorders. Recent studies have used cortical-surface analysis to identify neurobiological changes in the cortex during health and disease (Bethlehem et al., 2022; Cho et al., 2013; Lemaitre et al., 2012; Salat et al., 2004, 2009). WM is often affected in various brain disorders related to aging; however, it has been difficult to uncover the contribution of WMHs to their pathophysiology. Alzheimer's disease and Parkinson's disease accompanying WMHs likely form a continuum with vascular dementia and vascular parkinsonism, respectively, making it difficult to disentangle the effects of WMHs on pathophysiology. Future neuroimaging studies in neurodegenerative disorders should consider the effects of vasculopathy, including the disconnecting effects of WMHs, overlain by those of proteinopathy and neurocircuitopathy (Wakasugi and Hanakawa, 2021). WMHs are frequently found on brain MRIs in older populations (Sachdev et al., 2007; Ylikoski et al., 1995; Atwood et al., 2004; Silbert et al., 2008). Therefore, a detailed investigation of WM and cortical integrity and their relationship with WMHs is important for a mechanistic understanding of the pathophysiology of cognitive and motor disturbances, which are cardinal symptoms of Alzheimer's disease and Parkinson's disease, respectively.
In this study, the current ML algorithm for predicting WMHs worked reasonably well, and the validation was comparable to those in the literature. We created a WMH probability map using BIANCA and adopted LOCATE for threshold optimization, yielding ML-predicted WMHs. The ML-predicted WMH volume correlated with the manually defined WMH volume, supporting the hypothesis that BIANCA plus LOCATE reasonably predicted WMHs. In a previous study, the performance (SI) of BIANCA plus LOCATE varied across the datasets (from 0.64 ± 0.23 to 0.73 ± 0.13) (Sundaresan et al., 2019). The present BIANCA plus LOCATE showed poorer sensitivity (0.63) and better specificity (0.99) than the sensitivity (0.81) and specificity (~0.98) reported for the Vrije Universiteit Amsterdam dataset (Sundaresan et al., 2019), which showed WMH volumes similar to ours. A reason behind the relatively low performance of BIANCA plus LOCATE in our setting may be that we were not able to fully optimize BIANCA plus LOCATE, mainly because the WMHs in our MRI dataset were limited. Still, ML-generated WMHs were effective in reducing surface errors, thereby fulfilling our study purpose. Therefore, although ML performance can be further improved, we were able to demonstrate the utility of the proposed method to reduce errors in cortical surface estimation.
Overall, the stereological visual assessment successfully detected a reduction in cortical surface errors using the ML-assisted HCP pipeline relative to the default pipeline. To date, many attempts have been made to establish a QC workflow for cortical surfaces estimated by surface reconstruction algorithms. Most have proposed QC workflows based on visual information, for which many documents are available online (Supplementary Table S1). Visual QCs rely on the visual inspection of preprocessed outputs through graphical user interfaces and are considered to be the gold standard for surface-based analysis of neuroimaging data. These visual QC approaches allow assessment by qualitative grading (Backhausen et al., 2016) subject to raters' experience and time limitations (Monereo-Sánchez et al., 2021). Visual QCs have obvious limitations in terms of inter-rater variability and quantification difficulty. Several studies have proposed semi-quantified visual QCs. A QC rating framework, 'VisualQC' (Raamana et al., 2018, 2021), was proposed, providing a visualization system that allows easy assessment of cortical parcellations by raters. However, 'VisualQC' uses relatively small numbers and orientations of image slices (two rows by six slices by default), which can be customized by users; thus, visual input information is not necessarily standardized. In addition, the rater's assessment in VisualQC is not based on stereology; thus, rater responses are recorded for each individual subject of interest. Therefore, previous visual QC methods may not be fully quantitative, and no previous QC methods provide specific types of surface errors (e.g., errors in the WM surface, pial surface, or both). With this background, we applied a stereological approach for quantitative estimation of the surface errors and validation of the WMH-adapted pipeline.
The present stereological techniques allow for visual QC with a reasonable level of quantification and standardization. Stereological assessment is an unbiased quantification method for neuroanatomy (Saper, 1996; Zhao and van Praag, 2020). It allows the standardized evaluation of microscopic neurobiological units (e.g., the number of neurons, synapses, and axons) to be compared across a large tissue space. However, if the stereological approach is not well-standardized, the quantified results may be highly variable across laboratories or raters, by as much as 300% (Herculano-Houzel et al., 2015). The bias is caused by various factors, including differences in sampling strategies across raters. Therefore, we developed a stereological QC system by standardizing the evaluation units across different MRIs, reducing arbitrariness, and increasing the spatial uniformity of the error counts. Unexpectedly, however, the results from the stereological QC showed only a fair level of inter-rater agreement of the error count when analyzed separately for the white and pial surfaces (kappa = ~0.3) or a moderate level of agreement when both error types were pooled (kappa = 0.57). In addition, we found a significant interaction effect among the rater, pipeline, and surface type factors. This means that error detectability is influenced by many factors known for radiological diagnosis, such as spectrum bias and misclassification bias (Pavlou et al., 2021). The low agreement between the two raters can be attributed to the limitation of subjective visual assessment, and our stereological QC method should be further validated in terms of intra-rater agreement. Together, our results indicate that the stereological method applies to the visual QC of cortical surface errors, but further elaboration is needed to achieve fully reliable stereological QC.
Furthermore, in this study, we developed an alternative automated QC algorithm that calculates the SDM and a derived score (SDS5) for vertex-wise surface error detection. We found that SDM/SDS5 was useful for assessing the presence of surface errors and their reduction using the elaborated pipelines. There have been attempts to automate quantifiable QC for cortical surface errors using quality metrics, such as FreeSurfer's topological defect count and curvature smoothness (Ségonne et al., 2007; Tian et al., 2021). However, these topological errors are counted during the initial estimation of WM surfaces; thus, it is not clear whether such errors at an intermediate process are directly associated with the errors in the final results, including the thickness or myelin maps. Moreover, not all thickness or myelin errors are necessarily accompanied by topological defects. The SDM in our study allows the joint quantitative assessment of thickness and myelin maps based on the reference values of cortical metrics in the YA-HCP1200 database. The vertices on the abnormally estimated cortical surfaces often have extreme outlier values in both thickness and myelin values; thus, we defined the statistical threshold of SDS5 in reference to the SDS5 computed from the thickness and myelin maps of YA-HCP1200. Both SDM and SDS5 can be calculated automatically as long as an adequate structural MRI dataset is analyzed using HCP pipelines. The positive correlation of SDS5 with the WMH volume in the default pipeline indicated that SDS5 indeed reflected errors in the presence of WMHs (see Fig. 5B). Moreover, this correlation was abolished after the application of WMH-adapted pipelines (see Fig. 5B). The SDM also proved useful for visualizing the location of the surface errors and their reduction using the proposed WMH-adapted pipelines. Accordingly, SDS5 and SDM provide useful information for automatic and quantitative surface QC.
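To make the idea of a reference-based, vertex-wise outlier score concrete, the sketch below counts vertices whose thickness and myelin values both deviate strongly from per-vertex reference statistics. It is not the SDM/SDS5 definition used in this study, which is specified in the Methods and not reproduced here. The z-score formulation, the requirement that both metrics exceed the threshold, the threshold of 3.0, the function name, and all numbers are assumptions made purely for illustration.

```python
def count_outlier_vertices(thickness, myelin, reference, z_threshold=3.0):
    """Count vertices where thickness AND myelin are both reference outliers.

    `reference` holds one (thickness_mean, thickness_sd, myelin_mean, myelin_sd)
    tuple per vertex, e.g. derived from a normative database (hypothetical).
    """
    count = 0
    for t, m, (t_mean, t_sd, m_mean, m_sd) in zip(thickness, myelin, reference):
        z_thickness = (t - t_mean) / t_sd
        z_myelin = (m - m_mean) / m_sd
        # Assumed rule: flag a vertex only when both metrics are extreme.
        if abs(z_thickness) > z_threshold and abs(z_myelin) > z_threshold:
            count += 1
    return count


# Hypothetical four-vertex example (not study data).
reference = [(2.5, 0.3, 1.4, 0.1)] * 4
thickness = [2.6, 4.5, 2.4, 0.5]   # mm
myelin = [1.45, 2.2, 1.9, 1.38]    # T1w/T2w ratio, arbitrary units
n_outliers = count_outlier_vertices(thickness, myelin, reference)
print(n_outliers)
```

In this toy case only the second vertex is extreme in both metrics; the third is extreme in myelin alone and the fourth in thickness alone, so neither is counted under the assumed joint rule.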
Mapping the surface estimation error by SDM is also likely useful for identifying the various error sources. SDM detected errors related to not only WMHs but also B1 transmission bias contamination in spin echo-based T2-weighted images. The errors in the pre- and post-central gyri would primarily be explained by the residual B1 bias. The B1 transmission bias remains even though T1w- and T2w-based B1 bias field corrections are applied in the PreFreeSurfer pipeline. This is primarily because the T1w- and T2w-based B1 bias field correction removes the bias of the B1 receive field rather than the transmission field (Glasser and Van Essen, 2011; Glasser et al., 2022). The presence of hemispheric asymmetry supports the possibility of B1 transmission bias (Glasser et al., 2022). Future studies may be required to address this issue by estimating the B1 transmission field in volume space from add-on scanning sequences such as Sa2RAGE (Eggenschwiler et al., 2012) and spin-echo and gradient-echo EPI (Glasser et al., 2022) and by feeding them into the FreeSurfer-based re-estimation of the cortical surface. We also note that we could not exclude the possibility that the surface errors in the pre- and post-central gyri reflected, at least in part, age-related changes in the cortical thickness and myelin maps because of the age gap between the YA-HCP and the current dataset.
The present study had some limitations. First, the generalizability of the proposed WMH-adapted pipeline is limited. The current method was built on a small amount of data; thus, further investigation using a larger dataset may be required. Current MRI data were collected using a legacy MRI protocol (spatial resolution of 1 mm isovoxel), which was not optimized for surface reconstruction; thus, the surface errors might have been overestimated. Future studies with MRIs acquired with a higher spatial resolution, such as the HCP-style protocol (spatial resolution of 0.8 mm), would be less likely to have surface estimation errors.
Second, the differences in scanning protocol and populations may bias the results of SDS5 because the SDS calculation is based on the YA-HCP dataset. Recent studies indeed suggest that, even using identical scanning protocols, measured cortical thickness and myelin maps are affected by differences in the sites and MRI scanners (Koike et al., 2021). We also found that SDS5 was sensitive to the B1 bias field in the pre- and post-central gyri, which remained unchanged by the proposed pipeline. Further elaboration is needed for qualifying surface metrics and their errors in a more generalized manner, for example, by implementing harmonization methods across the MRI protocols and scanners (Maikusa et al., 2021; Sun et al., 2022) and by using an age-matched MRI database. The parameters and threshold of SDS5/SDM may need to be adjusted depending on the future optimization for each study. That being said, we consider SDS5/SDM, which provides a relative yet quantitative vertex-by-vertex QC value, a useful method among the currently available surface QC methods. The proposed pipeline can be applied to HCP-style MRI protocols in the Brain/MINDS-beyond project (Koike et al., 2021) to test the adverse effects of WMHs. Nevertheless, our proposed pipeline should be valid because the reconstruction algorithm itself is subject to segmentation errors due to altered contrast in WMHs (see Section 3.2), and this is not directly related to the spatial resolution. The validity of the training of the WMH discriminator machine (BIANCA) may be evaluated more rigorously using independent datasets.
Third, it has not yet been investigated whether the proposed pipelines can be generally used for other neurological conditions accompanying hyperintensities in the WM, such as multiple sclerosis (Wattjes et al., 2015), progressive multifocal leukoencephalopathy, neuromyelitis optica spectrum disorder (Pache et al., 2016), cerebral amyloid angiopathy (Subotic et al., 2021), cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL) (De Lucia et al., 2022), Binswanger disease (Yin et al., 2014), schizophrenia, and mood disorders (Zanetti et al., 2018; Serafini et al., 2014; Lyoo et al., 2002). Indeed, some of these studies applied cortical surface analysis to understand cortical pathophysiology, but its association with WMHs has not been well characterized. The WMH discriminator machine and surface re-reconstruction algorithm may need to be optimized for each disease population because the pattern of the spatial distribution of WMHs is often different between diseases, for example, those related to small vasculopathy vs. leukoaraiosis (Rosenberg et al., 2016).
Fourth, the application of our proposed method to the medial temporal regions was outside the scope of this study. Indeed, the medial temporal regions were excluded from both stereological QC (by means of the instructions) and SDS evaluations (by means of the "medial wall") due to potential inaccuracies in the estimation of cortical surfaces in these regions, regardless of the WMHs. Poor segmentation performance is especially problematic in aged patients with medial temporal atrophy. To address the effects of WMHs on the surface analysis of the medial temporal lobe, one possible approach would be to incorporate a correction of volumetric measures for both WMHs and atrophy (Griffanti et al., 2022). Therefore, this combined correction method should be tested in the future to address the surface analysis of the medial temporal lobe in aged patients with WMHs and atrophy.
Once its generalizability is ensured, the proposed pipeline will allow for reliable assessment of cortical indices in large-scale lifespan cohort studies. Although recent literature suggests that aging is an independent factor affecting cortical thickness (Salat et al., 2004; Bethlehem et al., 2022) and myelin (Grydeland et al., 2013), it remains unclear how such age-related cortical structural changes perturb cortical functions and the resultant behaviors. Structural disconnection in WM might be more specifically linked to the disorganization of functional connectivity responsible for behavioral deficits than the disruption of GM areas (Griffis et al., 2019). Loss of WM integrity may disconnect cortico-cortical and subcortical fiber bundles, thus causing dysfunction of the connected brain structures (Thiebaut de Schotten, Foulon, and Nachev, 2020). Therefore, future studies are needed to investigate the effects of age-related WMHs on functional activity or connectivity in cortical and subcortical brain areas. In such studies, it may be worth testing the usefulness of the proposed pipeline for segregating subcortical from cortical views and refining the pathophysiology of neuropsychiatric disorders.

Conclusion
In conclusion, the current study revealed that surface-based analysis is prone to surface estimation errors due to WMHs and that these errors can be corrected by automated WMH prediction followed by surface re-estimation based on multimodal MRI datasets. The expected correction was confirmed by both stereological visual QC and automated surface QC with SDM/SDS5; however, the latter seemed more sensitive to subtle changes in quality. Validation of the proposed WMH-adapted pipeline warrants further investigation into its usefulness and refinement in a larger population study.

Funding and acknowledgments
This study was in part supported by a grant (Brain/MINDS-beyond) from the Japan Agency of Medical Research and Development (AMED) (JP18dm0307006, JP19dm0307004, JP22dm0307002 to Takuya Hayashi and JP18dm0307003 and 18dm0207070 to Takashi Hanakawa) and JSPS KAKENHI (19H05726 and 19H03536) to Takashi Hanakawa.

Fig. 3. A representative case of white matter (WM) hyperintensity (WMH) in T1w and T2w FLAIR MRI and cortical surface errors. The top row shows the findings with the default pipeline in a representative case in whom the left frontal subcortical WMH (yellow arrows) caused errors of WM surfaces (lime line) and pial surfaces (blue line) overlain on T1w MRI (A) and T2w MRI (B). The surface analysis yields errors at the corresponding surface vertex (white arrows in C, D, E). Cortical thickness of the corresponding vertex is likely underestimated as compared with those in the surrounding vertices (C), whereas bias-corrected myelin (myelin BC) is likely over- and underestimated (D). The surface defect map (SDM) detects outliers (red areas in E). The middle and bottom rows show the corrected WM and pial surfaces after the reanalysis with the manual pipeline and the machine learning (ML) pipeline, respectively. The yellow line in (B) shows the contour of the WMH defined manually or estimated by ML. The corrected thickness (C) and myelin BC (D) maps and SDM (E) are also shown. The white spheres in C, D, and E show the identical vertex corresponding to the tip of the yellow arrow in panels A and B.

Fig. 4. Effects of WMH-adapted pipelines on the visual QC by the two raters in the rain cloud plots.The number of errors at the WM surface (A) and the pial surface (B) per participant is shown for each rater across the default (red), manual (green) and ML (blue) pipelines (see also Fig. 2).Shown are data points (each participant), box plot (w/ median (Q2), Q1 and Q3 with whiskers showing the largest and smallest data point excluding outliers), and smoothed histogram (gray).**Corrected P < 0.01 and *corrected P < 0.001 in Wilcoxon Signed rank test.

Fig. 5. Effects of WMH-adapted pipelines in the automatic QC of cortical surfaces. A Comparison of surface defect scores (SDS5) between the three groups of default, manual, and ML pipelines. SDS5 values in the three pipelines are shown by the data points, box plots, and histograms in the rain cloud plots. Wilcoxon signed-rank test, **corrected p < 0.001. B A scatter plot of the SDS5 against the manually defined WMH volume at the individual level (filled circles). In many cases, the SDS5 decreased in the manual (green) and ML (blue) pipelines compared with the default pipeline (red). The three data points of the same individual are connected with colored lines so that the line color corresponds to the color of the data symbol with a higher SDS5 value. The WMH volume had a correlation with SDS5 only in the default pipeline (corrected p = 3.6 × 10⁻⁵, rho = 0.61), but this correlation disappeared in the manual (corrected p = 0.39, rho = 0.23) and ML pipelines (corrected p = 0.45, rho = 0.23). The surface estimation errors due to WMHs and their improvement by the proposed pipeline were found mainly in the cases with WMH volume > 5.6 cm³ when the improvement of SDS5 of > 42.5 (Q3 + 1.5 × IQR) was adopted as a criterion (sensitivity of 0.80, specificity of 0.86, and accuracy of 0.84). Among twelve individuals with WMH volume greater than this cutoff, seven showed a reduction of SDS5 with the ML pipeline. Among eight individuals showing a reduction of SDS5 with the ML pipeline, seven showed WMH volume greater than this cutoff.