Diffusion MRI harmonization enables joint-analysis of multicentre data of patients with cerebral small vessel disease

Highlights • RISH harmonization is applicable to multicentre dMRI of elderly subjects with SVD.• Harmonization removes site-differences in dMRI metrics in matched controls.• Associations between dMRI metrics and SVD markers in patients are preserved.• Harmonized scans can be pooled to into a single large dataset to increased power.


Introduction
Combining data from multicentre studies is becoming increasingly important in neuroimaging, with the aim to increase statistical power and provide outcomes that are more generalizable than those obtained at single-centre level (Lorca-Puls et al., 2018;Van Horn & Toga, 2014). However, joint analysis of multicentre magnetic resonance imaging (MRI) can be challenging if inter-site variability due to acquisitionrelated inconsistencies is not taken into account (Vollmar et al., 2010).
Inter-site variability is particularly problematic in diffusion MRI (dMRI) and can be caused by a range of factors, including scanner hardware (e.g., scanner manufacturer, magnetic field strength, gradient strength, field inhomogeneities), software, acquisition parameters (e.g., voxel size, number of gradient directions, echo time) (Helmer et al., 2016). All these factors may affect the measured diffusion signal intensity and metrics derived from the data. In prospective multicentre studies, this variability can be controlled using standardized acquisitions and scanners from the same manufacturer . However, when retrospectively combining data form different cohorts, differences in acquisition can be substantial. Even with phantoms, dMRI metrics have shown more than 7% variability across sites (Palacios et al., 2017;Teipel et al., 2011;Timmermans et al., 2019). In the human brain, this variability is even more pronounced and non-uniform across tissues with acquisition-related differences reaching the same order of magnitude as case-control differences (e.g., traumatic brain injury vs. controls, Kumar et al., 2009). In such scenarios, if multicentre data were naively pooled into a single analysis, true biological effects would likely be masked by acquisition-related differences. It is therefore crucial that multicentre dMRI is harmonized prior to joint-analysis .
Two main kinds of retrospective dMRI harmonization techniques have been developed to date. The first category operates on each diffusion metric individually (e.g., fractional anisotropy -FA, and mean diffusivity -MD) by using statistical approaches such as meta-analysis, or by modelling the difference between sites with covariates during analysis (e.g., ComBat, Fortin et al., 2017). By contrast, the second category of harmonization operates directly on the raw dMRI data rather than on each diffusion metric Koppers et al., 2019;Mirzaalian et al., 2015). This type of harmonization is more general since the raw diffusion signal is harmonized in a modelindependent manner, theoretically allowing any type of subsequent analysis. In this study, we focus specifically on the second type of harmonization with the rotation invariant spherical harmonics (RISH) methods (Mirzaalian et al., 2015).
The core idea of the RISH method is to map the dMRI signal from a 'target' site to a 'reference' site, using groups of healthy subjects matched for factors such as age, sex, etc. This signal mapping is possible because the dMRI signal intensity can be represented in a spherical harmonics (SH) basis with a given number of parametrization coefficients (Tournier et al., 2004). From this representation, RISH features can be extracted and scaled to harmonize the dMRI signal between two sites . Applications of RISH harmonization have been presented using synthetic data and with data of healthy young subjects, with recent work showing that acquisition-related differences are removed while preserving age-and sex-related effects . Recently, this method has also been applied to harmonize a large dataset of patients with Schizophrenia and investigate changes across the lifespan (Cetin- . However, the applicability of the RISH method to older individuals exhibiting brain atrophy or to patients with (diffuse) white matter lesions remain unclear.
In this work, we evaluate the RISH harmonization framework in the context of a retrospective multicentre analysis of individuals with brain lesions due to cerebral small vessel disease (SVD). SVD is a leading cause of cognitive impairment and dementia and it is often investigated with dMRI (Baykara et al., 2016;Lyoubi-Idrissi et al., 2017;Wiegertjes et al., 2019). The patterns of diffusion change in SVD are well documented using the diffusion tensor model, with patients typically exhibiting widespread increase of MD, peak width of skeletonized MD (PSMD) and decrease of FA, often related to white matter hyperintensity (WMH) burden (Tuladhar et al., 2015;Van Norden et al., 2012). Hence, this patient population is well-suited to investigate the efficacy of harmonization methods. Using scans from five SVD cohorts acquired on different systems (Philips Healthcare, NL, and Siemens Healthineers, DE) and with different protocols, we aimed to establish if application of the RISH method removes acquisition-related differences in dMRI of elderly subjects, while preserving the sensitivity to disease effects in SVD. Finally, we show proof of concept of how multicentre harmonized data can be pooled to perform robust inference of the relation between WMH burden and dMRI metrics.

Datasets and inclusion criteria
For this retrospective analysis, we obtained scans from five cohorts including healthy elderly subjects and patients with SVD. These cohorts differed in study design and inclusion criteria (described below), comprising four samples with sporadic SVD and one sample with genetically defined SVD (Cerebral Autosomal Dominant Arteriopathy with Subcortical Infarcts and Leukoencephalopathy, CADASIL). For the present study, we used a harmonized definition for patients and controls. Patients with sporadic SVD had symptomatic SVD defined as a) history of stroke, with a corresponding small subcortical infarct visible on MRI or b) cognitive complaints and presence of WMH burden on MRI (Fazekas score ≥ 2, Fazekas et al., 1987). The presence of CADASIL was confirmed by molecular genetic testing of the NOTCH3 gene or ultrastructural analysis of a skin biopsy (detection of pathognomonic granular osmiophilic material, Wollenweber et al., 2015). Patients were excluded if they had other major neurological or psychiatric conditions (e.g., multiple sclerosis, epilepsy, Parkinson's disease). Controls had no history of stroke or cognitive complaints for which they sought medical advice, and their MRI did not show signs of lacunes or WMH with Fazekas score ≥ 2. All subjects had a structural MRI (T1-weighted) and a dMRI scan. Characteristics of the study samples included in this study (397 patients and 175 controls) are provided in Tables 1. All studies included in this analysis were approved by the ethics committees of the respective institutions and all participants provided written informed consent.

Utrecht2
A second dataset from the UMC Utrecht consisted of patients (n = 34) and controls (n = 18) from an ongoing prospective observational cohort study Zoom@SVDs (van den Brink et al., 2021). MRI scans were acquired using the same scanner system and acquisition parameters as the Utrecht1 dataset. However, since multiple scanner software and hardware (coil) updates occurred between the two studies, scans from the Zoom@SVDs study are treated as a separate site.

MRI data pre-processing
All datasets were pre-processed using ExploreDTI version 4.8.6  and the Functional Magnetic Resonance Imaging of the Brain (FMRIB) software library (FSL, v6.0.1). Images were corrected for signal drift (Vos et al., 2016) , eddy currents, subject motion with rotation of the B-matrix (Leemans and Jones, 2009) , and susceptibility induced distortions (Veraart et al., 2013) . dMRI data were nonlinearly registered to the T1 and resampled to an isotropic resolution of 2 × 2 × 2 mm 3 and brain masks were generated using brain extraction (BET) tool from FSL. All images were visually inspected to exclude the presence of major artifacts and misregistration. Since the dMRI data were acquired with different b-values in the different cohorts, we adjusted for differences in b-values as part of the harmonization pipeline. We estimated the signal of all dMRI data to a common b-value (b = 1000 s/mm 2 ) using a linear scaling of the signal decay (S/S0) in the logarithmic domain (Jensen et al., 2005;Steven et al., 2014). This signal decay has been validated in phantoms and healthy controls and shown to be robust for datasets with closely spaced b-values centred around b = 1000 s/mm 2 where the diffusion signal is not heavily weighted towards non-Gaussian effects (Magin et al., 2019) . This approach was also utilized by  for the RISH harmonization method. In our case, b-value scaling was applied to Utrecht1, Utrecht2 and Singapore, which had original b-values of 1200 s/mm 2 and 1150 s/mm 2 .
WMH volumes were segmented from the FLAIR images using an automated pipeline (coroflo) and registered the MNI152 template (Kuijf et al., 2019). All volumes were normalized to the percentage of intracranial volume (ICV) of the MNI brain.

Harmonization with RISH features
Harmonization of dMRI with rotation invariant spherical harmonics (RISH) features was first proposed by Mirzaalian et al. 2015, with recent improvements allowing harmonization of scans with different acquisition parameters , which is the case in our study. This type of harmonization is based on the fact that dMRI signal along   unique gradient directions can be represented with a basis of spherical harmonics (SH). From this representation, RISH features that describe different aspects of the signal can be calculated. RISH features can be viewed as the total energy at a specific angular frequency (order) in the SH space. The core assumption of this method is that two groups of healthy subjects matched for age, sex, lesion burden, etc., are expected to have similar diffusion profiles on a group level and thus none of the RISH features should be statistically different between sites. Under this assumption, eventual group differences observed in diffusion measurements such as FA and MD are attributed to scanner-related inconsistencies, as previously shown in healthy controls (Ning et al., 2020). To ensure that the average of RISH features captures site properties on a group level and not characteristics of individuals, a minimum number of training controls (15-20) is required from each site . Subsequently, a scaling is determined between the average of RISH features such that scanner-related differences are removed between sites. This mapping is linear in the SH domain, but non-linear in the original diffusion signal domain. We provide a detailed description of RISH harmonization in the supplementary information, and further theory can also be found in the original method papers Mirzaalian et al., 2015) In short, RISH harmonization pipeline consists of two parts: 1) learning inter-site differences in the form of scale maps between RISH features of the reference and target site (Fig. 1, part 1); and 2) applying the learned scale maps to harmonize all dMRI datasets of the target site ( Fig. 1, part 2). The learning part is performed using age-and sexmatched controls as training data. From the dMRI signal, the RISH features are calculated and registered to a common spatial template generated from training subjects with ANTs (Avants et al., 2010). In the template space, the expected values of RISH features are defined as the sample mean over the number of training subjects and voxel-wise scale maps of RISH features are estimated between the target and reference site. Next, in the application part, the scale maps are warped to the subject space and used to harmonize the SH coefficients of the target site. Finally, the harmonized diffusion signal can be reconstructed.

Experimental design and analysis 2.4.1. Effectiveness of RISH harmonization
Here we assess if acquisition-related differences in diffusion metrics between sites can be removed by RISH harmonization. The first step was to select Training Controls from every cohort that were as similar as possible between sites to minimize sources of variability other than scanner. The Utrecht1 cohort was used as a reference site because the age range of the controls allowed matching with all other sites. This was done on a site-by-site basis, generating four sets of Training Controls with participants from every site matched for age and sex to participants from Utrecht1 (demographics in supplementary Table S1). Tract-basedspatial-statistics (TBSS, Smith et al., 2006) and voxel-based analysis were used to compare the FA and MD between Training Controls of the reference and target sites, before and after harmonization. For the TBSS pipeline, FA and MD were estimated using the diffusion tensor model (dtfit from FSL). Next, FA maps were aligned to the MNI152 template and a white matter skeleton representing the centers of major bundles was computed. Subsequently, FA and MD of the skeleton were compared between reference and target sites in a voxel-wise fashion using t-tests with threshold-free cluster enhancement (5000 permutations). This comparison was also extended to the whole brain to ensure that acquisition-related differences in grey matter regions and other structures are also removed.
We also evaluated the generalizability of effectiveness of harmonization beyond the Training Controls. This was done by creating a group of Validation Controls with data from Utrecht1 (n = 15), Munich (n = 15) and Singapore (n = 15), since those sites had a sufficient number of controls outside the Training Controls to generate separate sets of matched groups (demographics in supplementary Table S2). Similar to the analysis with training controls, TBSS and voxel-based analysis were used to compare validation controls between each target site and the reference, before and after harmonization. Furthermore, one-way ANOVA was performed to compare the average FA and MD of the TBSS skeleton across these three sites.

Sensitivity to disease effects in SVD
For this objective we included all patients from all sites Fig. 1. Harmonization steps using RISH features. Part 1) All scans are pre-processed to correct for artefacts, followed by b-value mapping to a common b-value of 1000 s/mm 2 . Voxel-wised scale maps are computed using a set of Training Controls matched for age and sex between the reference and the target site. Part 2) The scale maps are then applied to harmonize the remaining scans of the target site.
(demographics in Table 1). We made a selection of controls that included both training and validation controls, while ensuring that the average age was similar across sites (demographics in supplementary  Table S3). We matched controls for age across sites in order to have a common reference to calculate effect sizes against patients. Thus, after harmonization controls are expected to have similar diffusion measures while the contrast with their respective patient groups should not be affected. To assess sensitivity to disease effects, general linear model (GLM) adjusted for age and sex was performed to examine the contrast (effect sizes of FA, MD and PSMD from the TBSS skeleton) between patients and controls within each site, before and after harmonization.
In patients, we also explored the sensitivity to disease effects by relating WMH volume with FA, MD and PSMD, before and after harmonization. Linear regression adjusted for age and sex was performed. Since the WMH volumes was non-normally distributed, a Box-Cox transformation was applied (Box & Cox, 1964). We determined the R 2 and standardized regression coefficients (β), before and after harmonization.

Proof of concept of data pooling
We evaluated if disease effects were similar if patients were compared to Internal Controls or to a pooled set of matched controls derived from external centres only. GLM adjusted for age and sex was performed to compare patients from Utrecht1 and from Munich versus External Controls pooled from other sites. We compared effect sizes of FA, MD, PSMD obtained with External Controls to the original effect sizes with Internal Controls.
Finally, we demonstrated proof of concept of data pooling by relating WMH volume with FA, MD and PSDM on pooled data of patients from multiple sites, and compared the fit of the curve before and after harmonization. Similar to the analysis within sites, linear regression was performed. We observed widespread differences in RISH features between the reference and target sites that were dependent on the tissue type. Small differences were observed in regions with prevalently single fiber populations (e.g., corpus callosum) for all orders of RISH features, whereas more peripheral white matter and grey matter regions showed bigger differences across sites (see scale maps). Furthermore, data from Hong Kong, Munich and Singapore sites showed larger differences from the reference (Utrecht1) than Utrecht2.

Effectiveness of RISH harmonization
Before harmonization, significant differences in FA and MD were found between the reference and each target site across the entire brain (Fig. 3), especially for Hong Kong Munich and Singapore (p < 0.05). After harmonization, all significant differences in the white matter skeleton between the target sites and the reference were removed. When analysing the whole brain, FA differences were still seen for the Hong Kong site (p < 0.05), mainly in subcortical grey matter and near tissue interfaces with cerebrospinal fluid, probably due to misregistration (Fig. 3, top right panel). A map of effect sizes further clarifies that differences (positive or negative Cohen's d) are removed after harmonization (i.e., effect sizes become closer to zero, Supplementary Figure S1).
Regarding Validation Controls (Fig. 4), voxel-wise differences in FA and MD in the white matter skeleton and across the whole brain were removed after harmonization, with exception of minor differences at tissue interfaces, probably due to misregistration. When comparing the average FA and MD of the skeleton, significant differences in FA were found across sites before harmonization (F (1,43) = 18.2, p < 0.001, Fig. 5A). All differences in FA were removed after harmonization. The average MD of the skeleton did not differ across sites before or after harmonization (Fig. 5B).  Table 2. Before harmonization, patients had a significantly lower FA (Fig. 6A), higher MD (Fig. 6B) and higher PSMD (Fig. 6C) than controls in all sites except Hong Kong: FA (d = -0.96 to − 2.07, p < 0.001); MD (d = 1.02 to 1.99, p < 0.001); PSMD (d = 0.93 to 1.71, p < 0.001). After harmonization, all effect sizes were preserved, regardless of if they were small (0.2), medium (0.5) or large (0.8). On average, the relative change in effect size from pre-to post-harmonization was 3.9 % (Table 2). Voxelwise analysis of one of the target sizes (Munich) shows that regional differences between patients and controls are preserved after harmonization (Supplementary Figure S2).

Sensitivity to disease effects in SVD
Before harmonization, WMH volume was significantly associated with all dMRI metrics in all sites, FA (R 2 = 0.37 to 0.68, p < 0.001); MD (R 2 = 0.57 to 0.70, p < 0.001); PSMD (R 2 = 0.49 to 0.76; p < 0.001), except Hong Kong where associations MD were not significant (Fig. 7). After harmonization, all associations were preserved regardless of the strength, with R 2 , standardized β coefficients being marginally affected. The relative change in R 2 after harmonization was 2.8%.

Proof of concept of data pooling
When comparing Utrecht1 patients to External Controls before harmonization, differences in FA were close to twice as large as the original effect size obtained with Internal Controls from Utrecht1 (d = -1.87, compared to − 0.96, Fig. 8A). After harmonization, the effect size of FA between Utrecht1 patients and External Controls was more comparable to the original effect sizes (Cohen's d = -1.1, compared to − 0.96). Results were similar when we performed the same analysis using patients from Munich (Fig. 8B)): effect sizes between patients and the External Controls were more similar to the original effect size after harmonizing the data (original effect size: d = -2.07; effect size with External Controls before harmonization d = -1.67, after harmonization d = -2.1). For MD and PSMD, the effect sizes between patients and External Controls were similar to the original effects for both Utrecht1 and Munich, even before harmonization (Fig. 8 C-F).
Regarding associations between WMH and FA on the pooled data before harmonization, FA values from different sites were clustered in separate clouds (Fig. 9A). This non-harmonized data still described a significant association between WMH volume and FA but with weaker correlations than some individual sites due to the clustering effect (R 2 = 0.33; p = 2 × 10 -31 ). After harmonization, data points were more aligned around the fitted curve, with the measurements behaving as a single center data (Fig. 9B). This resulted in stronger associations between WMH volume and FA (R 2 = 0.62; p = 2 × 10 -75 ). For MD (Fig. 9 C-D) and PSMD (Fig. 9 E-F), the clustering of points was less prominent, but associations with with WMH volume also became stronger after harmonization. Before harmonization, MD (R 2 = 0.61; p = 7 × 10 -74 ); PSMD (R 2 = 0.56; p = 6 × 10 -64 ); after harmonization, (MD: R 2 = 0.64; p = 7 × 10 -89 ); PSMD (R 2 = 0.60; p = 5 × 10 -71 ).  Table S2). The yellow-red colormap shows voxels where statistical differences were observed after multiple comparison corrections (p-value < 0.05). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Discussion
We investigated the applicability of RISH harmonization to remove acquisition-related differences in multicentre dMRI of elderly subjects with SVD while preserving disease-related effects. Before harmonization, we observed significant differences in FA and MD across sites, which were removed after harmonization, both in the Training Controls and in Validation Controls not involved in the training step. Importantly, effect sizes of FA, MD and PSMD for group differences between patients and controls as well as for associations with WMH volume within each site were preserved after harmonization. The harmonized controls could be effectively considered as a single-site dataset. The pooled data of patients covered a wide range of WMH burden, allowing to demonstrate a strong relation between WMH volume and dMRI metrics.
The RISH method has been previously implemented using scans of healthy young subjects without apparent brain lesions for the training step (Cetin- Mirzaalian et al., 2015) . Here, we have evaluated whether peculiar characteristics of the elderly brain, such as presence of lesions as WMH brain atrophy and larger ventricles, which are present to some extent even in control subjects, could affect the computation of the scale maps of RISH features. In our study, the selection criteria for controls was based on low burden of SVD. Thus, the training controls were minimally affected by WMH and had relatively similar brain volumes. The largest differences in RISH features were observed in grey matter and peripheral white matter areas, where partial volume effects might play a role on the diffusion profile (Vos et al., 2011(Vos et al., , 2012. After harmonization differences in FA and MD between the target sites and the reference were removed across the brain, except for  Table S3). Results are displayed for the reference site and for each target site before and after harmonization. Corresponding p-values and effect sizes are displayed in Table 2. The dashed line indicates the mean value of controls of the reference site. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) Hong Kong where minor differences in MD still persisted in deep grey matter structures and at tissue interfaces with cerebrospinal fluid. This is likely due to residual inaccuracies in image registration or differences in WMH burden and brain volumes in Training controls, given these were not explicitly matched for these markers (Supplementary Figure S7). Accordingly, we suggest that when dealing with data of elderly subjects it might be beneficial to match Training Controls not only for age and sex (Hsu et al., 2008) , but also in terms of WMH lesion distribution, brain volumes or other demographics that contribute for variation in diffusion (e.g., handedness, race, etc., Büchel et al., 2004), although this might be challenging to achieve in practice in most studies. This is particularly important when the inclusion criteria for controls are not based on MRI markers of SVD but rather on variables such as being cognitively healthy. Another important consideration for studies implementing the RISH method is that imaging artefacts specific of one site or few subjects (e.g., ghosting, incomplete fat saturation) might be learned as part of the harmonization features, and propagate into the harmonized dataset. In our study, an example of these artefacts can be seen as rings due to incomplete fat suppression on the L4 scale maps shown in Fig. 2. Nevertheless, their impact on the harmonized data was deemed minimal and it did not significantly affect any subsequent result. The RISH method does not assume that two groups of healthy older subjects are completely identical, but as shown by our work, if groups are matched for major factors, differences in RISH features can be learned on a grouplevel, without major influence of individual properties. This is further supported by our results on the generalizability of RISH harmonization with Validation Controls. We demonstrated not only that dMRI metrics of subjects not involved in the training step are harmonized, but also the transitivity of harmonization, e.g., that the independent harmonization of two target sites to the reference also implies harmonization between target sites.
Next, we demonstrated that RISH harmonization does not affect the sensitivity of dMRI to effects of SVD. Well-known differences in FA, MD and PSDM between patients and controls observed within each site before harmonization were preserved (Baykara et al., 2016;Wiegertjes et al., 2019). Effect sizes were unaffected after harmonization (relative change = 3.9 %.), regardless of whether the magnitude was small, medium or large. We believe such small change in effect size can be deemed negligible, and is likely caused by registration and interpolation inaccuracies . We also repeated the same analysis with the statistical harmonization method ComBat (Fortin et al., 2017) for comparison. RISH harmonization outperformed ComBat in this dataset, which was not able to preserve effect sizes within all sites (see supplementary Information, Part 6). Recent work has shown that application of the RISH method does not alter the relation between dMRI metrics and biological effects such as age-related changes . Here, we extended such finding by showing that RISH harmonization also preserved the relation between WMH volume and dMRI metrics, a well-established relation in this kind of patients (Van Leijsen et al., 2017), regardless of the strength or the sample size available within each site to test such associations.
To date, most studies of SVD with dMRI have been single-site based, and the inclusion of cohorts from other sites, which can differ substantially in terms of acquisition, has been limited to external validation only. Our results with External Controls indeed show that multicentre data without standardized acquisition across centres cannot be simultaneously analysed, as their integration before harmonization would result in biased effect sizes when comparing patients and controls. Effect sizes obtained with External Controls before harmonization were biased up to 1 standard deviation, which is on the same order of magnitude as typical differences between patients and controls. Conversely, after harmonization External Controls behaved as single-site dataset that could be used as reference for patients from all sites with minimal bias in effect sizes. An important implication of this result is that harmonization can potentially address data obsolescence. Diffusion scans are routinely acquired at single-site level for the purpose of testing specific Table 2 Effect sizes of FA, MD and PSMD between patients and controls within sites, before and after harmonization. hypotheses, and discarded afterwards as hardware updates are implemented or the acquisition protocols are adjusted. Being able to account for such differences might allow to include previously acquired data across multiple studies, thus valorising previous investments and reducing the burden of scanning new controls in prospective studies. Another potential benefit of harmonization is in longitudinal studies where upgrades in scanner systems complicate the comparison of data at different time points (Takao et al., 2012). Still, for prospective multicenter studies and especially clinical trials, which typically lack a healthy control group needed for post-hoc data harmonization, standardization of the acquisition should still have a high priority. After establishing that RISH harmonization allows to integrate data from different cohorts, we demonstrated associations between WMH volume and dMRI metrics on the pooled dataset of patients, which resulted into improved statistical power due to the larger sample size.
Since the cohorts included in this study had different disease burdens, the pooled data covered a larger spectrum of WMH volumes, allowing to test associations with more confidence than what the individual sites would allow. Since WMH volume already has strong correlations with dMRI metrics (Bendlin et al., 2010), even the non-harmonized pooled data resulted in significant associations. However, R 2 were lower than some individual sites, showcasing again the risk of performing inferences using non-harmonized data. After harmonization, data points of different sites behaved as a single-center data and more aligned with the fitted curve, resulting in and increase of R 2 . The impact of harmonization is likely to be even more important in the study of other correlates of SVD with more subtle effect sizes than WMH (e.g., relation between dMRI metrics and cognition, Du et al., 2020) or when assessing disease progression over time .

Limitations and future directions
Despite its advantages, RISH harmonization does not come without limitations. To minimize differences in diffusion weighting across sites, we used a linear scaling to map the diffusion signal to a b-value of 1000 s 2 /mm, which is only applicable to a limited range of b-values. For prospective multicentre studies, it is crucial to minimize differences in acquisition parameters such as b-values, but for retrospective studies with already acquired data, harmonization of scans with largely different b-values needs further investigation. As clinical protocols continuously improve and more complex sequences are implemented (e. g., multi-shell data), future work should also investigate whether RISH harmonization is suitable for such advanced dMRI applications in SVD (De Luca et al., 2018;Konieczny et al., 2020;Rydhög et al., 2017). Moreover, since we focused our analyses on the dMRI metrics most commonly associated with SVD (FA, MD, PSMD), further analyses are required in order to generalize our conclusions to other metrics obtained from higher level analysis such as fiber tractography (De Luca et al., 2020) and network theory in SVD (Reijmer et al., 2015).

Conclusions
Despite the limitations, our study is the first to prove the feasibility of RISH harmonization of multicentre dMRI scans in the context of SVD. We showed that harmonizing the raw dMRI signal is effective in removing acquisition-related differences, while preserving the sensitivity to disease effects. This ultimately allowed us to directly pool scans acquired at different sites into a single analysis and increase the power of  supplementary  Table S3). Data from each site is color coded as follows: red = Utrecht1; green = Hong Kong; blue = Munich; black = Utrecht2; magenta = Singapore. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) dMRI inferences. Our work paves the way not only for the validation of dMRI markers of SVD in large scale multicentre studies, but also for studies that aim to answer new research questions where statistical power is critical, such as untangling underlying aetiologies in SVD populations (e.g., free-water imaging, Finsterwalder et al., 2020). When translated to even larger scales, harmonization could significantly improve the sensitivity and specificity for studies attempting to identify tract specific damage and network connections related to cognitive dysfunction .

Data availability statement
The data used in this study is available upon contact and agreement with the respective investigators of each cohort. The harmonization method used in this work was originally developed by Mirzaalian et al., 2015 and the code is publicly available at https://github.com/pnlbwh /dMRIharmonization.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence Fig. 9. Scatter plots of associations between WMH volume, FA (top), MD (middle) and PSMD (bottom) on the pooled data, before and after harmonization. MD and PSMD values are given in mm 2 /s. *p < 0.05 for β coefficients. Data from each site is color coded as follows: red = Utrecht1; green = Hong Kong; blue = Munich; magenta: Singapore. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) the work reported in this paper.