MICRA: Microstructural image compilation with repeated acquisitions

We provide a rich multi-contrast microstructural MRI dataset acquired on an ultra-strong gradient 3T Connectom MRI scanner comprising 5 repeated sets of MRI microstructural contrasts in 6 healthy human participants. The availability of data sets that support comprehensive simultaneous assessment of test-retest reliability of multiple microstructural contrasts (i.e., those derived from advanced diffusion, multi-component relaxometry and quantitative magnetisation transfer MRI) in the same population is extremely limited. This unique dataset is offered to the imaging community as a test-bed resource for conducting specialised analyses that may assist and inform their current and future research. The Microstructural Image Compilation with Repeated Acquisitions (MICRA) dataset includes raw data and computed microstructure maps derived from multi-shell and multi-direction encoded diffusion, multi-component relaxometry and quantitative magnetisation transfer acquisition protocols. Our data demonstrate high reproducibility of several microstructural MRI measures across scan sessions as shown by intra-class correlation coefficients and coefficients of variation. To illustrate a potential use of the MICRA dataset, we computed sample sizes required to provide sufficient statistical power a priori across different white matter pathways and microstructure measures for different statistical comparisons. We also demonstrate whole brain white matter voxel-wise repeatability in several microstructural maps. The MICRA dataset will be of benefit to researchers wishing to conduct similar reliability tests, power estimations or to evaluate the robustness of their own analysis pipelines.


Introduction
The primary aim of this work was to collect, and disseminate to the neuroimaging community, the MICRA (Microstructural Image Compilation with Repeated Acquisitions) dataset: a unique and rich multivariate (diffusion, relaxometry, magnetisation transfer) microstructural MRI archive that allows variance and co-variance of measures to be estimated between tracts, between multiple time-points and between different individuals.
To provide just one example of the utility of such a dataset, we present sample size estimates that could inform current or planned future microstructural imaging experiments. With the movement towards "open science" practices (Allen and Mehler, 2019; Munafò et al., 2017), there is increasing demand to demonstrate a priori that study designs are adequately powered to answer a targeted question. In turn, this requires an assessment of test-retest repeatability as input to the sample size estimations. There is also a trend to complement diffusion-based microstructural measurements with additional measures that have enhanced sensitivity to myelin, including those derived from relaxometry and magnetisation transfer-based approaches (Ercan et al., [...]). [...] compute reliability statistics (intra-class correlation and coefficient of variation) across three example tracts and individual measures from each microstructural imaging approach. Although these are illustrative examples, these data could be reprocessed and used to compute other parameters from diffusion, relaxometry and quantitative magnetisation transfer (QMT) models. We provide protocols that can be used for power calculations, highlighting the utility of the resource to researchers wishing to conduct similar reliability tests or power calculations. However, this rich, high-quality data resource, acquired on an ultra-strong-gradient Connectom 3T system that may not otherwise be readily accessible, will also be of value to those developing and evaluating new data-processing approaches (e.g., denoising, clustering, segmentation, joint-estimation and tractography algorithms).

Participants
Six neurologically healthy adults (age range 24-30, 3 males and 3 females) were recruited from Cardiff University's staff and student panels. Screening for safety eligibility to undergo MRI scanning was conducted and participants received monetary compensation for participation. All participation was contingent upon prior written informed consent and ethical approval for this study was granted by Cardiff University's School of Psychology ethics committee.

MRI hardware: ultra-strong gradient 3T
Whole brain MRI data were acquired using an ultra-strong gradient (300 mT/m) 3T Connectom research-only MRI scanner, a modified 3T MAGNETOM Skyra (Siemens Healthcare, Erlangen, Germany). Compared to standard MRI gradients (45-80 mT/m), the Connectom gradients allow for shorter diffusion times for a given diffusion weighting, resulting in shorter minimum TEs (and hence greater signal-to-noise ratio) and increased sensitivity to small water displacements (Setsompop et al., 2013).
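To make the link between gradient strength and diffusion timing concrete, the following is a minimal sketch based on the textbook Stejskal-Tanner expression for ideal rectangular pulses, b = γ²G²δ²(Δ − δ/3). It assumes the gradient duration (7 ms) and separation (24 ms) reported for the diffusion protocol in this paper and ignores gradient ramp times; the function names are illustrative, and this is not necessarily how the scanner software computes b-values.

```python
import math

# Proton gyromagnetic ratio in rad s^-1 T^-1 (approximate textbook value).
GAMMA = 2.675e8


def b_value(g_t_per_m, delta_s, big_delta_s):
    """Stejskal-Tanner b-value in s/m^2, assuming ideal rectangular pulses."""
    return (GAMMA * g_t_per_m * delta_s) ** 2 * (big_delta_s - delta_s / 3.0)


def gradient_for_b(b_s_per_mm2, delta_s, big_delta_s):
    """Gradient amplitude (T/m) needed to reach a b-value given in s/mm^2."""
    b_si = b_s_per_mm2 * 1e6  # convert s/mm^2 to s/m^2
    timing = (GAMMA * delta_s) ** 2 * (big_delta_s - delta_s / 3.0)
    return math.sqrt(b_si / timing)


# Timings taken from the diffusion protocol: 7 ms duration, 24 ms separation.
g_needed = gradient_for_b(6000, 7e-3, 24e-3)

# b-value (s/mm^2) reachable with the same timings at 80 mT/m.
b_at_80 = b_value(0.080, 7e-3, 24e-3) / 1e6
```

Under these assumptions, the highest shell (b = 6000 s/mm²) needs roughly 0.28 T/m, just within the 300 mT/m available, whereas an 80 mT/m system with the same timings would stay below 500 s/mm² and would have to lengthen the pulses (and therefore the TE) to compensate.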

MRI data acquisition
Each MRI session lasted approximately 45 min (CHARMED = 18 min, QMT = 12 min, McDESPOT = 11 min) and was repeated 5 times within a two-week period. Care was taken to avoid potential diurnal effects by performing scans for each participant at approximately the same time of day (i.e., within 1-2 h of the same scan start time-of-day).
The MRI protocol included the following sequences: (i) Multi-shell diffusion-weighted MRI: single-shot spin echo, echo planar imaging data were acquired with both anterior-posterior (AP) and posterior-anterior (PA) phase-encoding directions. The AP-encoded data comprised two shells of 20 directions (uniformly distributed according to Jones et al., 1999) at b = 200 s/mm² and 500 s/mm², one shell of 30 directions at b = 1200 s/mm² and three shells of 61 directions at each of b = 2400 s/mm², 4000 s/mm² and 6000 s/mm², with two leading non-diffusion-weighted images and a further 11 non-diffusion-weighted images, starting at the 33rd volume and repeating every 20th volume thereafter. In the PA-encoded data, two non-diffusion-weighted images were acquired. The field of view was 220 × 220 mm in plane, and the matrix size was 110 × 110 × 66, reconstructed to a 110 × 110 × 66 image, resulting in 2 × 2 × 2 mm³ isotropic voxels. The TR and TE were 3000 ms and 59 ms, respectively (for all b-values), and the diffusion gradient duration and separation were 7 ms and 24 ms, respectively. (ii) Multi-component relaxometry: data were acquired using prototype sequences implementing the McDESPOT protocol (Deoni et al., 2008) [...] (Cercignani and Alexander, 2006).
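As a quick consistency check on the AP-encoded diffusion scheme, the shells and non-diffusion-weighted images listed above can be tallied as follows. This is only a sketch: the constant and function names are illustrative, and reading "every 20th volume" as a stride of 20 in 1-based volume positions is an assumption.

```python
# (b-value in s/mm^2, number of directions) for the AP-encoded shells listed in the text.
SHELLS = [(200, 20), (500, 20), (1200, 30), (2400, 61), (4000, 61), (6000, 61)]
LEADING_B0 = 2       # two leading non-diffusion-weighted images
INTERLEAVED_B0 = 11  # further non-diffusion-weighted images


def total_volumes():
    """Total number of AP-encoded volumes (diffusion-weighted plus b = 0)."""
    return LEADING_B0 + INTERLEAVED_B0 + sum(n for _, n in SHELLS)


def interleaved_b0_positions(first=33, step=20, count=INTERLEAVED_B0):
    """1-based positions of the interleaved b = 0 volumes (assumed stride of 20)."""
    return [first + i * step for i in range(count)]


n_volumes = total_volumes()
b0_positions = interleaved_b0_positions()
```

With these assumptions the AP series contains 266 volumes, and the last interleaved non-diffusion-weighted image falls at position 233, which fits within the series.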
For McDESPOT data, motion correction was applied to the SPGR and SSFP data using FSL MCFLIRT (Jenkinson et al., 2002), followed by brain extraction (Smith, 2002). The QUIT toolbox (Wood, 2018) was used for all subsequent fitting. The DESPOT2-FM model was fitted to estimate a B0 map (Deoni, 2009), which was used as input for a final fit of the 3-pool mcDESPOT model (Deoni et al., 2013), modelling myelin, extra-cellular and CSF contributions, using the 'qimcdespot' function in QUIT.
QMT data were processed with the QUIT toolbox (Wood, 2018) using the Ramani model (Ramani et al., 2002). The MT-weighted volumes were aligned to the non-MT contrast for motion correction. B1 bias correction was applied by computing the field correction from the field estimate of the fifth MT volume (FSL FAST, Zhang et al., 2001), which was subsequently applied to all MT volumes.

White matter microstructure measures
The following microstructural measures were computed in each voxel: restricted diffusion signal fraction (RSF) fitted from CHARMED (nonlinear regression routine employing the Levenberg-Marquardt optimization algorithm, Assaf et al., 2004); fractional anisotropy (FA), mean diffusivity (MD) and radial diffusivity (RD) from diffusion tensor MRI; myelin water fraction (MWF) and longitudinal relaxation rate (R1) from the McDESPOT pipeline (Deoni and Kolind, 2015); the macromolecular proton fraction (MMPF) fitted from the QMT pipeline (Wood, 2018; Ramani et al., 2002); and magnetisation transfer ratio (MTR) computed using in-house code. Quantitative maps were subsequently linearly registered to the space in which the diffusion MRI data were acquired ('native space') using FSL FLIRT (see Fig. 1 for an illustration of all maps). The selection of models here is illustrative. The magnitude-reconstructed raw data are also included in the MICRA dataset, enabling researchers to explore other modelling options.

Virtual dissection of tracts
To assess test-retest repeatability, a white matter projection tract (cortico-spinal tract, CST), an association tract (arcuate fasciculus) and the fornix were virtually dissected from whole brain white matter maps for each participant at each time point with probabilistic tractography (MRtrix iFOD2, 1000 seeds × 5000 streamlines, step size = 0.5 × voxel size (2 mm isotropic), angular threshold = 90° × step size/voxel size, fODF threshold = 0.05, Jeurissen et al., 2014, Fig. 2). The fornix was virtually dissected by placing region of interest masks in the anterior hippocampus and fornix body. The CST and the arcuate fasciculus were dissected using TractSeg (Wasserthal et al., 2018) using code available at https: [...] (Calamante et al., 2010) of the resultant tracts were computed and thresholded to exclude voxels through which less than 20 percent of streamlines passed. As an a priori choice, the analysis was restricted to three tracts to demonstrate repeatability in one association, one projection and one commissural pathway.
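The tracking parameters above are expressed relative to voxel size, so a small worked sketch may help. The helper names are hypothetical, the voxel coordinates and counts are made-up toy values, and interpreting the 20 percent threshold as a fraction of the total number of streamlines in the tract is an assumption rather than the authors' documented implementation.

```python
def step_size_mm(voxel_size_mm, factor=0.5):
    """Step size defined as a fraction of the voxel size."""
    return factor * voxel_size_mm


def max_angle_deg(step_mm, voxel_size_mm, base_angle_deg=90.0):
    """Angular threshold scaled by step size / voxel size."""
    return base_angle_deg * step_mm / voxel_size_mm


def threshold_density(counts, n_streamlines, min_fraction=0.2):
    """Keep voxels visited by at least min_fraction of the streamlines (assumed rule)."""
    return {
        voxel: count
        for voxel, count in counts.items()
        if count / n_streamlines >= min_fraction
    }


step = step_size_mm(2.0)          # 2 mm isotropic voxels
angle = max_angle_deg(step, 2.0)

# Toy streamline counts per voxel (hypothetical values).
toy_counts = {(10, 12, 8): 900, (10, 13, 8): 150, (11, 13, 8): 40}
kept = threshold_density(toy_counts, n_streamlines=1000)
```

For 2 mm voxels this gives a 1 mm step and a 45° angular threshold; in the toy example only the voxel visited by 90% of streamlines survives the threshold.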

Repeatability at the tract-level
Measures were extracted for each vertex in each streamline and averaged along each tract for statistical comparison. The intra-class correlation coefficient (two-way mixed, absolute agreement) and coefficient of variation were computed for average assessment of test-retest repeatability ( Table 1 ) across all repeated scans.
Moreover, to ascertain whether there was an effect of time on reproducibility (i.e., do those measurements that are more closely-spaced in time agree better than those spaced further apart in time?), intra-class correlation coefficients were also computed for individual time point pairs across all scan sessions.
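The two reliability statistics can be sketched as follows, assuming a complete subjects × sessions table of tract-averaged values. The ICC formulas are the absolute-agreement forms of McGraw and Wong (1996) for single and averaged measurements; which of the two the authors report, and the use of the sample standard deviation for the CV, are assumptions. The table used here is a toy example, not MICRA data.

```python
from statistics import mean, stdev


def mean_squares(data):
    """Two-way ANOVA mean squares for an n-subjects x k-sessions table."""
    n, k = len(data), len(data[0])
    grand = mean(v for row in data for v in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(row[j] for row in data) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    return ss_rows / (n - 1), ss_cols / (k - 1), ss_err / ((n - 1) * (k - 1))


def icc_absolute(data, average=False):
    """Absolute-agreement ICC: single-measure form, or average-measure if requested."""
    n, k = len(data), len(data[0])
    msr, msc, mse = mean_squares(data)
    if average:
        return (msr - mse) / (msr + (msc - mse) / n)
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def within_subject_cv(data):
    """Mean over subjects of 100 * SD / mean across sessions (sample SD assumed)."""
    return mean(100.0 * stdev(row) / mean(row) for row in data)


# Toy table: 6 subjects (rows) x 4 sessions (columns); values are illustrative only.
toy = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_single = icc_absolute(toy)
icc_average = icc_absolute(toy, average=True)
```

The toy table has large session effects, so its absolute-agreement ICC is deliberately low (about 0.29 for single measures and 0.62 for averaged measures); the same functions applied to one row per participant and one column per session would give the per-metric, per-tract statistics.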

Repeatability at the voxel level
While our strong preference is to use a 'tractometry'-based approach (Bells et al., 2011; Chamberland et al., 2019) to perform individual/group microstructural comparisons, we recognise the prevalence of voxel-based analyses. We therefore conducted a separate analysis of the reproducibility of each metric at the voxel-level within white matter across the whole brain. This was done by adopting the white matter skeletonisation approach popularised in the TBSS (Tract-Based Spatial Statistics; Smith et al., 2006) framework, part of FSL (Smith et al., 2004). First, as above, the MMPF, R1, MTR and MWF maps for a given participant and time-point were registered to the individual's native diffusion space using FLIRT (Jenkinson and Smith, 2001).
Then, FA maps from all subjects at all time-points were aligned into a common space using the nonlinear registration tool FNIRT (Andersson et al., 2007; Andersson et al., 2007), which uses a b-spline representation of the registration warp field (Rueckert, 1999). Next, the mean FA image was created and thinned to create a mean FA skeleton, which represents the centres of all tracts common to the group. Each subject's aligned FA map was then projected onto this skeleton in MNI space. The nonlinear warps and skeleton projections generated for FA were applied to the corresponding non-FA maps (already in diffusion space) to create white matter skeletons in MNI space for these additional metrics. Prior to analysis, a further thresholding step was applied. Specifically, each voxel in the skeletonised data was only retained for further analysis if, in that voxel, all 6 participants at all five time-points had an FA > 0.2. This was to provide enhanced assurance that the analysis was restricted to white matter. For each metric, the Pearson correlation was then computed across all voxels in the thresholded skeleton between each possible pair of time-points to assess the repeatability across whole brain white matter.
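The final thresholding and correlation steps can be sketched as follows, assuming the skeletonised maps have already been flattened to equal-length lists of voxel values. This is a toy illustration with hypothetical function names and made-up numbers, not the FSL-based pipeline itself.

```python
from itertools import combinations
from math import sqrt


def common_mask(fa_maps, threshold=0.2):
    """True where every FA map (all participants, all time-points) exceeds the threshold."""
    return [all(v > threshold for v in voxel) for voxel in zip(*fa_maps)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(maps_by_timepoint, mask):
    """Pearson r for every pair of time-points, restricted to masked voxels."""
    masked = [[v for v, keep in zip(m, mask) if keep] for m in maps_by_timepoint]
    pairs = combinations(range(len(masked)), 2)
    return {(i, j): pearson(masked[i], masked[j]) for i, j in pairs}


# Toy FA maps (two skeletonised maps, five voxels); the second voxel fails FA > 0.2.
toy_fa = [[0.5, 0.1, 0.6, 0.4, 0.3], [0.55, 0.3, 0.58, 0.45, 0.35]]
mask = common_mask(toy_fa)

# Toy metric maps for three time-points (hypothetical values).
toy_metric = [
    [1.0, 9.0, 2.0, 3.0, 4.0],
    [1.1, 0.0, 2.1, 2.9, 4.2],
    [2.0, 5.0, 4.0, 6.0, 8.0],
]
r = pairwise_correlations(toy_metric, mask)
```

With five sessions, `pairwise_correlations` would return 10 coefficients per metric, one for each pair of time-points.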

Demonstration of sample size estimation
To further illustrate the utility of the reproducibility data, we consider the minimum number of subjects needed to reach statistical power of 0.8 and significance of α = 0.05 for two different types of statistical tests routinely carried out in the neuroimaging literature, as outlined below (power calculations were computed using G*Power, see Supplementary 2, Faul et al., 2009):
(i) Independent groups t-test (e.g., for comparing a group of patients to a group of healthy controls). Here, we evaluate the numbers of subjects needed to detect a 1% and 5% group difference in each microstructural metric. This was done for all measures and tracts according to means and standard deviations presented in Table 1 (Fig. 6). The minimum n was computed by inputting the percentage change from the mean and standard deviations into G*Power.
(ii) Group (2) x Time (2) between-within groups ANOVA (e.g., for showing that there is a difference in the longitudinal evolution of a metric between two groups). This was estimated across all measures and tracts at small, medium and large effect sizes (Fig. 7). Pearson correlation coefficients were used to account for the correlation amongst repeated measures for sample size estimation (Table 1).
Fig. 3. Intra-class correlation coefficients (two-way mixed, absolute agreement) for test-retest repeatability of microstructure measures measured 5 times in 6 participants. ICC = intra-class correlation, CST = corticospinal tract, FA = fractional anisotropy, MD = mean diffusivity, RD = radial diffusivity, RSF = restricted diffusion signal fraction, MMPF = macromolecular proton fraction, MWF = myelin water fraction, R1 = longitudinal relaxation rate, MTR = magnetisation transfer ratio.
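For the independent groups t-test in (i), the conversion from a percentage difference to a required sample size can be sketched as below. Note that this uses the standard normal approximation for a two-sided test, not the exact noncentral t calculation performed by G*Power, so it can come out slightly smaller than the G*Power result. The function names and the example mean and standard deviation are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def cohens_d(mean_value, sd, percent_difference):
    """Effect size for a group difference given as a percentage of the mean."""
    return (percent_difference / 100.0) * mean_value / sd


def n_per_group(d, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided, two-sample comparison (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Hypothetical tract-average metric: mean 0.5, SD 0.02, 5% group difference.
d_example = cohens_d(0.5, 0.02, 5)
n_example = n_per_group(d_example)

# A conventional "medium" effect size for comparison.
n_medium = n_per_group(0.5)
```

Because the effect size scales with the mean-to-SD ratio of each metric, the same percentage difference can imply very different sample sizes across metrics and tracts.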

Results
Microstructural maps computed for one representative participant are presented in Fig. 1 . Fig. 2 shows a typical reconstruction of the fornix, arcuate fasciculus and cortico-spinal tracts, which were successfully dissected bilaterally for each MRI session for each participant. For one participant, calculation of a robust estimate of MWF failed for one session, while for another participant, calculation of the MMPF was not robust in one session. Thus, these values were not included in the analyses.

Repeatability at the tract level
The coefficients of variation (CV) were low overall (Table 1), ranging from 0.2 to 4.2%. Intra-class correlations ranged from 0.78 to 0.98, all indicating a high degree of repeatability (Table 1, Fig. 3). Estimated sample sizes for an independent groups t-test to detect a 1% and 5% group difference are presented in Fig. 6, and those for a 2 × 2 between-within ANOVA to detect small, medium and large effect sizes in Fig. 7.
The CV values in Table 1 represent the averaged within-subject coefficients of variation. ICC values presented in Table 1 and Fig. 3 represent the two-way mixed effects, absolute agreement ICC with multiple measurements (McGraw and Wong, 1996).
Fig. 4 shows the analysis of repeatability at the voxel-level, for all voxels on the white matter skeleton (see Methods). For each metric, voxelwise whole brain white matter Pearson correlations are presented between individual time points for voxels pooled across all subjects, with a colour scale denoting the number of voxels in each joint histogram bin. Additionally, univariate histograms show the distributions of a given metric across all voxels.

Repeatability at the voxel level
In terms of the Pearson correlation coefficient, the most reproducible metric is FA, with all pair-wise r > 0.95 (which is unsurprising given that the FA was used to drive the skeletonisation process). The heatmap representation of the joint histograms shows that, despite some considerable scatter, the vast majority of data points lie along the line of identity. For the other metrics, R1 (r > 0.93) shows superb reproducibility. RSF (r > 0.85), MMPF (r > 0.84), MTR (r > 0.81), MWF (r > 0.88) and RD (r > 0.84) also show good performance. The lowest reproducibility is for MD (r > 0.62).
To further ascertain whether there was an effect of time on reproducibility (i.e., do those measurements that are more closely-spaced in time agree better than those spaced further apart?), intra-class correlation coefficients were computed for individual time point pairs across all scan sessions ( Fig. 5 ).

Demonstration of sample size estimation
Returning to the tract-based estimates, Fig. 6 shows estimated sample sizes required to reach a power of 0.8 and significance of 0.05 for an independent groups t-test in the fornix, corticospinal tract and arcuate fasciculus for the different metrics. Clearly, the number of subjects required varies by an order of magnitude depending on which pathway is examined and which metric is used. A similar heterogeneity of required sample sizes is seen when powering for ANOVA analyses (Fig. 7).
Fig. 6. Estimated sample sizes for statistical designs required to reach a power of 0.8 and significance of 0.05 for independent groups t-test in three white matter tracts across several microstructure measures. Sample sizes were estimated for 1% and 5% group differences according to means and standard deviations presented in Table 1. The standard deviations were assumed to be constant across groups.

Discussion
We present this paper as an introduction to MICRA, a multivariate microstructural dataset collected on an ultra-strong gradient Connectom 3T MRI scanner. We offer the MRI community access to this MRI archive as a test-bed for conducting specialised analyses where access to repeated measures of multi-contrast MRI data may help to inform current and future research. As a demonstration of a possible application of our MRI archive, we explored the reproducibility of microstructural measures, including intra-class correlations and coefficients of variation, across multiple white matter tracts. Additionally, we presented estimates of sample sizes required for an independent groups t-test and a Group (2) x Time (2) ANOVA to reach statistical power of 0.8 and significance of α = 0.05 for various effect sizes across white matter measures and tracts.
Virtual dissections performed with probabilistic tractography (iFOD2, Jeurissen et al., 2014, MRtrix) delineated the fornix, the arcuate fasciculus and cortico-spinal tracts in all six participants and were replicated for all five repeated MRI scans. The overall low coefficients of variation within participants and the high correlations among repeated measures suggest a high degree of consistency of microstructure measures across tracts and repeated scans.
Sample size estimations performed for an independent group comparison (t-test) across microstructure measures and tracts demonstrated similar patterns of required sample sizes for 1% and 5% differences, with larger samples required, as expected, to detect a 1% change. The differences in the standard deviation of measures are reflected in the different sample sizes required. Notably, MWF in the fornix and in the cortico-spinal tract, and MMPF in the arcuate fasciculus, required the largest sample sizes. Conversely, the RSF, MTR, RD and R1 in the fornix, the MTR, FA, MD, RD and R1 in the cortico-spinal tract, and the RD, MTR and R1 in the arcuate fasciculus required the smallest sample sizes. Power analyses to estimate sample sizes for a 2 × 2 between-within ANOVA demonstrated that the measures showed a similar pattern of sample size requirements for the fornix and the cortico-spinal tract. In these tracts, the measures requiring the smallest sample sizes were the MWF and R1. Diffusion measures (FA, RD, MD) and the MMPF required larger sample sizes, whereas the MTR and the RSF required the largest sample sizes to reach a given effect size. In contrast, the arcuate fasciculus demonstrated a pattern in which the diffusion measures required larger sample sizes compared to R1 and MTR, with MWF requiring the smallest sample size.
To conclude, we present a rich multivariate archive of microstructural MRI data acquired on a Connectom 3T MRI scanner. It is important for researchers to take into consideration that the reproducibility statistics reported here are directly applicable only to scans and analyses that follow the conditions of the present study, conducted on a high gradient Connectom MRI scanner. Although these conditions are unique to the present study, the Connectom-acquired diffusion data are of the highest quality available, offering researchers an indication of what might be possible in a 'best case scenario'.
Data from this study demonstrate that microstructure measures derived from multi-shell diffusion, multi-component relaxometry and quantitative magnetisation transfer acquired on an ultra-strong gradient 3T MRI scanner are reliable, as demonstrated by low coefficients of variation and high intra-class correlation coefficients across measures and tracts.
Fig. 7. Estimated sample sizes for statistical designs to reach a power of 0.8 and α of 0.05 in three white matter tracts for a Group (2) x Time Point (2) ANOVA. Pearson correlations between all 5 sessions were averaged by transformation to Fisher's Z (Fisher, 1915) to obtain an average correlation among repeated measures for each metric (Table 1). Correlation coefficients were used to estimate required sample sizes for each metric.
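The averaging of pairwise correlations via Fisher's Z described in the Fig. 7 caption can be sketched as follows: each r is transformed with the inverse hyperbolic tangent, the transformed values are averaged, and the mean is transformed back. The function name and input values are illustrative, and unweighted averaging is an assumption.

```python
from math import atanh, tanh


def fisher_mean(correlations):
    """Average Pearson correlations in Fisher z-space and transform back to r."""
    zs = [atanh(r) for r in correlations]
    return tanh(sum(zs) / len(zs))


# Hypothetical pairwise correlations between sessions.
r_avg = fisher_mean([0.5, 0.9])
r_same = fisher_mean([0.9, 0.9, 0.9])
```

Averaging in z-space gives more weight to high correlations than a plain arithmetic mean would: in the example above the result is about 0.77 rather than 0.70.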

Data access
Raw data and processed maps are available on the Open Science Framework at https://osf.io/z3mkn/ . See Supplementary 1 for an outline of the data included.
(680-50-1527) from the Netherlands Organisation for Scientific Research (NWO). ER was supported by a Marshall-Sherfield Fellowship from the Marshall Aid Commemoration Commission. We would like to sincerely thank Thomas Witzel for his help in establishing the diffusion MRI sequence on the Connectom and Tobias Wood for his help with the QUIT Toolbox. Finally, we would like to thank each participant for their invaluable contribution provided by taking part in this study.

Supplementary materials
Supplementary material associated with this article can be found, in the online version, at doi: 10.1016/j.neuroimage.2020.117406 .