Effects of MP2RAGE B1+ sensitivity on inter-site T1 reproducibility and hippocampal morphometry at 7T

Most neuroanatomical studies are based on T1-weighted MR images, whose intensity profiles are not solely determined by the tissue's longitudinal relaxation time (T1) but are also affected by varying non-T1 contributions, hampering data reproducibility. In contrast, quantitative imaging, using for example the MP2RAGE sequence, allows direct characterization of the brain based on the tissue property of interest. Combined with 7 Tesla (7T) MRI, this offers unique opportunities to obtain robust high-resolution brain data characterized by high reproducibility, sensitivity and specificity. However, specific MP2RAGE parameter choices (e.g., to emphasize intracortical myelin-dependent contrast variations) can substantially impact image quality and cortical analyses through remnants of B1+-related intensity variations, as illustrated in our previous work. To follow up on this, we (1) validate this protocol effect using a dataset acquired with a particularly B1+-insensitive set of MP2RAGE parameters combined with parallel transmission excitation; and (2) extend our analyses to evaluate the effects on hippocampal morphometry. The latter remained unexplored in our initial work, but can provide important insights into the generalizability and reproducibility of neurodegenerative research using 7T MRI. We confirm that B1+ inhomogeneities have a considerably variable effect on cortical T1 estimates, as well as on hippocampal morphometry, depending on the MP2RAGE setup. While T1 initially differed substantially across datasets, we show that inter-site T1 comparability improves after correcting for the spatially varying B1+ field using a separately acquired Sa2RAGE B1+ map. Finally, removal of B1+ residuals affects hippocampal volumetry and boundary definitions, particularly near structures characterized by strong intensity changes (e.g., cerebrospinal fluid).
Taken together, we show that the choice of MP2RAGE parameters can impact T1 comparability across sites, and we present evidence that hippocampal segmentation results are modulated by B1+ inhomogeneities. This calls for (1) careful consideration of sequence parameters when setting up acquisition protocols, and (2) acquisition of a B1+ map to correct MP2RAGE data for potential B1+ variations, allowing comparison across datasets.



Introduction
Magnetic resonance imaging (MRI) at 7 Tesla (7T) allows characterization of the brain with a level of detail that cannot readily be obtained at lower field strengths without increasing scan time (Uğurbil, 2018). Despite its promise, however, several data quality issues that limit data interpretation and reproducibility still hinder complete acceptance of 7T MRI in clinical practice. Quality assessment and standardization would allow data pooling across multiple imaging sites, from which individual institutions, as well as those focusing on rare diseases, will benefit.
In essence, to improve reproducibility, sequences and corresponding parameters need to be chosen such that they provide robust data characterized by comparable temporal and spatial signal-to-noise (SNR) and contrast-to-noise (CNR) ratios, as well as intensity profiles, independent of acquisition site, scanner vendor and/or time point (Voelker et al., 2016). Quantitative MRI (ideally) overcomes potential non-biochemical inter-site and intra-subject biases that are present in weighted MRI data (Haast et al., 2016; Okubo et al., 2016; Weiskopf et al., 2013) and that hinder direct comparison across studies and between patients and healthy controls. There are numerous quantitative imaging sequences to choose from, depending on the tissue property of interest. The MP2RAGE (magnetization-prepared two rapid gradient echoes) sequence has gained significant popularity during the last decade (Marques et al., 2010). It is widely used for anatomical imaging, as it provides a synthesized T1-weighted (T1w) image (i.e., derived from two inversion images) and allows quantification of the longitudinal relaxation time (T1), ideally free of T2*, M0 and B1− effects. These images support analyses using conventional tools, such as FreeSurfer (Fischl, 2012) or FSL-FIRST (Patenaude et al., 2011), for assessment of the brain's morphology, along with T1 relaxometry to study biochemical (mostly myelin-dependent; Stüber et al., 2014) changes due to learning, aging and/or disease.
While the MP2RAGE approach eliminates potential biases present in non-quantitative imaging methods, residual biases related to the radiofrequency transmit field (B1+) may persist. Importantly, when setting up an imaging protocol, MP2RAGE parameters can be chosen either to render images minimally sensitive to spatial variations in B1+ efficiency, or to sensitize images to (e.g., myelination-dependent) contrast variations across subcortical and/or cortical tissue at the risk of B1+ residuals (Forstmann et al., 2014; Keuken et al., 2017). Earlier work on a similar two-contrast, unbiased 3D T1w sequence emphasized the residual impact of B1+ and its strong dependency on the two small flip angle values (Van de Moortele et al., 2009). These B1+ variations, too strong to compensate for using dielectric pads (Teeuwisse et al., 2012), introduce image artifacts that hamper accurate T1 estimation and subsequent analyses in the affected regions. For example, we have established earlier that the accuracy of automatic cortical thickness estimation using FreeSurfer based on 7T MP2RAGE data suffers from severe local B1+ field inhomogeneity effects near the inferior temporal and frontal lobes. As a result, tedious manual corrections of the automatic image segmentations would be necessary to correct these tissue classification errors. Post-hoc removal of the residual B1+ inhomogeneities using a separately acquired B1+ map (Eggenschwiler et al., 2012; Marques and Gruetter, 2013) was shown to substantially reduce the thickness quantification errors by improving cortical T1 estimates. However, it remained unclear how these cortical T1 measures compare against an independent dataset obtained with a setup that provides MP2RAGE images that are already minimally sensitive to B1+ variations.
Therefore, in the current work, we first compare our results with an MP2RAGE dataset acquired at a separate site (Lau et al., 2020) to evaluate potential biases in cortical T1 between datasets (i.e., inter-site comparison) caused by differences in MP2RAGE B1+ sensitivity. These data were acquired using sequence parameters more closely matching those proposed by Marques and Gruetter (2013), lowering B1+ sensitivity, and a parallel transmit head coil to increase B1+ homogeneity.
In addition, previous analyses were restricted to cortical gray matter (GM) and did not investigate potential B1+ effects on the delineation of non-cortical GM structures, such as the hippocampus and basal ganglia. In particular, the hippocampus is one of the most studied structures of the brain and plays an important role in the brain's learning and memory system (Small et al., 2011). Morphological (e.g., volume and shape) changes of the hippocampus are well-established features across multiple neurological (Duncan et al., 2013; Jack et al., 2011) and neuropsychiatric diseases (Geuze et al., 2005). For example, hippocampal volume is an important biomarker in Alzheimer's disease for predicting the patient's neurocognitive trajectory (Davatzikos et al., 2011). Lately, more precise assessment of hippocampal changes has become possible with the growing number of neuroscience centers equipped with 7T scanners (de Flores et al., 2015). Insight into the magnitude of volume and shape differences induced by B1+ inhomogeneities may therefore have important implications for the reproducibility and interpretation of hippocampal research. Finally, we thus aim to characterize the effect of the residual B1+ field on FreeSurfer's hippocampal segmentations by analyzing volume and shape differences between the original and post-hoc corrected data.

Subject recruitment
A total of 64 healthy subjects were included in this study. Subjects were recruited at two separate acquisition sites and provided written informed consent in accordance with the Declaration of Helsinki. At both sites, ethical approval for the experimental procedures was provided by the respective institutional ethics review board (i.e., Faculty of Psychology and Neuroscience, Maastricht University, the Netherlands, and Health Sciences Research Ethics Board of Western University, London, Canada). In the following sections, we refer to these as the 'Maastricht' (N = 32, age = 46.59 ± 13.12 years, range 20-69 years, 15 males) and 'London' (N = 32, age = 46.22 ± 13.33 years, range 20-70 years, 20 males) datasets.

MRI acquisition
MR images at both acquisition sites were acquired on a Siemens 7T scanner (Siemens Healthineers, Erlangen, Germany), but the scanners differed in their gradient system type (i.e., head-only vs. whole-body) and RF head coil (see Table 1). Sub-millimeter MP2RAGE anatomical data (0.7 mm isotropic nominal voxel size) and lower-resolution Sa2RAGE data (2 mm isotropic nominal voxel size) were acquired to quantify T1 and to map B1+ across the brain (see Supplementary Fig. 1 for an example B1+ map for each acquisition site). The time-resampled frequency offset corrected inversion (TR-FOCI) pulse was implemented in the MP2RAGE sequence at both acquisition sites to improve inversion efficiency and T1 quantification (Hurley et al., 2010). See Table 1 for further details on the acquisition setup and sequence parameters.
For the parallel transmit (pTx) system, the default excitation mode (B1+) was mapped with Actual Flip Angle Imaging (AFI) with optimized RF and gradient spoiling (flip angle: 70°, TR1/TR2: 30/150 ms, resolution: 3.75 mm isotropic, matrix: 64 × 64 × 48, orientation: sagittal, partial Fourier sampling: 6/8 in both phase-encoding directions; Yarnykh, 2007; Nehrke, 2009). This was complemented by low flip-angle GRE images of the same geometry using the Fourier encoding scheme with a TR of 7 ms in order to generate absolute calibrated flip angle maps (Brunner and Pruessmann, 2009; Nehrke and Börnert, 2008; Tse et al., 2014). To simultaneously obtain a B0 map, the AFI sequence was acquired with five echoes (echo times: 1.9, 3.4, 4.9, 6.3 and 7.8 ms; total acquisition time: 5 min). Shimming of the transmit field was accomplished using a magnitude least-squares optimization of the field intensity over the specified adjustment volume, based on the calibrated flip angle maps and optimizing only the phase of each transmit channel. As part of this procedure, the reference voltage was adjusted so that the desired flip angle was equivalent to the 90th percentile of the measured flip angle distribution (Curtis et al., 2012). For the single-channel system, B0 maps were acquired during the vendor-provided prescan with two echoes, using a fixed reference voltage across subjects.
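The AFI flip-angle estimate and the reference-voltage calibration can be illustrated with a short NumPy sketch. It uses the small-TR approximation of Yarnykh (2007), assumes ideal spoiling and magnitude signals, and leaves out the GRE-based absolute calibration and the phase-only multi-channel shim described above. The helper names `afi_flip_angle` and `reference_voltage_scale` are hypothetical and not part of any vendor software.

```python
import numpy as np


def afi_flip_angle(s1, s2, tr1, tr2):
    """Flip-angle map (degrees) from an Actual Flip Angle Imaging pair.

    s1, s2: signals acquired after the short (tr1) and long (tr2) intervals.
    Small-TR approximation (Yarnykh, 2007):
        cos(alpha) = (r * n - 1) / (n - r),  r = s2 / s1,  n = tr2 / tr1
    """
    r = np.asarray(s2, dtype=float) / np.asarray(s1, dtype=float)
    n = tr2 / tr1
    # clip guards against noisy ratios falling outside the valid arccos domain
    cos_alpha = np.clip((r * n - 1.0) / (n - r), -1.0, 1.0)
    return np.degrees(np.arccos(cos_alpha))


def reference_voltage_scale(fa_map, target_fa, mask, percentile=90.0):
    """Hypothetical helper: factor for the transmit reference voltage so that
    the chosen percentile of measured flip angles inside `mask` equals
    `target_fa` (flip angle scales linearly with RF amplitude)."""
    return target_fa / np.percentile(fa_map[mask], percentile)


# Usage with synthetic AFI signals (TR1/TR2 = 30/150 ms as in the text)
tr1, tr2 = 30e-3, 150e-3
true_fa = np.array([50.0, 70.0, 90.0])
c, n = np.cos(np.radians(true_fa)), tr2 / tr1
s1 = np.ones_like(true_fa)
s2 = (1 + n * c) / (n + c)  # idealized small-TR signal ratio s2/s1
fa = afi_flip_angle(s1, s2, tr1, tr2)
print(fa)
print(reference_voltage_scale(fa, 70.0, np.ones(fa.shape, dtype=bool)))
```

Calibrating to a high percentile rather than the mean means most voxels in the adjustment volume receive at most the desired flip angle, which is one plausible reading of the 90th-percentile rule; the sketch only reproduces the arithmetic, not the vendor procedure.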

Data pre-processing
Before the B1+ correction, datasets from both acquisition sites were pre-processed as described in Haast et al. (2018). This included brain extraction using an optimized skull-stripping workflow, and coregistration of the Sa2RAGE to the MP2RAGE data (also part of Haast (2019)). After coregistration, MP2RAGE data were corrected for B1+ inhomogeneities as described in the original paper (Marques and Gruetter, 2013), resulting in 'corrected' T1w (i.e., UNI) images and quantitative T1 maps. In the following sections, we refer to this dataset as 'corrected', while 'original' denotes the uncorrected dataset. Numerical simulations using the Bloch equations with the site-specific sequence parameters demonstrate the inter-site difference in the B1+ dependence of the T1 maps (Fig. 1). See Supplementary Data (Section 1) for additional simulations of T1 error (%) and contrast-to-noise ratio (CNR) as a function of B1+ and flip angle, for both sites' sets of MP2RAGE parameters.
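The principle behind these simulations and the B1+ correction can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes a longitudinal-magnetization-only Bloch simulation with ideal spoiling, an inversion efficiency that does not depend on B1+, and hypothetical protocol values (`PROTOCOL`) rather than the Table 1 parameters of either site. The UNI combination follows Marques et al. (2010); the correction looks up T1 in a table simulated at the measured B1+ instead of at nominal B1+, in the spirit of Marques and Gruetter (2013).

```python
import numpy as np

# Hypothetical, illustrative protocol values (NOT the Table 1 parameters of
# either site); times in seconds, flip angles in degrees.
PROTOCOL = {
    "tr_mp2rage": 5.0,  # time between successive inversions
    "ti1": 0.8,         # inversion -> centre of first readout
    "ti2": 2.7,         # inversion -> centre of second readout
    "alpha1": 4.0,      # nominal flip angle, first readout
    "alpha2": 5.0,      # nominal flip angle, second readout
    "esp": 6.9e-3,      # GRE repetition time within a readout train
    "n_readout": 160,   # excitations per readout train
    "inv_eff": 0.96,    # assumed inversion efficiency
}


def simulate_mp2rage(t1, b1=1.0, p=PROTOCOL, n_cycles=10):
    """Longitudinal-only Bloch simulation (ideal spoiling, M0 = 1).

    Returns the signals of both readouts at the k-space centre (middle
    excitation) for an array of T1 values (s). Nominal flip angles are scaled
    by the relative B1+ efficiency `b1`; inversion is treated as B1+-independent.
    """
    t1 = np.atleast_1d(np.asarray(t1, dtype=float))
    half = p["n_readout"] // 2
    ta = p["ti1"] - half * p["esp"]
    tb = p["ti2"] - p["ti1"] - p["n_readout"] * p["esp"]
    tc = p["tr_mp2rage"] - p["ti2"] - half * p["esp"]
    if min(ta, tb, tc) <= 0:
        raise ValueError("inversion/readout timing does not fit in the TR")

    def relax(mz, t):
        # free T1 recovery towards M0 = 1 over a duration t
        e = np.exp(-t / t1)
        return mz * e + (1.0 - e)

    def readout(mz, alpha_deg):
        a = np.radians(alpha_deg * b1)
        signal = None
        for k in range(p["n_readout"]):
            if k == half:  # centre of k-space
                signal = mz * np.sin(a)
            mz = relax(mz * np.cos(a), p["esp"])
        return mz, signal

    mz = np.ones_like(t1)
    for _ in range(n_cycles):  # repeat until (approximately) steady state
        mz = relax(-p["inv_eff"] * mz, ta)
        mz, s1 = readout(mz, p["alpha1"])
        mz, s2 = readout(relax(mz, tb), p["alpha2"])
        mz = relax(mz, tc)
    return s1, s2


def uni(s1, s2):
    """MP2RAGE 'UNI' combination for real signals (Marques et al., 2010)."""
    return s1 * s2 / (s1**2 + s2**2)


T1_GRID = np.arange(0.6, 3.0, 1e-3)  # s; illustrative lookup range


def t1_from_uni(uni_values, b1=1.0):
    """Nearest-neighbour T1 lookup. b1=1 mimics the uncorrected T1 map;
    passing the measured B1+ (e.g., from the Sa2RAGE map) mimics the
    post-hoc correction."""
    table = uni(*simulate_mp2rage(T1_GRID, b1))
    uni_values = np.atleast_1d(uni_values)
    idx = np.argmin(np.abs(table[:, None] - uni_values[None, :]), axis=0)
    return T1_GRID[idx]


# Usage: a grey-matter-like voxel (T1 = 1.9 s) receiving 80% of nominal B1+
measured = uni(*simulate_mp2rage(1.9, b1=0.8))
print("uncorrected T1:", t1_from_uni(measured, b1=1.0))
print("corrected T1:  ", t1_from_uni(measured, b1=0.8))
```

A nearest-neighbour lookup on a fine grid was chosen over interpolation because the UNI-T1 relationship is only monotonic over a limited T1 range; the grid limits are themselves an assumption of this sketch.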

Cortical segmentation analysis
The pre-processed MP2RAGE T1w images were used as input for the sub-millimeter longitudinal processing workflow implemented in the FreeSurfer (v6.0, http://surfer.nmr.mgh.harvard.edu/) image analysis suite to obtain brain tissue segmentations and white matter (WM) and pial surface reconstructions (Dale et al., 1999; Reuter et al., 2012). Longitudinal analyses of the original and B1+-corrected MP2RAGE T1w images were necessary to allow direct (i.e., vertex-by-vertex) comparison of the surface reconstructions, cortical T1 and cortical thickness surface maps, and hippocampal segmentations between both versions of the data. See Reuter and Fischl (2011) for more details. Reconstructed cortical surfaces for both acquisition sites were processed as described in the 'post-processing pipeline' in Haast et al. (2018) to quantify cortical T1 differences by comparing (1) the original vs. corrected datasets, and (2) the Maastricht vs. London acquisition sites.

Subcortical segmentation analysis
In addition, FreeSurfer's subcortical (i.e., 'aseg') output (Fischl et al., 2002), pooled from both acquisition sites, was used to study hippocampal segmentation differences between original and corrected data, as well as between acquisition sites. Besides the aseg output, we also included the labels obtained by running the automatic hippocampal subfield segmentation implemented in FreeSurfer (Iglesias et al., 2015). Segmentations were compared before and after B1+ correction, both in volume space, using total volume (in mm³), label overlap (i.e., Dice) and the distance between label boundaries (i.e., Hausdorff distance) as in Gulban et al. (2018), and in surface space, based on shape differences using large deformation diffeomorphic metric mapping (LDDMM) as in Khan et al. (2019). See also Fig. 2 for a schematic overview of the processing workflow.

Fig. 2. Cortical and subcortical analysis pipeline.
For each subject, MP2RAGE T1w and T1 maps (A) were corrected for B1+ inhomogeneities using the coregistered Sa2RAGE B1+ map following the procedure described in Marques and Gruetter (2013) (B). Skull-stripped original and corrected T1w volumes were then each treated as a separate data (i.e., 'time') point in FreeSurfer's longitudinal analysis pipeline to reconstruct cortical surfaces with matching topology (C). Differences in cortical T1 between original and corrected datasets were calculated as described in Haast et al. (2018) (D). In addition, morphometry was performed using the LDDMM algorithm (Beg et al., 2005; Khan et al., 2019) to quantify differences in subcortical segmentation after B1+ correction for each ROI (E).

Volumetric assessment
First, for each subject in the Maastricht and London datasets, we estimated the expected subcortical volumes based on age, gender, estimated total intracranial volume and scanner characteristics, using the model presented in Potvin et al. (2016). Their averages were then used as reference values for FreeSurfer's hippocampal volume output. Second, to assess the correspondence between segmentation labels in terms of global shape and boundaries, we used the Dice coefficient and Hausdorff distance, respectively. The Dice coefficient is a common metric to quantify volumetric correspondence between two segmentation labels (the original and corrected labels, in our case), where a Dice score of 1 indicates perfect overlap (Taha and Hanbury, 2015). The Hausdorff distance is a distance metric sensitive to boundary errors and can thus be used to quantify the similarity between two boundaries. Here, the Hausdorff distance represents the average number of voxels by which the two boundaries deviate from one another (Taha and Hanbury, 2015). Both metrics were computed as implemented in the Nilearn package (v0.5.0; Abraham et al., 2014).
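For reference, both overlap metrics can be written out in a few lines of NumPy. These are illustrative re-implementations of the definitions in Taha and Hanbury (2015), not the Nilearn code used in the study. The average Hausdorff distance is taken here as the larger of the two directed mean boundary distances (one common convention), in voxel units, and the brute-force distance matrix is only suitable for small masks.

```python
import numpy as np


def dice(a, b):
    """Dice coefficient between two binary masks (1 = perfect overlap)."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0


def boundary_voxels(mask):
    """Coordinates of mask voxels with at least one 6-connected neighbour
    outside the mask."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    core = (slice(1, -1),) * mask.ndim
    interior = mask.copy()
    for axis in range(mask.ndim):
        for shift in (-1, 1):
            # neighbour value along this axis/direction, cropped to the mask grid
            interior &= np.roll(padded, shift, axis=axis)[core]
    return np.argwhere(mask & ~interior)


def average_hausdorff(a, b):
    """Average Hausdorff distance (voxels): max of the two directed mean
    distances between the label boundaries."""
    pa, pb = boundary_voxels(a), boundary_voxels(b)
    d = np.sqrt(((pa[:, None, :] - pb[None, :, :]) ** 2).sum(axis=-1))
    return max(d.min(axis=1).mean(), d.min(axis=0).mean())


# Usage: a 4x4x4 cube and the same cube shifted by one voxel
a = np.zeros((10, 10, 10), dtype=bool)
a[2:6, 2:6, 2:6] = True
b = np.zeros_like(a)
b[3:7, 2:6, 2:6] = True
print(dice(a, b), average_hausdorff(a, b))
```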

Surface-based assessment
Surface-based comparisons were performed following the procedure described previously (Khan et al., 2019) using openly available image processing scripts developed in-house (https://github.com/khanlab/surfmorph). Fuzzy labels for each of the subject's hippocampal labels (ROIs) were obtained by smoothing the binary ROI label with a 1 × 1 × 1 mm kernel. Here, left and right hemispheres were combined into a single volume and treated as a single label. The fuzzy (i.e., smoothed) labels for each ROI and subject were transformed to MNI space (Fonov et al., 2011) using linear transformations based on the subject's corrected MP2RAGE T1w-to-MNI volume transformation. Similar to FreeSurfer's longitudinal workflow, these linearly aligned labels were used to generate unbiased averages for surface generation. These averages were computed by iterating through steps of (1) template generation by averaging across subjects, and (2) registration of each segmentation image to this template using LDDMM registration (Beg et al., 2005). The resulting fuzzy segmentation was then used to generate the ROI's template surface through a 50% probability isosurface. The 3D volume of the ROI's template was then fit to each subject's segmentation using LDDMM, with affine initialization, to provide vertex-wise correspondence between all surfaces of that specific ROI. The template surface was then propagated to each subject's ROI, providing surfaces with common indices for vertex-by-vertex morphometry analyses and mapping of tissue contrast near the vertices' positions.

Generation of subcortical surface maps
First, to allow shape analyses, in-/outward displacements at each vertex location were computed between the template surface and the injected subject surface, using the projection along the surface normal. Importantly, the mean displacement across a spherical neighbourhood (10 mm radius) was computed for each vertex and subtracted from the local vertex-wise displacement. This effectively ensures local displacements are not affected by residual positional differences that could remain after the linear alignment between template and subject.
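The displacement computation and local mean removal can be sketched as follows, assuming vertex correspondence between template and subject surfaces (as provided by the LDDMM fit), template vertex normals, and coordinates in mm. The function names are hypothetical, and the O(V²) Euclidean neighbourhood search is chosen for brevity rather than speed.

```python
import numpy as np


def normal_displacement(template_xyz, subject_xyz, template_normals):
    """Signed in/outward displacement per vertex: projection of the
    template-to-subject vertex offset onto the unit template normal."""
    n = template_normals / np.linalg.norm(template_normals, axis=1, keepdims=True)
    return np.einsum("ij,ij->i", subject_xyz - template_xyz, n)


def remove_local_mean(xyz, disp, radius=10.0):
    """Subtract, at each vertex, the mean displacement of all vertices within
    a sphere of `radius` mm around it (the vertex itself included)."""
    d2 = ((xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(axis=-1)
    neigh = (d2 <= radius**2).astype(float)
    local_mean = (neigh @ disp) / neigh.sum(axis=1)
    return disp - local_mean


# Usage: uniformly inflating a 5 mm sphere by 1 mm gives a constant
# displacement, which the neighbourhood subtraction removes entirely.
rng = np.random.default_rng(0)
dirs = rng.normal(size=(200, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
template, subject = 5.0 * dirs, 6.0 * dirs
disp = normal_displacement(template, subject, dirs)
residual = remove_local_mean(template, disp)
```

Because every vertex of this small sphere lies within the 10 mm neighbourhood of every other, the example only exercises the global-offset case; on an elongated structure like the hippocampus, localized bumps would survive the subtraction.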
Second, to obtain rough estimates of contrast changes near the ROI boundaries that may affect automatic segmentation, and thus surface placement, gradient magnitude maps were computed for the original and corrected FreeSurfer white-matter-normalized (i.e., 'T1.mgz') input. This was done using the '-volume-gradient' function of the Connectome Workbench command-line tool (Marcus et al., 2011). The local change in gradient magnitude (i.e., corrected − original gradient maps) was then sampled at the original vertices' coordinates and smoothed using the structure's surface geometry (i.e., across neighbors). The resulting maps were added as scalar data to the surface meshes' VTK files for visualization and vertex-wise analyses.
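The gradient-change mapping can be approximated in NumPy as below. This is a sketch under stated assumptions: `np.gradient` (central differences) stands in for `wb_command -volume-gradient`, whose exact kernel may differ; vertex coordinates are assumed to be already converted to voxel indices of 'T1.mgz'; and the smoothing is a single pass of neighbour averaging over mesh edges, a hypothetical choice since the text does not specify the smoothing extent.

```python
import numpy as np


def gradient_magnitude(vol, voxel_size=1.0):
    """Gradient magnitude of a 3D volume via np.gradient."""
    gx, gy, gz = np.gradient(np.asarray(vol, dtype=float), voxel_size)
    return np.sqrt(gx**2 + gy**2 + gz**2)


def sample_nearest(vol, vox_coords):
    """Nearest-neighbour sampling at (N, 3) voxel coordinates."""
    idx = np.clip(np.rint(vox_coords).astype(int), 0, np.array(vol.shape) - 1)
    return vol[idx[:, 0], idx[:, 1], idx[:, 2]]


def smooth_on_mesh(values, faces, n_iter=1):
    """Replace each vertex value by the mean over itself and its edge
    neighbours (edges derived from the triangle list)."""
    values = np.asarray(values, dtype=float)
    edges = np.concatenate([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    edges = np.unique(np.concatenate([edges, edges[:, ::-1]]), axis=0)
    for _ in range(n_iter):
        total = values.copy()            # include the vertex itself
        count = np.ones(len(values))
        np.add.at(total, edges[:, 0], values[edges[:, 1]])
        np.add.at(count, edges[:, 0], 1.0)
        values = total / count
    return values


# Usage sketch (inputs hypothetical):
# change = gradient_magnitude(corrected_t1) - gradient_magnitude(original_t1)
# vertex_change = smooth_on_mesh(sample_nearest(change, vertex_vox), faces)
```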
Finally, the minimal geometrical distance (in mm) from each vertex on the ROI's template surface to the CSF, based on the MNI CSF probability map, was calculated for follow-up analyses. Code for running the analyses described across Sections 2.3-2.5 is available at https://github.com/royhaast/b1corr-smk.
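The distance-to-CSF computation amounts to a nearest-voxel search. In the sketch below, the CSF threshold of 0.5 is a hypothetical choice (the text does not state how the MNI CSF probability map was binarized), and voxel indices are mapped to mm with the image's 4 × 4 affine. The brute-force search suits small examples; for a full-resolution map, a KD-tree (e.g., `scipy.spatial.cKDTree`) would be the more practical choice.

```python
import numpy as np


def distance_to_csf(vertices_mm, csf_prob, affine, threshold=0.5):
    """Minimal Euclidean distance (mm) from each vertex to the centre of the
    nearest voxel whose CSF probability exceeds `threshold` (assumed value)."""
    ijk = np.argwhere(csf_prob > threshold)
    csf_mm = ijk @ affine[:3, :3].T + affine[:3, 3]  # voxel indices -> mm
    d2 = ((vertices_mm[:, None, :] - csf_mm[None, :, :]) ** 2).sum(axis=-1)
    return np.sqrt(d2.min(axis=1))


# Usage: a single CSF voxel in a 2 mm grid, at mm position (4, 4, 4)
prob = np.zeros((5, 5, 5))
prob[2, 2, 2] = 1.0
aff = np.diag([2.0, 2.0, 2.0, 1.0])
verts = np.array([[4.0, 4.0, 7.0], [0.0, 0.0, 0.0]])
dist = distance_to_csf(verts, prob, aff)
```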

Statistical analysis
Vertex-wise cortical T1 was compared between acquisition sites using the SurfStat toolbox (http://www.math.mcgill.ca/keith/surfstat/) for Matlab (R2018b, The Mathworks, Natick, MA, USA). Here, vertex-wise t-tests were used to identify vertices where T1 significantly differed between sites, while correcting for age and sex. Similarly, we identified hippocampal vertices with larger absolute surface displacement and gradient change for the Maastricht dataset compared to the London dataset. The resulting statistical maps were corrected for multiple comparisons using random field theory for non-isotropic images (Worsley et al., 1999) and mapped as scalar data to the surface meshes' VTK files.
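The vertex-wise site comparison corresponds to fitting a general linear model at each vertex. The NumPy sketch below, which is not SurfStat, fits y ~ 1 + site + age + sex by ordinary least squares and returns the t-statistic of the site coefficient. Site and sex are assumed to be 0/1 coded (here, hypothetically, 1 = Maastricht), and the random-field-theory correction of Worsley et al. (1999) is not included.

```python
import numpy as np


def vertexwise_site_t(y, site, age, sex):
    """t-statistics of the site effect in y ~ 1 + site + age + sex.

    y: (n_subjects, n_vertices) array, e.g. cortical T1 per vertex.
    Returns (t per vertex, residual degrees of freedom).
    """
    X = np.column_stack([np.ones(len(site)), site, age, sex]).astype(float)
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - rank
    sigma2 = (resid**2).sum(axis=0) / dof          # residual variance per vertex
    var_site = np.linalg.inv(X.T @ X)[1, 1]        # assumes full-rank design
    return beta[1] / np.sqrt(sigma2 * var_site), dof


# Usage with synthetic data: vertex 0 has a 200 ms site offset, vertex 1 none
rng = np.random.default_rng(42)
n = 64
site = np.repeat([0.0, 1.0], 32)
age = rng.uniform(20, 70, n)
sex = rng.integers(0, 2, n).astype(float)
y = np.column_stack([
    1700 + 200 * site + 2 * age + rng.normal(0, 20, n),
    1700 + 2 * age + rng.normal(0, 20, n),
])
t, dof = vertexwise_site_t(y, site, age, sex)
```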
For analyses of hippocampal volume data, we used a mixed-model analysis of variance (ANOVA) to test the main effect of B1+ correction (within-subject) on volume (in mm³) and related volumetric metrics (i.e., Dice coefficient and Hausdorff distance). Potential differences between acquisition sites (between-subjects) and/or interactions with the main effect were statistically tested by including them in the mixed ANOVA model.
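With one two-level within-subject factor (original vs. corrected) and one between-subject factor (site), the mixed ANOVA F-tests reduce to simple computations on each subject's difference score and mean, which is also why 64 subjects in two groups yield F(1,62). The sketch below assumes a balanced design and no covariates; it omits the hemisphere factor and the age and sex covariates used for some of the reported tests, and the function name is hypothetical.

```python
import numpy as np


def mixed_anova_2xg(original, corrected, group):
    """F-tests for a balanced mixed ANOVA: within factor = B1+ correction
    (2 levels), between factor = group (e.g., site). No covariates.
    Returns {effect: (F, df_num, df_den)}."""
    original = np.asarray(original, dtype=float)
    corrected = np.asarray(corrected, dtype=float)
    group = np.asarray(group)
    labels, counts = np.unique(group, return_counts=True)
    if np.any(counts != counts[0]):
        raise ValueError("this sketch assumes equal group sizes")
    n, g = len(group), len(labels)

    diff = corrected - original          # within-subject contrast
    mean = (corrected + original) / 2.0  # subject means (between-subject part)

    def pooled_var(x):
        ss = sum(((x[group == k] - x[group == k].mean()) ** 2).sum() for k in labels)
        return ss / (n - g)

    def group_effect_ms(x):
        return sum((group == k).sum() * (x[group == k].mean() - x.mean()) ** 2
                   for k in labels) / (g - 1)

    var_diff = pooled_var(diff)
    return {
        "correction": (n * diff.mean() ** 2 / var_diff, 1, n - g),
        "site": (group_effect_ms(mean) / pooled_var(mean), g - 1, n - g),
        "correction x site": (group_effect_ms(diff) / var_diff, g - 1, n - g),
    }


# Usage with synthetic volumes (mm^3): a larger correction effect in group 0
rng = np.random.default_rng(1)
grp = np.repeat([0, 1], 32)
orig = rng.normal(3300, 300, 64)
corr = orig + rng.normal(0, 50, 64) + np.where(grp == 0, 80, 5)
res = mixed_anova_2xg(orig, corr, grp)
```

For two groups, the interaction and site F-values equal the squared pooled two-sample t-statistics on the difference scores and subject means, respectively, which offers a simple sanity check of the reduction.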

Inter-site comparison of cortical T1
Site-averaged cortical T1 data are displayed in Fig. 3. Identical data scaling across datasets was used as much as possible to facilitate inter-site comparison. A pronounced discrepancy in T1 can be observed between the Maastricht and London datasets, both in their histograms (Fig. 3A), with an average (median) cortical T1 of 1878.05 (1915.34) vs. 1627.67 (1705.12) ms, respectively, and in their cortical patterns (Fig. 3B). In line with the numerical simulations, Fig. 3C highlights a stronger effect of the B1+ correction for the Maastricht data: −206.70 ± 180.51 (median −191.05) ms vs. 0.37 ± 24.70 (median −5.82) ms for the London data. Fig. 4 demonstrates the increased similarity between acquisition sites: the average (median) inter-site T1 difference (i.e., Maastricht − London; Fig. 4A) decreases from 250.31 (224.11) ms before to 43.31 (36.78) ms after B1+ correction. Before correction, significant biases, with higher T1 in the temporal and frontal lobes (i.e., low B1+ regions; see Supplementary Fig. 1), are observed in the original Maastricht data (Fig. 4B). After correcting the MP2RAGE data for B1+ inhomogeneities, however, differences between acquisition sites become more homogeneous and more closely centered around 0 ms, based on both the histograms and the statistical surface maps.

Figure caption (partially recovered; beginning lost in extraction): ... Hausdorff distance (bottom) for both acquisition sites (x-axis), and right (green) and left (orange) hemispheres. Box and whisker extents demarcate interquartile ranges and distributions (excluding outliers), respectively, while diamonds and dots represent group means and individual subject data, respectively.

Fig. 5 shows a comparison of the hippocampal aseg labels after running the original and corrected MP2RAGE T1w images through the longitudinal FreeSurfer pipeline. Visual inspection of the labels' reconstructed surface mesh boundaries (Fig. 5A) shows local differences in surface placement between the original and corrected labels (i.e., yellow vs. red).
Note that the surfaces shown are from a single subject of the Maastricht dataset since, in line with the observations in the cortex, these subjects were also characterized by larger differences in hippocampal volumes. Clear examples of inward and outward displacement of label boundaries near the head and tail are indicated using dotted and solid arrows, respectively, for this particular example. We quantified label overlap and label boundary distances using the Dice and Hausdorff scores for all subjects (see Fig. 5B). These reveal significantly larger overlap (F(1,62) = 100.34, p < .001) and smaller boundary distances (F(1,62) = 42.32, p < .001) between original and corrected labels for the London data. Also, we observe that changes after B1+ correction are stronger for the right hemisphere labels (F(1,62) = 40.87, p < .001), as indicated by lower Dice coefficients compared to the left hemisphere across both acquisition sites. However, this hemispheric trend varies across sites for the average Hausdorff distances between label boundaries (F(1,62) = 8.61, p < .01). Fig. 6A quantifies hippocampal volume (in mm³) across sites (columns), for original and corrected data (x-axes) and per hemisphere (green vs. orange) separately. Averaged across hemispheres, and before B1+ correction, significantly lower hippocampal volumes (F(2,60) = 6.12, p < .005, corrected for age and sex) are observed for the Maastricht dataset (3251.16 ± 324.97 mm³) compared to the London dataset (3416.79 ± 385.27 mm³), the latter aligning more closely with the 'expected' volumes (horizontal dashed lines at 3641.46 ± 156.52 mm³, Fig. 6A).
In line with the changes in cortical T1, B1+ correction has a smaller effect on global hippocampal volume in the London dataset, while larger changes are observed for the Maastricht data (F(1,62) = 4.96, p < .05; see Fig. 6B). This is especially apparent when taking differences between the left and right hemispheres into account, as indicated by a significant interaction effect (F(1,62) = 6.09, p < .05; see Fig. 6B), which corresponds with the lower Dice scores for the right hemisphere (see Fig. 5B). Variations in B1+ have a smaller impact on the automatic analysis of hippocampal subfields; see Supplementary Data (Section 2) for an evaluation of these results. While the data shown so far allowed us to compare global changes in hippocampal volume after B1+ correction between hemispheres and acquisition sites, the surface displacement measures shown in Fig. 7 allow us to quantify and localize the changes in label boundary placement more precisely. As these surface displacements were computed in MNI space, we were able to perform group-wise comparisons. After averaging across subjects for each acquisition site, the extent of surface displacement is clearly larger in the Maastricht dataset (s.d. = 0.10 across both hemispheres, left column) than in the London dataset (s.d. = 0.04, right column), with both averages centered around 0 (Fig. 7A). Distributions for the left and right hemispheres are shown using solid and dashed lines, respectively. Especially for the right hemisphere, the hippocampal surfaces were positioned more inwards for the original data. The same scaling was used for the surface maps shown in Fig. 7B, from both an anterior (top row) and posterior (bottom row) perspective. Dotted patterns indicate the vertices characterized by significantly larger surface displacement (p < .05, corrected for multiple comparisons) for the Maastricht dataset compared to the London dataset.
Again, these tend to localize towards the tail and head regions, in line with the observations for the single-subject data shown in Fig. 5A.

The effect of B1+ correction on automatic hippocampal segmentation
As we found changes in surface placement due to the B1+ correction (Fig. 7), we mapped the change in gradient magnitude at the vertices' coordinates onto the surface meshes to detect whether changes in tissue contrast colocalize with changes in surface displacement. First, Supplementary Fig. 2A shows the distributions of gradient changes for both acquisition sites. Again, the Maastricht data are characterized by a wider distribution (s.d. = 2.46 vs. 0.65), with larger changes occurring in the right hippocampus (solid lines). Statistical testing reveals that the differences between acquisition sites are spatially widespread, as indicated by the white dotted pattern, but tend to localize towards the lateral (i.e., 'outside') and longitudinal extents (i.e., head and tail; Supplementary Fig. 2B). The tail region seems to be affected most, considering the overlap (see purple patches in Supplementary Fig. 2C) of the statistical maps based on surface displacement (red) and gradient change (blue).
Based on visual inspection of our results, we reasoned that surface displacement and/or change in gradient may correlate with the distance to CSF (see Supplementary Fig. 3A), which is characterized by strong changes in intensity due to the B1+ bias removal procedure (see Fig. 1A). Results in Supplementary Fig. 3B do not show a direct relationship between surface displacement (averaged across subjects, orange line) and distance to CSF (x-axis), but do reveal that gradient changes (scatter plot) become weaker with increasing distance from CSF. In addition, we observe that the vertices' placements are more variable across subjects (green line) the closer they are to CSF.

Discussion
We have shown before that B1+ residuals in MP2RAGE data affect the performance of cortical analyses by introducing biases in cortical T1 and thickness. Cortical T1 values were artificially high, and as a result thickness estimates too low, in regions characterized by low B1+, and vice versa in regions with high B1+. However, as advocated in the original MP2RAGE papers (Marques et al., 2010; Marques and Gruetter, 2013), B1+ sensitivity, that is, the degree of B1+-related image inhomogeneity that remains in the MP2RAGE data, greatly depends on the sequence setup and scanner hardware. The work presented here validates this dependency by extending our analyses to an independent dataset acquired at a different 7T MRI site. There, MP2RAGE data were (1) acquired using sequence parameters that rendered the images minimally B1+-sensitive, and (2) combined with favorable MR hardware for achieving increased B1+ homogeneity.

Inter-site cortical T1 variability
Our results show substantial non-biological variability in cortical T1 between acquisition sites, resulting from differences in B1+ sensitivity and B1+ field homogeneity. These discrepancies in T1 (and in the underlying MP2RAGE signal intensities) are more pronounced in critical brain regions such as the temporal and (medial) frontal lobes, which are typically characterized by strong B1+ offsets at 7T. Most importantly, while a striking difference in cortical T1 between acquisition sites (Maastricht − London, corrected for age and sex differences) was visible before B1+ correction (250.31 ms on average), this difference was substantially reduced (43.31 ms) after removing the B1+ bias. These insights are highly relevant when comparing or pooling MP2RAGE-based cortical T1 data across subjects acquired as part of different studies and/or at different sites. Note that this applies not only to the MP2RAGE sequence but also to other sequences based on the same principle, i.e., the acquisition of different steady-state conditions using varying excitation angles (Deoni et al., 2004; Venkatesan et al., 1998). In terms of cortical longitudinal relaxation, we observed an average cortical T1 of ~1718 ms (i.e., a longitudinal relaxation rate R1 = 1/T1 of 0.58 s⁻¹) across both datasets after B1+ correction. This is slightly lower than the ~1900 ms (0.53 s⁻¹) measured at 7T in the motor and temporal cortices by Marques et al. (2010), but closer to the values observed by Metere et al. (2017). Interestingly, we detected a similar offset between left and right hemispheric T1 at both acquisition sites, with higher values in the left hemisphere. This is in line with the inter-hemispheric differences, with lower R1 (i.e., higher T1) in the left hemisphere, reported by Shams et al. (2019) across different field strengths (Kim et al., 1994; Marques et al., 2017; Wansapura et al., 1999), which could not be explained solely by the asymmetric B1+ field. In fact, our results show that this difference intensifies after removing the remaining B1+ residuals. Learning- and aging-induced cortical myelination changes have been reported and could introduce region-specific T1 differences between hemispheres (Callaghan et al., 2014; Natu et al., 2019). It is worth noting that these differences in T1 do not translate into hemispheric differences in cortical thickness. Further research, e.g., disentangling the relationship between the subject's functional lateralization and handedness on the one hand and regional T1 on the other (Toga and Thompson, 2003), would be necessary to investigate this apparently systematic inter-hemispheric T1 bias. However, the effect of handedness may be negligible, considering that all subjects in the London dataset were right-handed while an inter-hemispheric difference in T1 was still visible.
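The relaxation rates and the size of the inter-site reduction quoted above follow from simple arithmetic. The short Python sketch below reproduces them from the numbers given in the text; the helper names are illustrative only, and the percentage is derived from the two reported average differences rather than being a separately reported result.

```python
def r1_per_s(t1_ms: float) -> float:
    """Longitudinal relaxation rate R1 = 1/T1, converting T1 in ms to s^-1."""
    return 1000.0 / t1_ms


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an absolute difference shrinks from `before` to `after`."""
    return (abs(before) - abs(after)) / abs(before)


# Numbers quoted in the text
print(f"{r1_per_s(1718):.2f} s^-1")  # 0.58 s^-1 (this study, after B1+ correction)
print(f"{r1_per_s(1900):.2f} s^-1")  # 0.53 s^-1 (Marques et al., 2010)
print(f"{100 * relative_reduction(250.31, 43.31):.1f} %")  # 82.7 % smaller inter-site difference
```

In other words, B1+ correction removed roughly four fifths of the average Maastricht − London cortical T1 difference.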

Effect of B1+ inhomogeneities on hippocampal morphometry
Besides the effects on cortical T1, residual B1+ inhomogeneities may also affect the performance of hippocampal segmentation. Accurate segmentation of brain structures such as the hippocampus and basal ganglia is essential for neurodegenerative research, which relies heavily on accurate assessment of these structures' sizes across patient populations. In practice, segmentation of these structures is often achieved by deforming the subject's anatomical scan to a common atlas (Keuken and Forstmann, 2015; Mazziotta et al., 2001; Xiao et al., 2017) or by using voxel-based neuroanatomical labeling tools (Fischl et al., 2002). However, the former becomes problematic when the study population deviates from the population used to generate the atlas (e.g., patients vs. healthy controls). In these cases, segmentation algorithms that label the different tissue types based on the subject's anatomical (e.g., T1w) image(s) are preferred. However, as we previously observed for the cortical gray matter, residual B1+ inhomogeneities can substantially affect image contrast and may therefore introduce variability in the ability of automated methods to precisely define regions and their borders, which may affect study replicability. Here, we used volume- and shape-based analyses to examine the extent of a potential B1+-related bias on hippocampal segmentation within and across acquisition sites. Structural changes of the hippocampus, such as reduced volume (i.e., atrophy), are well-established biomarkers in numerous neurodegenerative and psychiatric diseases linked to memory loss (Small et al., 2011). Previously, it was shown that proton density (i.e., M0) effects in MPRAGE T1w data modulate subcortical results by lowering GM-WM contrast, and that small artificial deviations in the accuracy of subcortical boundary definitions led to spurious brain morphological changes (Lorio et al., 2016).
Similar methodological biases, whether in the form of B1+-related image inhomogeneities or due to the use of different segmentation tools vs. manual tracing (Morey et al., 2009; Wenger et al., 2014), may therefore introduce artificial variability across individuals that surpasses the volume differences observed between normal and diseased populations (Lupien et al., 2007).
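Volume-based analyses of this kind ultimately rest on counting labeled voxels. As a minimal sketch, assuming a label map flattened into a Python list and FreeSurfer's aseg label codes for the left (17) and right (53) hippocampus, a structure volume and a right/left ratio could be computed as follows. The toy label map and the 1 mm isotropic voxel size are hypothetical and do not correspond to the data in this study.

```python
from collections import Counter

# FreeSurfer aseg label codes (as listed in FreeSurferColorLUT.txt)
LEFT_HIPPOCAMPUS = 17
RIGHT_HIPPOCAMPUS = 53


def label_volumes_mm3(labels, voxel_size_mm):
    """Return {label: volume in mm^3} for a flattened label map."""
    dx, dy, dz = voxel_size_mm
    voxel_volume = dx * dy * dz
    return {label: count * voxel_volume for label, count in Counter(labels).items()}


def right_left_ratio(volumes):
    """Right/left hippocampal volume ratio (>1 means a larger right hippocampus)."""
    return volumes[RIGHT_HIPPOCAMPUS] / volumes[LEFT_HIPPOCAMPUS]


# Hypothetical toy label map: background (0) plus left and right hippocampal voxels
toy_labels = [0] * 1000 + [LEFT_HIPPOCAMPUS] * 4000 + [RIGHT_HIPPOCAMPUS] * 4200
volumes = label_volumes_mm3(toy_labels, voxel_size_mm=(1.0, 1.0, 1.0))
print(volumes[LEFT_HIPPOCAMPUS], volumes[RIGHT_HIPPOCAMPUS])  # 4000.0 4200.0
print(round(right_left_ratio(volumes), 3))  # 1.05
```

Because the volume is simply a voxel count times the voxel volume, any boundary voxel that is wrongly included or excluded because of residual B1+ contrast changes propagates directly into the volume estimate.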
While the output of FreeSurfer's segmentation of the London dataset was relatively stable, hippocampal volumes changed significantly after removal of residual B1+ inhomogeneities from the Maastricht MP2RAGE data. As a result, estimated hippocampal volumes shifted towards the expected range, as indicated by (1) the averages obtained using the model by Potvin et al. (2016); (2) earlier observations across a wide-age-range, mixed-gender population of healthy subjects (Lupien et al., 2007); and (3) the decreased inter-site difference. Moreover, we observed a bias towards stronger hippocampal volume changes, predominantly increases, in the right hemisphere, leading to a comparable left-right volume ratio across acquisition sites, i.e., characterized by slightly larger right hippocampi. This finding could possibly be explained by the right hemisphere usually being larger than the left in right-handed individuals, which would be accompanied by a larger right hippocampus (Xu et al., 2000). The hippocampus, however, is not a single uniform structure but is composed of several components, or subfields, that differ substantially in their cytoarchitectonic, vascular, and electrophysiological properties (Duvernoy, 1988). More advanced segmentation and/or optimization procedures that respect these different hippocampal subfields would therefore be beneficial to improve segmentation robustness. Indeed, optimizing the hippocampal labels using Bayesian inference and a statistical atlas, as implemented in FreeSurfer's automated hippocampal subfield segmentation tool (Iglesias et al., 2015), reduced the effect of B1+-related inhomogeneities on segmentation performance (see Supplementary Data), as indicated by improved Hausdorff distance scores between the original and corrected MP2RAGE datasets.
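The Hausdorff distance used to compare segmentations is the largest distance from any point in one set to the nearest point in the other set, taken in both directions, so it is dominated by the worst local boundary disagreement. A brute-force sketch in standard-library Python, assuming both labels are given as 3D boolean masks with a known voxel size, is shown below. The helper names and toy masks are hypothetical; in practice the distance is often computed on boundary voxels or surfaces, and library routines such as SciPy's `directed_hausdorff` exist.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest neighbour in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def mask_to_points(mask, voxel_size_mm):
    """Convert a 3D boolean mask (nested lists) to voxel coordinates in mm."""
    vx, vy, vz = voxel_size_mm
    return [
        (i * vx, j * vy, k * vz)
        for i, plane in enumerate(mask)
        for j, row in enumerate(plane)
        for k, inside in enumerate(row)
        if inside
    ]


# Hypothetical labels before and after B1+ correction (2 x 1 x 5 voxels, 0.5 mm)
original = [[[True, True, True, False, False]], [[False, True, True, False, False]]]
corrected = [[[True, True, True, True, False]], [[False, True, True, False, False]]]
vox = (0.5, 0.5, 0.5)
print(hausdorff(mask_to_points(original, vox), mask_to_points(corrected, vox)))  # 0.5
```

The brute-force search scales with the product of the two point counts, which is acceptable for a sketch but not for full-resolution 7T labels.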
We performed surface-based analyses by computing shape- and tissue contrast (i.e., gradient)-based metrics to localize and characterize these volume changes more precisely. In line with the inter-site differences in volume change due to the B1+ correction, the Maastricht dataset was also characterized by more pronounced changes in hippocampal shape. Here, significant differences in shape between London and Maastricht were located closer to the head and tail. Inter-site differences in hippocampal gray matter and white matter contrast were more widespread but tended to localize near structures characterized by the strongest T1 adjustments. Since changes were strongest for the Maastricht setup, we zoomed in on this dataset to assess the relationship between the observed shape and contrast changes more closely. In the original data, definition of the hippocampal boundary appears to become more variable (i.e., erroneous) where the hippocampus borders thin strands of white matter and CSF interfaces. However, as illustrated in Fig. 5A, after correction, white matter tissue near the hippocampal tail and arterial voxels near the body are correctly excluded from the hippocampal label, while hippocampal gray matter in the head is correctly included. Although these findings do not demonstrate an overall improvement in hippocampal segmentation accuracy from solely removing residual B1+ inhomogeneities, they do highlight the importance of carefully considering the sequence parameters, given the observed variability in hippocampal volume and shape due to B1+. In the following section, we therefore compare both MP2RAGE setups used in this study and reiterate the importance of some of their parameters (Marques et al., 2010; Marques and Gruetter, 2013).
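To illustrate what a gradient-based contrast metric measures, the sketch below samples a hypothetical 1D intensity profile along a line crossing a tissue boundary, computes finite-difference gradients, and takes the peak absolute gradient as a contrast value. This is only an assumed, simplified stand-in for the surface-based metrics used in the study: the function names, the choice of peak gradient, and the T1 profile values are all hypothetical.

```python
def central_gradient(profile, step_mm):
    """Finite-difference gradient of a 1D intensity profile (units per mm)."""
    if len(profile) < 2:
        raise ValueError("need at least two samples")
    n = len(profile)
    grad = []
    for i in range(n):
        if i == 0:  # forward difference at the first sample
            g = (profile[1] - profile[0]) / step_mm
        elif i == n - 1:  # backward difference at the last sample
            g = (profile[-1] - profile[-2]) / step_mm
        else:  # central difference in the interior
            g = (profile[i + 1] - profile[i - 1]) / (2 * step_mm)
        grad.append(g)
    return grad


def boundary_contrast(profile, step_mm):
    """Hypothetical contrast metric: peak absolute gradient along the profile."""
    return max(abs(g) for g in central_gradient(profile, step_mm))


# Hypothetical T1 profiles (ms), sampled every 0.5 mm across a GM/CSF boundary
original = [1800, 1800, 2600, 3400, 3400]
corrected = [1750, 1750, 2900, 4050, 4050]
print(boundary_contrast(original, 0.5))  # 1600.0 (ms/mm)
print(boundary_contrast(corrected, 0.5) - boundary_contrast(original, 0.5))  # 700.0
```

A sharper boundary after correction yields a larger peak gradient, which is one way a contrast change near CSF could shift where a segmentation algorithm places the hippocampal border.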

Effect of MP2RAGE parameters and MRI hardware
In the case of the MP2RAGE sequence, the T1 map is calculated using a lookup table based on the combined (UNI, i.e., synthetic T1w) image of two gradient-recalled echo datasets (GRE1 and GRE2) acquired with different excitation angles (α1 and α2). However, due to the spatially varying B1+ field, the effective flip angles vary across space, requiring post hoc corrections to improve T1 quantification. This is particularly true at ultra-high field (UHF) strengths such as 7T (Marques and Gruetter, 2013; Van de Moortele et al., 2009). The extent to which variations of the B1+ field affect MP2RAGE T1 accuracy depends on the TR, TIs and flip angles (Marques and Gruetter, 2013). Numerical simulations based on each set of parameters confirmed the increased B1+ dependence (and the associated T1 error) for the Maastricht protocol, which deviated from the 'standard' B1+-insensitive MP2RAGE protocol used in London in terms of TR (6000 vs. 5000 ms), TIs (800/2700 vs. 900/2750 ms) and flip angles (4°/5° vs. 5°/3°). This resulted in substantially larger changes in T1 between original and corrected data in vivo for the Maastricht data, in particular towards the CSF range. The latter implies that the B1+ correction step most strongly affects the CSF-GM tissue boundaries, which agrees with (1) the problematic delineation of the cortical GM and CSF tissue interface; and (2) the more variable hippocampal boundary definitions closer to CSF observed here.
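The UNI combination and the lookup-table inversion can be sketched in a few lines. The combination for real-valued signals, UNI = GRE1·GRE2 / (GRE1² + GRE2²), is the standard MP2RAGE formula, but the forward model below is a deliberately simplified, hypothetical stand-in rather than the full MP2RAGE signal equations: it assumes perfect inversion from full equilibrium and instantaneous readouts, and it ignores TR and the readout trains. Because the readout trains are the main source of MP2RAGE B1+ sensitivity, this toy model is nearly B1+-insensitive and cannot reproduce the protocol dependence discussed here; it only shows the mechanics of tabulating UNI against T1 and inverting that table.

```python
import bisect
import math

# London TI and flip-angle values quoted in the text; TR is ignored by this toy model.
LONDON = dict(ti1_ms=900.0, ti2_ms=2750.0, alpha1_deg=5.0, alpha2_deg=3.0)


def toy_gre_signals(t1_ms, ti1_ms, ti2_ms, alpha1_deg, alpha2_deg, b1_rel=1.0):
    """Hypothetical two-point inversion-recovery signals (NOT the full MP2RAGE model).

    Assumes perfect inversion from equilibrium and instantaneous readouts, so
    readout-train saturation is ignored. b1_rel scales the nominal flip angles.
    """
    a1 = math.radians(alpha1_deg * b1_rel)
    a2 = math.radians(alpha2_deg * b1_rel)
    mz1 = 1.0 - 2.0 * math.exp(-ti1_ms / t1_ms)
    mz2 = 1.0 - 2.0 * math.exp(-ti2_ms / t1_ms)
    return math.sin(a1) * mz1, math.sin(a2) * mz2


def uni(gre1, gre2):
    """MP2RAGE combination for real signals, bounded to [-0.5, 0.5]."""
    return gre1 * gre2 / (gre1 ** 2 + gre2 ** 2)


def build_lookup(params, t1_min=600, t1_max=2000, step=1):
    """Tabulate (UNI, T1) pairs sorted by UNI.

    The T1 range is restricted to where this toy UNI curve is monotonic for
    the LONDON parameters, so each UNI value maps to a single T1.
    """
    table = [(uni(*toy_gre_signals(float(t1), **params)), float(t1))
             for t1 in range(t1_min, t1_max + 1, step)]
    table.sort()
    return table


def t1_from_uni(u, table):
    """Invert the lookup table by linear interpolation between neighbouring entries."""
    unis = [row[0] for row in table]
    i = bisect.bisect_left(unis, u)
    if i == 0 or i == len(table):
        raise ValueError("UNI value outside the tabulated range")
    (u0, t0), (u1, t1) = table[i - 1], table[i]
    return t0 + (t1 - t0) * (u - u0) / (u1 - u0)


table = build_lookup(LONDON)
u = uni(*toy_gre_signals(1718.0, **LONDON))
print(round(t1_from_uni(u, table)))  # 1718
```

In this simplified model, scaling both flip angles by `b1_rel` changes UNI only through the small-angle nonlinearity of the sine, which is why the full signal model, including the readout trains, is needed to capture protocol-dependent B1+ sensitivity. A post hoc B1+ correction can then be viewed as extending such a table with a B1+ dimension, indexed by a separately measured B1+ map.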
In addition, the smaller variation in B1+ values across the brain for the London dataset may reflect the use of single-channel transmission (Tx) in Maastricht vs. parallel transmission (pTx) in London for excitation. Instead of using single-channel RF pulse transmission, pTx makes use of multiple transmit coil elements to optimize excitation homogeneity and reduce B1+ non-uniformity, which is particularly important at UHF (Katscher et al., 2003; Zhu, 2004). Although our data show differences in the extent of transmit efficiency (i.e., a reduced width of the B1+ map histograms using pTx), their spatial pattern (i.e., shape) remained relatively comparable: B1+ is higher in the center of the brain near the limbic lobe, and lower, especially, in the temporal lobe. B1+ shimming using a magnitude least-squares algorithm on the pTx system by definition reduces the B1+ variation across the volume of interest, and even a phase-only approach produces a much tighter B1+ distribution than a single-channel Tx coil. However, further analyses based on two additional B1+-insensitive datasets acquired in Maastricht (see Section 3 in the Supplementary Data) hint towards a limited role of pTx in the reduced impact of B1+ inhomogeneities observed in London and suggest that sequence parameters are the main drivers. Finally, the two acquisition sites employ different gradient coils (whole-body vs. head-only, respectively). However, in contrast to pTx, head-only gradients will have only a minor, if not negligible, effect on B1+ homogeneity. Instead, higher gradient strengths are mainly useful in applications requiring fast gradient switching, such as functional and diffusion MRI (Uğurbil et al., 2013).
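The magnitude least-squares (MLS) shimming mentioned above chooses complex channel weights w so that the magnitude of the combined field |A w| matches a target across the region of interest, leaving the phase free. One common way to solve this is an alternating (variable-exchange) scheme: fix the target phase to that of the current field, then solve an ordinary complex least-squares problem for w. The NumPy sketch below is a hypothetical illustration on random synthetic channel maps, not the shimming routine of the scanner used in this study. The coefficient of variation serves as a simple proxy for the width of a B1+ histogram.

```python
import numpy as np


def mls_shim(A, target=1.0, n_iter=50, w0=None):
    """Magnitude least-squares RF shim via alternating phase/weight updates.

    A: complex array (n_voxels, n_channels) of per-channel B1+ maps in the ROI.
    Approximately minimises sum((|A @ w| - target)**2) over complex weights w.
    No iteration increases the cost, but only a local minimum is guaranteed.
    """
    n_vox, n_ch = A.shape
    w = np.ones(n_ch, dtype=complex) if w0 is None else np.asarray(w0, dtype=complex)
    b = np.full(n_vox, float(target))
    for _ in range(n_iter):
        phase = np.exp(1j * np.angle(A @ w))  # optimal target phase for the current w
        w, *_ = np.linalg.lstsq(A, b * phase, rcond=None)  # optimal w for that phase
    return w


def mls_cost(A, w, target=1.0):
    """Squared deviation of the combined B1+ magnitude from the target."""
    return float(np.sum((np.abs(A @ w) - target) ** 2))


def coefficient_of_variation(b1):
    """Spread of a B1+ magnitude distribution (std / mean), a histogram-width proxy."""
    mag = np.abs(b1)
    return float(mag.std() / mag.mean())


# Synthetic 8-channel maps for 500 ROI voxels (random, purely illustrative)
rng = np.random.default_rng(0)
A = rng.normal(size=(500, 8)) + 1j * rng.normal(size=(500, 8))
w_ref = np.ones(8, dtype=complex) / 8  # hypothetical equal-amplitude, equal-phase reference
w_mls = mls_shim(A, target=1.0, w0=w_ref)
print(mls_cost(A, w_ref), mls_cost(A, w_mls))
print(coefficient_of_variation(A @ w_ref), coefficient_of_variation(A @ w_mls))
```

Because each update solves its subproblem exactly, the MLS cost after shimming can never exceed that of the starting weights, which is the sense in which MLS shimming reduces B1+ variation "by definition".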

Limitations
The two datasets were acquired in different subjects and also differed in sequence parameters and MRI hardware beyond the components that determine MP2RAGE B1+ sensitivity, which limits the precision with which we can pinpoint the B1+-related biases. However, we were mainly interested in the 'natural' variability of T1 and segmentation results across independent, though comparable, datasets, which more closely matches typical situations where datasets are pooled and/or compared. Nonetheless, biologically relevant variability due to differences in population characteristics (e.g., sex and handedness distributions) may have attenuated or amplified the observed inter-site differences in cortical T1 (Callaghan et al., 2014; Natu et al., 2019), which would therefore not necessarily be caused solely by differences in B1+ sensitivity. Most importantly, the above does not invalidate our conclusion that cortical T1 measurements and hippocampal segmentations can vary substantially due to differences in MP2RAGE acquisition strategy, as this conclusion is based on intra-subject comparisons.

Conclusions
MRI at 7T holds great promise not only for clinical neuroscientific research but also for the assessment of neuroanatomical changes in individual subjects to support clinical diagnosis. This requires robust acquisition and analysis methods that are insensitive to non-biological variations. In this respect, quantitative imaging approaches such as the MP2RAGE sequence are promising but require extensive validation to optimize their use. Our results emphasize the importance of taking potential acquisition-related biases in the data into account, especially when interpreting changes in T1 or brain morphology. Residual B1+ effects on 7T MP2RAGE signal intensities not only affect cortical results but also impact hippocampal analyses. We confirm that the magnitude of these effects depends mainly on the specific set of sequence parameters. Although different parameters may be preferred for a specific hypothesis or study aim (e.g., visualization of deep brain nuclei), a B1+-insensitive protocol greatly improves the robustness of the results, promoting comparability within and across subjects and sites. However, while cortical T1 and hippocampal volume and shape initially varied substantially between such acquisition strategies, it is encouraging that their comparability improves after post hoc B1+ correction, supporting the pooling and/or comparison of multi-site MP2RAGE data, even when acquired using different protocols and MRI hardware.