Three dimensional MRF obtains highly repeatable and reproducible multi-parametric estimations in the healthy human brain at 1.5T and 3T

Magnetic resonance fingerprinting (MRF) is highly promising as a quantitative MRI technique due to its accuracy, robustness, and efficiency. Previous studies have found high repeatability and reproducibility of 2D MRF acquisitions in the brain. Here, we have extended our investigations to 3D MRF acquisitions covering the whole brain using spiral projection k-space trajectories. Our travelling head study acquired test/retest data from the brains of 12 healthy volunteers and 8 MRI systems (3 systems at 3 T and 5 at 1.5 T, all from a single vendor), using a study design not requiring all subjects to be scanned at all sites. The pulse sequence and reconstruction algorithm were the same for all acquisitions. After registration of the MRF-derived PD T1 and T2 maps to an anatomical atlas, coefficients of variation (CVs) were computed to assess test/retest repeatability and inter-site reproducibility in each voxel, while a General Linear Model (GLM) was used to determine the voxel-wise variability between all confounders, which included test/retest, subject, field strength and site. Our analysis demonstrated a high repeatability (CVs 0.7-1.3% for T1, 2.0-7.8% for T2, 1.4-2.5% for normalized PD) and reproducibility (CVs of 2.0-5.8% for T1, 7.4-10.2% for T2, 5.2-9.2% for normalized PD) in grey and white matter. Both repeatability and reproducibility improved when compared to similar experiments using 2D acquisitions. Three-dimensional MRF obtains highly repeatable and reproducible estimations of T1 and T2, supporting the translation of MRF-based fast quantitative imaging into clinical applications.


Introduction
Magnetic resonance imaging is one of the most powerful diagnostic techniques due to its versatility and its unparalleled soft tissue contrast. The dominant contributors to the MR contrast are longitudinal peatability and reproducibility can benefit the patient by ensuring the overall 'reliability' of the imaging procedure, which should reduce false positives or negatives. Briefly, repeatability refers to the degree of agreement between experiments repeated at the same location, using the same measurement procedure and equipment, performed under similar conditions, and repeated at separate time points. This differs from reproducibility, which refers to the degree of agreement between the results of experiments conducted at different locations and with similar but separate instruments.
Novel MRI acquisition and reconstruction techniques promise a fast and accurate estimation of relevant quantities, including but not limited to T1 and T2 relaxation times, also allowing synthesis of conventional MR contrasts ( Warntjes et al., 2008 ). Amongst these emerging techniques, methods based on transient-state acquisitions, such as MR Fingerprinting (MRF) ( Ma et al., 2013 ), are currently being investigated in healthy subjects and patient groups. These quantitative techniques use undersampled k -space trajectories in combination with a variation of sequence parameters in each TR to elicit unique tissue responses. Parametric maps are subsequently computed by enforcing consistency with a physical model, i.e. the Bloch equations that describe MRI signal behavior.
Recent studies have shown a high repeatability of T1 and T2 values with 2D steady-state free precession (SSFP) MR Fingerprinting ( Jiang et al., 2014 ) in the ISMRM/NIST phantom , as well as in human volunteers Körzdörfer et al., 2019 ). These studies found similar repeatability and reproducibility results despite the different study designs, analysis methods and MR scanners from different vendors. Although the work from ( Körzdörfer et al., 2019 ) focused on 95% confidence intervals in regions of interest and  performed a voxel-wise analysis focussed on coefficients of variation, under the assumption of Gaussian distribution both obtained comparable results, reporting high repeatability and reproducibility. One of the limitations of  was that voxels located at the interface between tissues could include partial volumes. This was noted especially in the grey matter, perhaps due to imperfect segmentation and/or registration at the 2 mm spatial resolution used.
In this work we aimed to assess repeatability and reproducibility of new isotropic MRF measurements at an increased spatial resolution. Recently, the transient-state acquisitions at the basis of MRF have been extended to three-dimensional sampling of k-space, either as stackof-spirals ( Liao et al., 2017 ;Ma et al., 2018 ), 3D spiral projections ( Cao et al., 2019 ) or music-based k -space trajectories . These methods demonstrated improved spatial resolution and diagnostic value, as well as high repeatability . We acquired test/re-test 3D MRF data in the human brain at 1.5T and 3T in a travelling head study involving a total of 12 subjects and 8 different MR scanners from a single vendor.

Study design
The data were acquired on eight separate systems from a single vendor (GE Healthcare, Chicago, IL), five operating at 1.5T and three operating at 3T and with different hardware (RF/gradient coils) and software release ( Fig. 1 a).
The study was approved by the local ethical committee. After informed consent, twelve healthy human subjects (28-43 years old, 10 males and 2 females) underwent two identical sessions, including a 3D SSFP MRF. Subjects were removed from the scanner between imaging sessions. For general linear model (GLM) analysis, each subject was scanned in multiple MR scanners with each subject scanned at least in two of the 8 systems used (the full study design can be seen in the design matrix of the GLM, shown in Fig. 2 a). Subjects also received a fast spoiled gradient echo (FSPGR) scan at 3T with the same spatial resolution as 3D MRF.

MRF acquisition and reconstruction
We used an acquisition strategy based on 3D spiral projection kspace sampling, using the same implementation as described previously . This implementation has also been validated for accuracy against phantom references and compared to other k -space trajectories . Briefly, our MRF acquisition trajectories were based on spiral projections ( Cao et al., 2019 ) (see Fig. 1 b). Acquisition parameters were: FOV = (225 mm) 3 , matrix size = 200 × 200 × 200, sampling bandwidth = ± 250 kHz, TE/TR kept constant at 0.5/11 ms, flip angle schedule as in Fig. 1 c. The flip angle schedule was composed by a ramp (with 70°maximum flip angle), where the low flip angles of the ascending ramp serve to encode T1 variations due to the inversion pulse, while the high flip-angles at the end of the ramp mostly encode for T2 . A descending ramp followed the ascending ramp to achieve a smooth flip-angle pattern. The ascending and descending ramps were followed by a series of pulses with a flip angle of 1. These had the aim of collecting further data while allowing for T1 recovery with minimal saturation effects. Our readouts achieved a resolution of 1.125 mm isotropic in under 10 min acquisition. To achieve SSFPlike signal evolution, all the trajectory interleaves were rewound and followed by a spoiler z-gradient achieving 4 /mm dephasing in z. The maximum gradient amplitude was 20mT/m and slew rate was 70T/m/s, allowing the use of the same trajectory with more scanner models.
MRF maps were obtained by inner-product pattern matching and the MRF dictionary was computed using the extended phase graphs formalism ( Weigel, 2015 ), including global inversion and excitation pulse efficiencies but without including local B 0 or B 1 effects in the model .

Analysis
To allow direct comparison with previous results in literature, the analysis pipeline matched . Briefly, the data were first pre-processed. The quantitative maps for every test were firstly co-registered to their respective re-test (step a), secondly all images were registered to the images acquired at the reference site (3T Site1, step b). This resulted in within-subject alignment. To align quantitative maps to the FSPGR data, we used MRF-derived T1 maps from the reference site (step c). We obtained all transformation matrices with rigid registration in Statistical Parametric Mapping (SPM 12) Toolbox ( Friston et al., 1994 ;Penny et al., 2007 ). After registration, each subject's FSPGR image was warped to match a custom DARTEL atlas ( Goto et al., 2013 ). Estimated transformations were used warp brain volumes (including MRF maps) from the subject space to the template space. Alignment matrices from steps a, b, c together with DARTEL transformations were combined. This allowed us to transform the raw data directly to the DARTEL space avoiding unnecessary interpolation. After applying Gaussian smoothing with 6 mm full width half maximum (FWHM), coefficients of variation (CVs) were calculated between each voxel in test and retest measurements. In addition to CVs, we calculated the signed and unsigned test/retest relative differences at each voxel all scanners as described in ( Gracien et al., 2020 ). For each subject, the segmented tissue probability maps were calculated on the SPGR T1w images and used to extract mean values of parameters in GM, WM, and CSF. To avoid interpolations, the quantitative values were read in the native spaces by transforming the segmentation masks using a nearest neighbor algorithm. To exclude voxels with partial volume effects (PVE) values were reported averaged on each tissue class, using a threshold of 70% PVE as a mask per each segmented tissue ( Fig. 2 b). Due to arbitrary receiver gain and scaling, relative M 0 values were reported self-normalized (nPD) to the average value inside the brain mask. . For each of the segments the flip angle schedule in (c) was used, preceded by an adiabatic inversion pulse. The flip angle schedule was composed by an ascending and descending ramp (with 70°maximum flip angle), followed by a series pulses with a flip angle of 1. Full details of the acquisition have been described previously .
A General Linear Model (GLM) was implemented in SPM as a secondary sensitive method to characterize the effects from several independent physical parameters, as detailed in . The multiple linear regression model at the basis of GLM is formulated as follows for each voxel of T1, T2 and nPD: where p is the number of independent variables), which represents whether that variable is present in the analysis (X ij = 1) or unused (X ij = 0). The values j represent parameters to be estimated, with 0 corresponding to the mean value, and i is the variance in the data that cannot be explained by the predictors. The values of the j maps quantify the influence of the independent variables on the measured physical parameters, with the same measurement units: T 1 T 2 and nPD at each voxel location . The covariates modeled as categorical variables were as follows: acquisition variability (test/retest data), field strength (1.5 T and 3 T), subject, and acquisition site ( Fig. 2 a).
In addition to the GLM where we estimated the biases associated with confounders, we also computed voxel-wise reproducibility CVs to assess variability. As not all subjects were acquired in all scanners, the values from the reproducibility CVs were extracted from four subjects who were acquired in all scanners. Additionally, we performed a set of ICC analyses on the 7 subjects scanned at Sites 1-4 (see Supplementary  Tables 3-4).

Data and code availability statement
In compliance with funder's policy and institutional ethics approval, the original parametric maps as well as the corresponding structural data for each subject are freely available online , see 10.5281/ZENODO.3989799 . Scripts to perform the voxelwise analyses are freely available ( https://github.com/jk619/test _ retest _ MRF ).

Results
Representative co-registered MRF images from one subject are shown in Fig. 3 a, showing fully-quantitative maps with a high anatomical detail. Average values over subjects at each system and site are reported in Fig. 3 b, as well as comparison with literature ranges in WM and GM from a host of different techniques ( Deoni et al., 2005 ;Ma et al., 2018 ;Poon and Henkelman, 1992 ;Whittall et al., 1997 ). Values were comparable between sites, and within literature ranges. The CSF values were more variable than the solid matter compartments across scanners, and site 3 had the highest average bias.
To assess repeatability, coefficients of variation were computed in each voxel between test and re-test, for each scanner and subject. The  Fig. 2. (a) The design matrix of the general linear model used to measure biases associated to each confounder. From the matrix it can also be seen that 12 subjects were scanned twice, each in at least two of the 8 systems used for the study, and that 3 systems were operating at 3.0T while 5 were 1.5T. The images in (b) are the tissue masks obtained by thresholding the probability maps from the DARTEL template.
voxel-wise repeatability CVs at two representative sites are shown in Fig. 4 a, showing uniform variability across the brain at 3T, while at 1.5T some structural bias was visible within the CV maps. Tissue-class averaged CVs associated to each scanner between test and re-test are reported in Fig. 4 b, showing solid matter repeatability CVs under 2% for nPD and T1, and 5% for T2, with the exception of site 8 with CV of T2 around 8%. In addition to the repeatability CVs, the signed and unsigned relative differences, as well as the ICC analysis are reported in Supplementary Tables 1-4.
Reproducibility biases were estimated with the GLM. Fig. 5 a shows the biases associated with field strength and each individual scanner. The biases associated with field strength on nPD were null due to the normalization process, T 1 values globally increased with field, while the values of T 2 globally decreased with field. Between sites, T1 showed uniform bias both at 1.5T and 3T, with boundaries between tissues displaying higher values, in particular for site 3. Biases in T2 values were mostly uniform across the brain in the 1.5T sites, while they displayed the pattern typical of B1 + effects at 3T . The nPD bias spatial distribution was consistent with the different receiver coil profiles.
Reproducibility variations were estimated using coefficients of variation. Fig. 5 b reports the averaged voxel-wise CVs in each compartment. Coefficients of variation associated with site were under 10% for T1 and T2 in solid matter compartments at 1.5T and 3T.

Discussion
We assessed repeatability and reproducibility of 3D MRF at eight separate MR imaging systems, which included both 1.5 T and 3 T scanners. Our study was performed on twelve subjects, and each MR acquisition included a test and a re-test session with the subject removed from the Each subject had a test-retest assessment at least on a 1.5T and a 3.0T system. Image resolution was 1.125 mm isotropic, allowing a high anatomical detail in the quantitative maps. (b) Average values for T1 and T2 for each site in the grey matter, white matter and CSF. scanner between sessions. Importantly, our assessment was not limited to regions of interest, but estimated voxel-wise performance; variability was assessed using coefficients of variation and biases using a GLM analysis.
While the biases found by the GLM were similar to those found in our previous work , the reproducibility CVs significantly improved here and the repeatability CVs were improved by a factor of two on average. Three-dimensional acquisitions could therefore not only improve upon the anatomical detail, but also on the precision of the measured parameters. The improvement seen using 3D acquisition was at a higher resolution and not at the ex-pense of scan time, as acquisition time was kept under 10 min. Improvement was also consistent across scanners operating at 1.5T or 3T field strength. In addition to showing the differences between the twodimensional and three-dimensional MRF methods, the data reported here builds upon previously-published assessments of reproducibility from our group  and others ( Körzdörfer et al., 2019 ;Ma et al., 2019 ). When directly comparing the CV values obtained, the results here reported matched or surpassed other quantitative assessments of repeatability and reproducibility in the literature using other common mapping techniques ( Bauer et al., 2010 ;Deoni et al., 2008 ;Landman et al., 2011 ). Importantly, the variability of test-retest and the biases estimated between different scanners did not inversely correlate with the number of receivers or field strength. This implies that the scanner with the highest SNR or image quality is not necessarily also the best performing as a quantitative measurement instrument. In addition to accuracy of the model and reconstruction, several aspects such as repeatable positioning, adjustments and hardware stability come into play when performing quantitative estimations, which are not usually considered at this level of detail when designing or using the systems within the normal clinical routine. More sophisticated procedures for positioning or pre-scan, aimed at optimizing quantification in addition to SNR, may be able to improve upon the current results. Such issues are not limited to different scanners, but also to hardware and software upgrades of the same system. In this context, assessing temporal persistence of associated biases would be relevant ( Friedman et al., 2017 ).
The biases from the GLM were still higher at the interfaces between different tissues than inside individual tissues, and site 3 displayed more bias than the other sites. The scanner at site 3 is from an older generation of systems, displaying the most differences with the other systems in terms of radiofrequency and gradient hardware. Here we did not aim to perform a system-wise characterization of the gradient hardware, but used a single trajectory delay to correct the k-space trajectory. Despite the absence of reconstruction-related artifacts by visual inspection, a full characterization of systems may be useful in future in order to identify and correct biases associated with specific hardware. Corrections for phase and B0 distortions are more demanding in non-Cartesian acquisi- Fig. 5. Reproducibility assessment. In (a) the local biases ( , in ms for T1 and T2, and in% for nPD, as defined in Eq. (1) ) associated with the individual sites or field strength, as estimated by the general linear model; in (b) the voxel-wise reproducibility CVs from all scanners of the same field strength is reported by tissue class.
tions than in conventional Cartesian scans. In this context, unaccounted trajectory errors and gradient delays may produce spill-over of signal between different areas ( Moussavi et al., 2014 ). It has been shown previously that these inaccuracies can corrupt images obtained with non-Cartesian acquisitions ( Buonincontri et al., 2014 ), as well as impacting the estimation of quantitative parameters with MRF ( Berzl et al., 2017 ). Full system characterization may further improve on this point, delivering higher reproducibility results.
The performance of the technique in terms of bias and variability could be improved by adding a motion correction procedure during the reconstruction of MRF maps. For this, a technique was recently demonstrated dividing the data into small segments and accounting for subject's motion in a post-processing step . However, the spatial encoding used here is not suitable for such motion correction, which requires a specific trajectory order. In addition to bulk motion, other physiological signals can also corrupt the measurements, such as blood flow and respiratory motion with potentially different effects at 1.5T and 3T. Future studies acquiring physiological signals simultaneously to the MRF signals could improve on physiological motion characterization and reduction. Further, the combination with advanced acceleration techniques and deep learning inference may have an additional impact on the efficiency of 3D MRF. Recent work in this domain have included the combination of 3D MRF with deep learning methods and parallel imaging ( Chen et al., 2020 ) or compressed sensing .
Our acquisition was based on SSFP, which greatly reduces the sensitivity to B0 inhomogeneities ( Jiang et al., 2014 ). However, T2 and nPD estimates may still suffer from sensitivity to B1 + and B1-respectively, as these cannot be easily added to the signal model with the current acqui-sition schedule ( Buonincontri and Sawiak, 2016 ). As shown by our data, the quality of T2 maps acquired with 3D sequences improved over previous 2D MRF. However, T2 reproducibility biases at 3T still displayed the typical spatial distribution of B1 + associated with the transmit coil, whereas, nPD showed typical B1-biases associated with the receiver coil. Methods including an external B1 + map ( Ma et al., 2018 or including B1 + directly in the MRF model ( Buonincontri et al., 2017 ;Buonincontri and Sawiak, 2016 ;Cloos et al., 2016 ) may improve T2 reproducibility, while methods to correct receiver biases may be useful in correcting for nPD inconsistencies ( Deshmane et al., 2017 ).
One limitation of our study was the low number of subjects scanned at all locations which could be included in our estimates of the reproducibility CVs. This did not allow to perform a full reproducibility analysis based on intraclass correlation coefficient (ICC) ( Giraudeau et al., 2003 ), but only to perform it on a subset of our data. A full ICC analysis could provide a better understanding of the variability associated with different systems ( Friedman et al., 2008 ;Brown et al., 2011 ). Irrespectively of the low number of subjects scanned at all sites, an advantage of our approach was the GLM design to assess reproducibility biases. With such a design, we were able to include many scanners and subjects from different centers, enabling the assessment of multiple biases without the need of scanning all subjects at all centers. This study design is extendable to larger numbers of sites, with each subject scanned in at least two of the sites, but with a sufficient overlap in the design matrix to make a confident fit of the model. As seen in our study, despite the high reproducibility, different scanner hardware still has a significant impact on the quantitative values.
Future studies including more sites with different scanner hardware and setup would be an important step towards standardization of mea-surements for quantitative MRI in vivo . Our study was limited to a single-vendor assessment. In order to deliver on the promise of fullyquantitative MRI, multi-vendor studies are required to achieve standardization of values. Previous studies, such as ( Lee et al., 2019 ), have looked into standardization of relaxometry values, finding significant differences between vendors. Recently, the study of magnetization transfer effects was performed in order to reduce vendor-related variability ( Teixeira et al., 2020 ) . Although in this study the same reconstruction code was used for all scanners, different vendors and software releases may implement different k-space filters, which should also be taken into account for standardization ( Friedman et al., 2006 ).
While here we focused on the brain, several groups are investigating other areas of the body, such as chest ( Cruz et al., 2018 ;Hamilton et al., 2017 ;Serrao et al., 2020 ), pelvis ( Chen et al., 2019 ;Kaggie et al., 2019 ;Yu et al., 2017 ), breast ( Chen et al., 2019 ) and musculoskeletal system ( Cencini et al., 2019 ;Cloos et al., 2019 ;Lattanzi et al., 2017 ). Despite requiring further technical sophistications mainly due to motion, chemical shift artifacts and increased B0/B1 inhomogeneities, works in other body regions also increasingly find high repeatability ( Chen et al., 2019 ;Cloos et al., 2019 ). Despite applications in the body often requiring 2D acquisitions, however, some anatomical areas, such as the breast, have benefitted from 3D acquisitions ( Chen et al., 2019 ). Effective quantitative MRI methods in different body regions are being increasingly demonstrated and characterised, such that they can become clinically relevant, and be further developed into tools for disease characterization.

Conclusion
We have reported the assessment of the test/retest repeatability with 3D MRF in the healthy human brain at 1.5 T and 3 T. Three-dimensional MRF with spiral projection k-space trajectory obtained parametric maps with high anatomical detail, as well as highly repeatable and reproducible nPD, T1 and T2 values in a short scan time. Repeatability was improved by a factor of two for T1 and T2 in GM and also for T2 in WM, when compared to previous studies using similar analysis but 2D acquisition, and reproducibility CVs were also improved. These repeatability and reproducibility results further encourage investigations of MR Fingerprinting in the view of translating it to clinical applications, especially for monitoring disease progression, in single or multi-centre studies.

Data and code availability statement
In compliance with funder's policy and institutional ethics approval, the original parametric maps as well as the corresponding structural data for each subject are freely available online , see 10.5281/ZENODO.3989799 . Scripts to perform the voxelwise analyses are freely available ( https://github.com/jk619/test _ retest _ MRF ).