Accuracy and repeatability of joint sparsity multi-component estimation in MR Fingerprinting

MR ﬁngerprinting (MRF) is a promising method for quantitative characterization of tissues. Often, voxel-wise measurements are made, assuming a single tissue-type per voxel. Alternatively, the Sparsity Promoting Iterative Joint Non-negative least squares Multi-Component MRF method (SPIJN-MRF) facilitates tissue parameter estimation for identiﬁed components as well as partial volume segmentations. The aim of this paper was to evaluate the accuracy and repeatability of the SPIJN-MRF parameter estimations and partial volume segmentations. This was done (1) through numerical simulations based on the BrainWeb phantoms and (2) using in vivo acquired MRF data from 5 subjects that were scanned on the same week-day for 8 consecutive weeks. The partial volume segmentations of the SPIJN-MRF method were compared to those obtained by two conventional methods: SPM12 and FSL. SPIJN-MRF showed higher accuracy in simulations in comparison to FSL-and SPM12-based segmentations: Fuzzy Tanimoto Coeﬃcients (FTC) comparing these segmentations and Brainweb references were higher than 0.95 for SPIJN-MRF in all the tissues and between 0.6 and 0.7 for SPM12 and FSL in white and gray matter and between 0.5 and 0.6 in CSF. For the in vivo MRF data, the estimated relaxation times were in line with literature and minimal variation was observed. Furthermore, the coeﬃcient of variation (CoV) for estimated tissue volumes with SPIJN-MRF were 10.5% for the myelin water, 6.0% for the white matter, 5.6% for the gray matter, 4.6% for the CSF and 1.1% for the total brain volume. CoVs for CSF and total brain volume measured on the scanned data for SPIJN-MRF were in line with those obtained with SPM12 and FSL. The CoVs for white and gray matter volumes were distinctively higher for SPIJN-MRF than those measured with SPM12 and FSL. In conclusion, the use of SPIJN-MRF provides accurate and precise tissue relaxation parameter estimations taking into account intrinsic partial volume eﬀects. It facilitates obtaining tissue fraction maps of prevalent tissues including myelin water which can be relevant for evaluating diseases aﬀecting the white matter.

While these approaches can have a rich sensitivity to a wide range of tissue properties, the measurements are typically made voxel-wise, compartment voxels or concern very different types of multi-parameter acquisitions.
Another recently published method for coping with different tissue types within a voxel is the Sparsity Promoting Iterative Joint Non-negative least squares algorithm that was applied to MRF data ( Nagtegaal et al., 2020 ) (SPIJN-MRF). This approach asserted joint sparsity of the number of tissue components in a region of interest, i.e. in a voxel as well as spatially. A priori no assumptions are made about the number of tissues and relaxation times while the method still proved faster than previously proposed techniques. As such it yields tissue parameters for each identified tissue component as well as tissue volume fraction maps. The SPIJN-MRF algorithm estimated brain tissue fraction maps from fully sampled MRF data with promising accuracy and precision ( Nagtegaal et al., 2020 ). These maps particularly showed components representing myelin water, white matter (WM), gray matter (GM) and cerebrospinal fluid (CSF) components. To be able to distinguish myelin water is specially relevant in the diagnosis and evaluation of multiple sclerosis (MS), since it has been shown that patients have reduced myelin water content ( Laule et al., 2008 ;Mackay et al., 1994 ;Laule et al., 2006 ;Kolind et al., 2015 ). Also, imaging the increasing myelin content of the developing brain has been done in several initial studies ( Cencini et al., 2021 ;Chen et al., 2019 ).
Although the initial results with the SPIJN-MRF method were encouraging, an extensive study into the accuracy and precision of the method has not yet been performed. Additionally, it is unknown how the obtained brain tissue fraction maps relate to existing methods for tissue segmentations.
The aim of this work is to evaluate the accuracy and repeatability of the SPIJN-MRF parameter estimations from highly undersampled MRF acquisitions. The accuracy of the method will be assessed through estimation of the relaxation times and tissue fractions on simulated data from the BrainWeb numerical phantom environment, including a comparison to conventional techniques: the Functional Magnetic Resonance Imaging of the Brain Software Library (FSL) and the Statistical Parametric Mapping Software (SPM12). Furthermore, the repeatability will be assessed through the parameter estimates in eight weekly repeated scan sessions in 5 healthy volunteers. As in the simulations, tissue fraction maps will be compared to the SPM12 and FSL segmentations.

Simulations
Numerical simulations using the 20 BrainWeb phantoms ( Aubert-Broche et al., 2006 ) were performed to test the accuracy and precision of the segmentations obtained with SPIJN-MRF and T 1 -weighted based methods (FSL,SPM12). The BrainWeb phantoms were based on multiple high resolution conventional weighted scans from 20 different healthy subjects. Simulations were performed with resolution 1 × 1 × 5mm 3 . T 2 values were as defined in the BrainWeb database, but for white and gray matter T 1 = 930 ms and 1300ms were used instead of 500 ms and 830 ms as these are more realistic relaxation times for 3T ( Körzdörfer et al., 2019 ;Buonincontri et al., 2019 ).
The MRF data was created by simulating a gradient-spoiled MRF sequence with a train of 1000 radiofrequency pulses as in ( Jiang et al., 2015 ). This was done by performing extended phase graph signal generation ( Weigel, 2015 ;Hennig, 1988 ) for each tissue after which a weighted combination of the signals based on the BrainWeb partial volume fractions yielded the MRF images. Subsequently, independent random complex Gaussian noise with standard deviation = max ( ( ) )∕100 was added to each MRF image , to yield the same noise level for all voxels and time points. No undersampling was performed in these simulations, as in ( Nagtegaal et al., 2020 ).
The input tissue parameters and partial volume segmentations from the BrainWeb database served as ground truth values.

In vivo data acquisition
In vivo acquisitions were performed on a 3.0T GE MR750 MRI scanner (General Electric, Milwaukee, WI, USA). A Head, Neck and Spine array coil was used from which the 12 channels dedicated to the head were used for imaging.
Five healthy volunteers (3 females and 2 males, between 18-25 years) participated in this Institutional Review Board-approved study. All volunteers gave written informed consent to usage of their data prior to the first scan session.
MRF imaging was performed on the same day and time ( ± 15 min) of the week for 8 consecutive weeks. As such the same gradient-spoiled MRF sequence as in the simulations was applied, varying the flip angle and the TR along a train of 1000 radiofrequency pulses ( Jiang et al., 2015 ). The acquisition was performed with a FOV of 31 cm and slice thickness of 5 mm and slice gap of 1 mm (voxel size 1.2 mm x 1.2 mm x 5 mm). Total number of slices was 27, consisting of 256 × 256 voxels. The total scan time was 5 minutes and 54 seconds.
In each session, a READYBrain sequence ( McMillan et al., 2022 ) was applied to align the MRF acquisitions to the anterior-posterior commissure (AC-PC) plane. READYBrain automatically detects the AC-PC plane for each subject and exploits this to plan the MRF with comparable imaging orientation across time and subjects.
Motion during 2D-MRF acquisitions is known to result in artifacts and errors in estimated T 1 and T 2 maps. Especially through-plane motion can lead to blurring and underestimation in T 2 ( Mehta et al., 2018 ;Yu et al., 2018 ;Cruz et al., 2019 ;Kurzawski et al., 2022 ). Slices showing such through-plane motion artifacts were visually identified in single component T 1 and T 2 maps by notably deviating values compared to neighbouring slices before performing the multi-component analysis and excluded from further analysis unless explicitly stated otherwise. T 2 underestimation can also be caused by e.g. B 1 + inhomogeneity. However, we assert that B 1 + variation cannot cause abrupt changes in T 2 from slice to slice.

Image reconstruction
The acquired MRF data was reconstructed using an in-house implemented low-rank (LR) reconstruction algorithm in which the LR images were obtained while a compression matrix was iteratively updated. The spatial L 1 norm of the L 2 norm across the component images was applied for regularization purposes. A complete description of the used method can be found in Appendix I.

Single and multi-component parameter estimation
A dictionary was precomputed with the extended phase graph algorithm ( Weigel, 2015 ;Hennig, 1988 ). This dictionary was created for T 1 values ranging from 100 ms to 3042 ms and T 2 values from 10 ms to 1030 ms, sampled with increasing step sizes by 5% (chosen as a compromise between dictionary size and resolution).
Synthetic T 1 -weighted images were calculated based on the T 1 , T 2 and M 0 maps estimated by single-component dictionary matching with the same settings as for the BrainWeb simulations. We chose to generate synthetic T 1 -weighted images from the same data to have perfect spatial correspondence of the MRF and T 1 -weighted data. As such differences in imaged volume were avoided. Based on the synthetic T 1 -weighted images a brain mask was created using FSL-BET ( Jenkinson et al., 2012 ).
Subsequently, the SPIJN-MRF algorithm ( Nagtegaal et al., 2020 ) was applied to the MRF data, to obtain multi-component estimations for the Table 1 Ranges used to categorize SPIJN-MRF tissue/material components ( Bojorquez et al., 2017;Buonincontri et al., 2019;Körzdörfer et al., 2019;Mackay et al., 1994 ). tissues present in the brain mask region. The SPIJN-MRF algorithm is based on the following optimization function: Where X are the (compressed) MRF images with J voxels per time frame, D is the (compressed) dictionary with N A atoms and C consists of N A tissue fraction maps of J voxels. The regularization parameter balances the data fidelity and joint sparsity regularization terms. Multi-component estimations were performed for all slices simultaneously. This yielded tissue parameters and magnetization fractions per voxel. Note that the number of tissues is not fixed a priori, but controlled through a sparseness constraint. A regularization value of 0.03 was used in the multi-component analysis. This regularization level was manually determined for one in vivo dataset and kept constant with all subsequent numerical and in vivo analyses.

SPM12 and FSL segmentations
The conventional methods were applied to the synthetic T 1 -weighted images (created as described above). FSL-FAST ( Jenkinson et al., 2012 ;Zhang et al., 2001 ) was applied after brain extraction (through FSL-BET) to obtain tissue segmentations while using default settings. SPM12 segmentation ( SPM12, 2019 ; Ashburner and Friston, 2005 ) was also used with default settings, but the sampling distance was set to 1 instead of 3 to produce the most accurate segmentations (at the cost of consuming more time and memory resources). Tissue volumes in a voxel were calculated by multiplying obtained tissue probabilities in a voxel from SPM12 and FSL with the voxel size.

Atlas registration
To facilitate assessment of particular brain regions, all imaging data was aligned to the ICBM 152 Nonlinear atlases version 2009 ( Fonov et al., 2011 ;Fonov et al., 2009 ). This alignment was based on the synthetic T 1 -weighted images using the Diffeomorphic Anatomical Registration through Exponentiated Lie algebra (DARTEL) algorithm ( Ashburner, 2007 ) as implemented in SPM12 ( SPM12, 2019 ). Subsequently, the obtained deformations were applied to all SPIJN-MRF, SPM12 and FSL partial volume segmentations to achieve voxel-wise alignment to the atlas. All subsequent analyses focused on a common brain region, specifically: the region for which the mean intracranial volume fraction per voxel (WM + GM + CSF) was at least 75% with all segmentation methods.

Tissue discrimination
From the SPIJN-MRF parameter estimations, different components were identified based on the relaxation times, see Table 1  : myelin water, white matter (excluding myelin water), gray matter, CSF and veins and arteries. CSF was partitioned into a short and long T 2 component after inspection of first results, as was observed in ( Nagtegaal et al., 2020 ). Simultaneously, components with T 1 and T 2 relaxation times around 1s were identified, which we associated with veins and arteries. Total tissue volumes were calculated by summing the tissue fraction estimates multiplied by the voxel size (using an effective slice thickness of 6 mm to correct for slice gap).

Analysis of accuracy
The accuracy, i.e. systematic error, of the obtained SPIJN-MRF, SPM12 and FSL segmentations was evaluated using the simulated data. The agreement between estimated total white matter (including myelin water), gray matter and CSF volumes and the reference volumes was evaluated through Bland-Altman style plots. These showed the deviation from the reference as a function of reference value. Furthermore, the voxel-wise similarity between estimated partial volume and reference was assessed using the Fuzzy Tanimoto Coefficient (FTC) ( Crum et al., 2006 ), which expresses the similarity of paired data as where A,B is a pair of tissue fraction maps of a complete volume, specific region or slice; subscript i represents a spatial index of the concerned voxels. The FTC is an adaptation of the Jaccard index or Tanimoto coefficient for non-binary segmentations.

Analysis of repeatability
The repeatability of the SPIJN-MRF method was determined for the tissue parameter estimations and partial volume segmentations on the in vivo imaging data.
First, the mean and standard deviation of the SPIJN-MRF relaxation times were calculated per subject and tissue over the eight different time points. Subsequently, the repeatability was evaluated based on the range of standard deviations across the subjects per tissue.
Second, the repeatability of tissue volume estimation of the SPIJN-MRF and conventional methods over the time points was evaluated voxel-wise, in different brain regions and for the entire brain per tissue. In each case the mean value and corresponding standard deviation was determined per subject across the time points. The repeatability was quantified using the Coefficients of Variation (CoV = , where is the standard deviation and is the mean of the tissue volume over the 8 scan sessions) and the Combined Fuzzy Tanimoto Coefficient (CFTC) ( Crum et al., 2006 ) for each subject and tissue where denotes the volume fraction of voxel i at day k for a subject and tissue of interest.

Comparison with conventional methods
Initially, comparison of the methods was performed by visual assessment of the segmentation maps. Subsequently, estimated total tissue volumes for the entire brain and per region were compared across the methods. While doing so, the volume fractions obtained with SPIJN-MRF for white matter and myelin water were summed into a single white matter tissue. Similarly, the SPIJN-MRF CSF fractions of long and short

Simulations
SPIJN-MRF yielded exact estimates of underlying T 1 and T 2 tissue parameters in all 20 datasets. Fig. 1 shows representative partial volume segmentations obtained with SPIJN-MRF, SPM12 and FSL. Observe that the SPIJN-MRF images closely resemble the ground truth with soft transitions between tissue and background, while these transitions are sharper in the SPM12 and FSL segmentations. Fig. 2 shows Bland-Altman style plots, for each subject, method and the three tissues of interest, with the deviation from the true total tissue volume (vertically) as a function of the true value (horizontally). The mean deviations from the true value are indicated by solid lines. Limits of agreement are delineated by the light blue regions.
It can be observed that SPIJN-MRF and FSL showed little bias (defined as only a small deviation from the true value), whereas marked bias was found with SPM12. All three methods had the lowest bias in gray matter compared to the other tissues (max 3.3 cm 3 for SPM12). SPM12 had the largest bias for white matter (mean deviation -78.4 cm 3 ).
The limits of agreements between the estimated volumes and the reference values were smaller for SPIJN-MRF than for SPM12 and FSL, indicating that the differences between SPIJN-MRF estimations and the true values vary less than for SPM12 and FSL. SPM12 had the largest spread in the limits of agreements, which indicates a lower precision. Fig. 3 shows the distribution of FTCs obtained with SPIJN-MRF, SPM12 and FSL. In all the tissues, the segmentations performed by SPIJN-MRF were the most similar to the true segmentations, with FTCs around 0.97. SPM12 and FSL had lower FTCs: between 0.6 and 0.7 for white and gray matter and between 0.5 and 0.6 for CSF, respectively. Segmentations made with FSL had slightly higher FTCs than segmentations with SPM12. FTCs varied across slices, following a similar trend as the total amount of tissue per slice (see Supplementary Fig. S1). A fixed SNR level (100)

In vivo data analyses
A total of 10 MRF acquisitions (out of 40) were affected by motion artifacts in one or more slices. It concerned 3% of the total number of slices. Computation times were approximately 50 minutes for SPIJN-MRF estimations on a standard desktop pc (Intel i7-8650U CPU 1.9GHz, 8GB RAM). Fig. 4 shows all components resulting from the MRF data of a single acquisition across several slices from one representative subject. The leftmost column shows the T 1 and T 2 values for each of 9 identified components while the images depict corresponding estimated tissue fractions. In Fig. 5 , the main component maps and standard deviation (std) maps (after registration) obtained after grouping as per Table 1 for a central slice from the same subject are depicted for each acquisition day. Observe that there are minimal differences over time (as reflected in the std maps in the right column). Global intensity variations can be observed in the myelin water map (in particular concerning scans 5 and 6 seem to yield higher tissue fractions). However, for the other tissues the variability is similar across the brain. Table 2 collates the means and standard deviations of T 1 and T 2 relaxation times across all subjects and scans per tissue as well as the mean intra-subject standard deviations per tissue. On average 8.3 com-ponents were identified over all acquisitions (26 times 8 components, 13 times 9 and 1 time 10). No components were outside the predefined ranges ( Table 1 ). The mean intrasubject standard deviations of myelin water and veins and arteries for T 1 estimation and of short CSF for T 2 estimation was larger than 10% percent of the reported mean value. For all other tissues the intrasubject standard deviations of T 1 and T 2 relaxation times showed small variation. After changing the dictionary ranges the observed clipping to the dictionary boundaries appeared persistent and visually did not appear to affect the estimated partial volume maps. Supplementary Fig. S5 shows scatter plots of the estimated T 1 and T 2 values across subjects and scans.  Fig. 4. Multi-component tissue fraction maps for one MRF dataset of a single subject. Obtained T 1 and T 2 relaxation times are shown in the left column. Note that the component maps are only estimated inside the brain mask. For illustration purposes only the ten central odd slices are shown instead of the total 28 slices and components with a relative volume fraction of less than 1% were not shown.

Table 2
Means and corresponding standard deviations (std) of estimated T 1 and T 2 relaxation times across all 5 subjects and 8 repeated scans per tissue as well as the means of 5 intrasubject standard deviations per tissue. * T 1 or T 2 values are the minimum/maximum value represented in the dictionary, which also occured with extended boundaries.

Table 3
CoVs of the estimated total myelin water volume, white matter volume (including myelin water), gray matter volume, CSF volume and total brain volume. The last column reports the mean of the CoVs over all subjects.  Table 3 presents the CoVs of the total volume estimates of myelin water, white matter (including myelin water), gray matter, CSF and the total brain volume (WM + GM) across subjects. Total brain volume showed minor variation (mean CoV = 1.1 %); the volume estimates of CSF, GM and WM (including MW) were slightly more variable, with mean CoV of 4.6%, 5.6% and 6.0%, respectively, while MW by itself had higher mean CoV (10.5%). Fig. 6 illustrates that the SPIJN-MRF tissue maps from the in vivo data contained more gradual transitions between brain tissues than the con- ventional methods, similar to simulated data. For instance, the SPIJN-MRF CSF map shows small details not observed in the other maps (green circle) whereas parts are more confined in the SPM12 and FSL segmentations (red circle). Furthermore, the GM SPIJN-MRF component was almost always identified as a mixture of GM and CSF or WM components (maximal GM fraction around 90%). This highlights the ability of SPIJN-MRF to approximate brain tissue content of partial volume voxels.

Comparison with brain volume measurement methods
The distributions of total volumes for each subject with SPIJN-MRF, SPM12 and FSL are collated in Fig. 7 . It can be observed that SPM12 estimates are lower in white matter than FSL estimates, and consequently, the opposite is noticeable for gray matter and CSF as the sum of tissue percentages is 100% by definition. In general, the estimated volumes with SPIJN-MRF appear between the estimated volumes of SPM12 and FSL. Furthermore, the SPIJN-MRF volume distributions show a larger spread than those of SPM12 and FSL.
In Fig. 8 , the estimated relative volumes per anatomical region for each tissue, method and subject are summarized (see Supplementary  Fig. S6 for absolute volumes per region). In general, the three methods had relatively similar relative volumes per region. However, for the cerebellum and for the deep gray matter FSL gave higher white matter volume than SPIJN-MRF and SPM12 in all subjects. As a consequence, FSL yielded lower gray matter volumes. Fig. 9 shows CFTCs (upper row) and CoVs (lower row) for the motion-artifact-free data and the full dataset across the subjects and the tissues.
The SPM12 yielded slightly higher CFTC than SPIJN-MRF in white and gray matter in all subjects. SPIJN-MRF gave higher CFTCs compared to both SPM12 and FSL in CSF, which was observed consistently across slices (see Supplementary Fig. S7). Also, the CoVs of SPM12 and FSL were distinctly lower than those of SPIJN-MRF for white and gray matter, whereas differences were smaller for CSF and total brain volume.
The CoVs for SPIJN-MRF were lower using the data without motion artifacts than with all data in all tissues but one in subjects C and D, and E. The only exception was in subject E, in which the white matter score was barely affected.
We did not observe a particular trend in the CFTCs nor in the CoVs of the motion-artifact-free data related to anatomical region nor any specific differences between left/right brain regions (see Supplementary  Fig. S8).

Discussion
This paper aimed at evaluating the accuracy and repeatability of the joint sparsity based SPIJN-MRF estimations ( Nagtegaal et al., 2020 ). The results show that it yields accurate brain tissue voxel fraction estimation in simulated data (mean systematic errors between 2cm 3 and 6cm 3 and Fuzzy Tanimoto Coefficient above 0.95) and gives good repeatability in in vivo data (Fuzzy Tanimoto Coefficient above 0.7 and mean (across subjects) CoVs between 5.7% and 6.1%).
In simulations, we observed that the white matter, gray matter and CSF fraction maps obtained with SPIJN-MRF had smaller systematic er- rors than those obtained with SPM12 and FSL in both total volume estimation as voxel-wise similarity.
The T 1 and T 2 values obtained by SPIJN-MRF from the in vivo data for each component were very stable for all the acquisitions and in the range of previous quantitative studies ( Mackay et al., 1994 ;Bojorquez et al., 2017 ;Stanisz et al., 2005 ;Alonso-Ortiz et al., 2015 ;Bouhrara and Spencer, 2017 ). The observed standard deviation of zero for WM derives from the dictionary resolution (which is 5% for T 1 values) and resulted from repeated selection of the same dictionary atom. Previous studies have reported variations in T 1 below 5% Buonincontri et al., 2019 ). Unexpectedly and contrary to the singlecomponent estimations, we found that the T 2 value from the white matter component was slightly longer than the T 2 value found for the gray matter (see Table 2 ). This can be partly due to the attribution of myelin water (having a very short T 2 time) to a separate component independently of white matter, whereas values reported in the literature for white matter commonly include myelin water In effect, this will lead to longer relaxation values for the 'pure' white matter component, merely as a shorter time component is left out. Previously, a thorough study of the T 2 distributions in the brain indeed demonstrated that T 2 distributions are affected by properties of the spaces between myelin sheaths ( Whittall et al., 1997 ).
Furthermore, CSF was consistently represented by two different components, one with longer T 2 (around 1 s) and one with shorter T 2 (around 150 ms). This separation could be caused by the choroid plexus within the ventricles or by flow effects. The identification of the veins and arteries component as shown in Fig. 5 was not observed before in multicomponent relaxometry to our knowledge. This component had high T 1 and T 2 times (both around 1s) and consistently appeared in all subjects and scans. Further research into physiologically understanding of this fluid-like component would be an interesting direction for future work.
Estimated T 1 relaxation times showed a very high repeatability, but for some components the maximum T 1 value of the dictionary was selected. The use of a maximum T 1 value of 6s instead of 3s did not affect this biasing effect (data not shown here) and was therefore not used any further. For better estimation of T 1 relaxation times in CSF a different MRF sequence and dictionary range would probably be required to improve discrimination between long T 1 components.
The consistent estimation of a separate myelin water fraction (MWF) map with the SPIJN-MRF approach could be useful, e.g. for assessment of multiple sclerosis as well as other white matter diseases ( Mackay et al., 1994 ). A comparison to other MWF estimation methods ( Piredda et al., 2021 ) would be needed to further validate our technique, although differences between methods are known ( Alonso-Ortiz et al., 2018 ;.
A SPIJN regularization of 0.03 was used in the analysis resulting in reproducible estimates. We observed that changes from 0.025 to 0.035 led to only small changes in MW and veins/arteries component relaxation times and volume fractions, while WM/GM/CSF were less affected by the regularization. The required regularization will be affected by the SNR level and might therefore require adaptation for different scanners, sequences, or reconstructions.
The study on the repeatability of the SPIJN-MRF yielded CoVs < 10% for the total white matter volume (including myelin water), the gray matter volume and the CSF volume for all subjects. This variation of the estimated volumes with the SPIJN-MRF is similar to the variation found previously with commonly accepted methods, such as SPM12 and FSL ( Tudorascu et al., 2016 ;Klauschen et al., 2009 ). The estimation of the myelin water volume showed more variation (average CoV = 10%) than other tissues. The repeatability was similar across the whole brain, even though we applied a rather large slice thickness.This signifies the robustness of the method also in regions with large susceptibility vari-  Fig. 9. Combined Fuzzy Tanimoto Coefficient and CoV of estimated total volumes for each subject for white matter (including myelin), gray matter, CSF and total brain (white matter plus gray matter) obtained with SPIJN-MRF (blue), SPM12 (red), and FSL (green). Results from the data without motion artifacts are depicted using circles and solid lines, results obtained using all data are depicted with squares and dashed lines.
ations such as in areas with air (frontal part) or areas with more iron (deep gray matter).
The closer resemblance of the SPIJN-MRF maps to the reference in the simulations was reflected in higher FTC values with smaller standard deviations compared to the conventional methods. Especially in the CSF maps (both in simulation and in vivo ), more details of the anatomy were visible with the SPIJN-MRF method than with SPM12 and FSL. These results could indicate that SPIJN-MRF improves measuring partial volume properties of smaller brain structures. Furthermore, compared to SPM12 and FSL it may provide new information, such as the myelin water fraction. To validate the high accuracy in simulations, further investigation into the in vivo accuracy should be performed. Currently we considered this outside the scope of our in vivo study, which mainly focused on the repeatability of SPIJN-MRF. Post-hoc, we performed as a preliminary approach to an accuracy evaluation, a small follow-up study one of the subjects that was rescaneed four times indicating similar performance in volume estimation of SPIJN-MRF compared to conventional SPM12 and FSL segmentation using conventional, high resolution T 1 -weighted acquisitions (see Supplementary Fig. S9). Furthermore, FSL and SPM12 recommend using T1w-MPRAGE for segmentation, while our simulated images were created based on a spoiled FLASH sequence as was used for the BrainWeb simulations. Although the resulting images have a very similar contrast (see Fig. S10), the set of parameters for the synthetic 1 -weighted images can be optimized further and the reported accuracy might be slightly affected by the generated contrast.
SPM12 uses a probability atlas with anatomical information to segment the brain tissues, which could enhance the repeatability of results but potentially also introduces a bias. FSL uses a hidden Markov random field model and an expectation-maximization algorithm to obtain robust results with improved denoising, but this might lead to removing small brain structures. Instead, the joint sparsity multi-component MRF model does not incorporate explicit spatial regularization or anatomical a priori knowledge.
Differences in estimated brain tissue volumes between SPM12 and FSL were already previously reported ( Tudorascu et al., 2016 ). Our estimated volumes for FSL are higher in white and gray matter and lower in CSF compared to SPM12 with the simulated data just as in Klauschen et al. (2009) . Contrary to the simulated data, the estimated total brain volumes for the in vivo data by SPIJN-MRF are closer to the estimated volumes from SPM12 than those from FSL. Also, the relative volume per region calculated by SPIJN-MRF is closer to the relative volume calculated by SPM12 than FSL for the in vivo data. This could be caused by intrinsic differences between the T 1 -weighted brainweb im-ages (used in the simulations) and the MRF based T 1 -weighted images (used in the in vivo experiments). The differences in the contrast could result in slightly different classification of the voxels and affect the local partial volume estimates.
In contrast to the simulations, the CoVs of SPIJN-MRF for in vivo data reflected higher variability than those of SPM12 and FSL for almost all subjects and tissues. The low variability of SPM12 and FSL, also in comparison with previous studies ( Tudorascu et al., 2016 ;Klauschen et al., 2009 ), could be due to the use of synthetic images. This is because the quantitative parameters (M 0 , T 1 and T 2 ) from MRF were highly repeatable, differing only between 0 and 2% amongst all the acquisitions. As a consequence the resulting synthetic T 1 -weighted images are also highly similar. This likely biases the repeatability of the SPM12 and FSL segmentations. We chose to create synthetic T 1 -weighted images in order to have perfect spatial correspondence of the data. As such differences in imaged volume were avoided.
SPM12 and FSL demonstrated to be slightly more robust against motion artifacts than SPIJN-MRF, resulting in minimal differences in CoV and FTC when using data without and with motion artifacts. Although motion effects were observed in a relatively small number of slices, it did affect the volume estimates. Simultaneously, however, estimated T 1 and T 2 relaxation times were not affected by motion affected slices. Nevertheless, our results show that efforts to minimize the impact of subjectmotion in the MRF data may enhance in particular the repeatability, e.g. by applying motion-correction ( Cruz et al., 2019 ) or 3D-acquisitions with possible use of navigators ( Johnson et al., 2011 ) or self-navigations ( Kurzawski et al., 2022 ).
A limitation of this work is that the evaluation of accuracy was mostly done on simulated data (see Figs. 2 , 3 and Supplementary Fig.  S9). Further validation of segmentation accuracy could be performed, for example by assessment of the segmentations through expert neuroradiologists. Future work could also take into account other potentially relevant aspects in the simulations, such as: reconstructions from undersampled data, modeling of B 0 or B 1 + inhomogeneities, representation of the presence of myelin water or inclusion of other biological phenomena such as magnetization transfer or flow.
Another limitation could be that we did not include B 1 + field inhomogeneity as a parameter in our SPIJN-MRF estimation. B 1 + inhomogeneity might affect the SPIJN-MRF parameter estimation and especially MWF estimation. However, we did not observe particular spatial variations in the obtained fraction maps that appear to resemble smoothly varying B 1 + inhomogeneities. Further research (including an acquired B 1 + map) would be needed to explicitly study the effect of B 1 + and to analyze the potential benefits of accounting for B 1 + variation in the estimation of the tissue fraction maps.
Furthermore, optimizing the MRF sequence for multi-component MRF estimations ( Heesterbeek et al., 2021 ) to make it more sensitive to myelin water or to improve the distinction between gray and white matter could lead to further improvement of estimations, reduced scan times or the possibility to use a reduced slice-thickness. However, we consider the implementation of such optimization beyond the scope of our current paper.

Conclusion
We studied the accuracy and repeatability of the SPIJN-MRF method in simulations and in vivo brain MRI scans. SPIJN-MRF showed more accuracy and higher repeatability in simulated data than conventional methods (SPM12 and FSL). In the in vivo data, SPIJN-MRF consistently identified the same brain tissue components and gave highly repeatable relaxation times related to these tissues. SPIJN-MRF partial volume maps showed small details, also in CSF. The repeatability of the estimated brain tissue volumes of SPIJN-MRF was somewhat lower compared to SPM12 and FSL, possibly due to simulation bias . A further advantage of using SPIJN-MRF is the additional simultaneous estimation of MWF maps, which is not obtainable with single compartment-based methods.

Data and code availability statement
I n-vivo reconstructed data is publicly available at 10.5281/zenodo.5576039 .

Declaration of Competing Interest
The authors declare that there is no conflict of interest.

Data Availability
In vivo reconstructed data is publicly available at doi.org/10.5281/ zenodo.5576039 . The authors do not have permission to share data.