Bayesian non-linear regression with spatial priors for noise reduction and error estimation in quantitative MRI with an application in T1 estimation

Purpose. To develop a method that can reduce and estimate uncertainty in quantitative MR parameter maps without the need for hand-tuning of any hyperparameters. Methods. We present an estimation method where uncertainties are reduced by incorporating information on spatial correlations between neighbouring voxels. The method is based on a Bayesian hierarchical non-linear regression model, where the parameters of interest are sampled, using Markov chain Monte Carlo (MCMC), from a high-dimensional posterior distribution with a spatial prior. The degree to which the prior affects the model is determined by an automatic hyperparameter search using an information criterion and is, therefore, free from manual user-dependent tuning. The samples obtained further provide a convenient means to obtain uncertainties in both voxels and regions. The developed method was evaluated on T_1 estimation based on the variable flip angle method. Results. The proposed method delivers noise-reduced T_1 parameter maps with associated error estimates by combining MCMC sampling, the widely applicable information criterion, and total variation-based denoising. The proposed method results in an overall decrease in estimation error when compared to conventional voxel-wise maximum likelihood estimation. However, this comes with an increased bias in some regions, predominantly at tissue interfaces, as well as an increase in computational time. Conclusions. This study provides a method that generates more precise estimates compared to the conventional method, without incorporating user subjectivity, and with the added benefit of uncertainty estimation.


Introduction
Quantitative magnetic resonance imaging (qMRI) can provide measurements of tissue properties that are independent of the exact details of the data acquisition and that often have a direct physical interpretation (Tofts 2003). Several applications of qMRI can be found in cancer diagnostics and follow-up, e.g. prostate cancer staging using apparent diffusion coefficient imaging (Fütterer 2016) and dynamic contrast-enhanced MRI (DCE-MRI) for early response assessment (Pham et al 2017). There are also numerous applications outside of oncology, for example T_1 and T_2 relaxometry in the assessment of multiple sclerosis (Bitsch et al 2001, Tozer et al 2005), Parkinson's disease (Nürnberger et al 2017), and renal function (Wood 2014).
Quantitative parameters are usually obtained by fitting a non-linear regression model to the data in each voxel. Typically, the model is fit by minimising the mean squared error, corresponding to a maximum likelihood estimator assuming independent Gaussian noise in the data. The voxel-by-voxel approach is intuitive and simple to implement, but may result in noisy parameter maps, in particular when the model contains parameters that are difficult to estimate. The noise in the parameter maps can be reduced by including a regularisation term when fitting the model to the data. In a Bayesian interpretation, this regularisation term can be seen as the corresponding prior distribution over the model parameters. Structured spatial regularisation terms and priors have been used in qMRI by several authors. They have, for instance, been used when estimating relaxation times and proton density (Wang and Cao 2012, Baselice et al 2016, Kumar et al 2012, Raj et al 2014), in diffusion and intra-voxel incoherent motion (IVIM) estimation (While 2017, Orton et al 2014), for B_0 estimation (Baselice et al 2010), and for dynamic contrast-enhanced MRI (DCE-MRI) (Schmid et al 2006, Kelm et al 2009, Sommer and Schmid 2014, Bartos et al 2019). The inclusion of structured spatial regularisation terms or priors usually means there will be regularisation parameters or hyperparameters that must be tuned to the data. Selecting these hyperparameters is difficult, and how it is done varies between studies. Examples of different approaches include visual inspection (Kumar et al 2012), finding good parameters for a representative dataset (Bartos et al 2019, Wang and Cao 2012, Freiman et al 2013), L-curve analysis (Kumar et al 2012), constraints on the size of the residuals (Raj et al 2014), and non-informative priors in the Bayesian framework (Orton et al 2014).
An often overlooked part of qMRI is the uncertainty in the parameter maps. Knowledge of the uncertainty can add much value since it gives a means to determine to what degree the data can be trusted, enables more accurate statistical analysis, and can be used as a tool when optimising the image acquisition. Typically, the uncertainty is estimated in each voxel separately, using for instance linear error propagation (Schabel and Parker 2008, Garpebring et al 2013). However, this simple approach is no longer valid if spatial regularisation or priors are used. In that case, one needs to take into account that the noise in the parameter maps is spatially correlated. By modelling the entire distribution for the parameter maps and sampling from this distribution using, for example, Markov chain Monte Carlo (MCMC), it is possible to obtain the uncertainties in the presence of spatial priors, as has been demonstrated in a few studies, e.g. in Orton et al (2014), Schmid et al (2006), Sommer and Schmid (2014), Glad and Sebastiani (1995), De Pasquale et al (2000). This approach has the added benefit, relative to linear error propagation, that the non-linear noise propagation is accounted for, which is highly relevant when there is much noise, and for models that are highly non-linear for some parameter combinations. The downside is that MCMC can be computationally expensive.
Ideally, the estimation of parameter maps should incorporate some prior knowledge in order to reduce uncertainties; it should include automatic selection of hyperparameters to make the method less subjective and easier to use; and it should include error estimation to improve the understanding, interpretability, and reliability of the parameter maps. The purpose of this work was to develop a method that achieves this, and to evaluate it for the case of T_1 estimation based on the variable flip angle (VFA) method.

Statistical model
In the descriptions that follow, θ ∈ R^{V×2} denotes a collection of two tissue parameter maps over V voxels. An individual parameter map is denoted by θ_{·,p} ∈ R^V, where p ∈ {1, 2} is the index of the particular parameter. Let v ∈ {1, 2, …, V} be an index over the voxels in the analysed region. All parameters in the vth voxel are then denoted by θ_{v,·} ∈ R^2. The value of a specific tissue parameter, p, at voxel v is denoted θ_{v,p}.
The relationship between the tissue parameters and the measured VFA signal is modelled as

  y_{v,m} = s_m(θ_{v,·}) + ε_{v,m},    (1)

where m ∈ {1, 2, …, M} is an index for each acquired image, i.e. an index for the flip angles, the analysed magnitude signal, y_{v,m}, is an element of y ∈ R^{V×M}, s_m(θ_{v,·}) denotes the spoiled gradient echo (SPGR) signal model, and ε_{v,m} denotes independent Gaussian noise with mean zero and the same variance, σ², for all voxels and flip angles. The assumption of Gaussian distributed noise is valid in our case of high SNR conditions. At lower signal intensities, the noise needs to be modelled as Rician to be properly described (Gudbjartsson and Patz 1995). The SPGR signal model is given by (Tofts 2003)

  s_m(θ_{v,·}) = ρ_v sin(α_m) (1 − e^{−T_R/T_{1v}}) / (1 − cos(α_m) e^{−T_R/T_{1v}}),    (2)

where θ_{v,·} = [ρ_v, T_{1v}]^T. At each voxel, indexed by v, ρ_v is proportional to the proton density and T_{1v} is the spin-lattice relaxation time. T_R denotes the repetition time and α_m, for m = 1, …, M, denotes the different flip angles. Also, note that any dependence on the echo time has been absorbed into ρ_v. Estimating the parameters of this model has traditionally corresponded to maximising the data likelihood, i.e. p(y | θ, σ²). Since the model targets, y_{v,m} ∼ N(s_m(θ_{v,·}), σ²), are assumed independent, the data likelihood becomes

  p(y | θ, σ²) = ∏_{v=1}^{V} ∏_{m=1}^{M} (2πσ²)^{−1/2} exp(−(y_{v,m} − s_m(θ_{v,·}))² / (2σ²)).    (3)

Equations (1) and (2) can be used for non-linear least squares regression between the measured signal and the VFA signal to estimate the parameters of the model, corresponding to maximising the data likelihood in equation (3). In this work, we present an extension to the maximum likelihood approach where we employ a hierarchical Bayesian model that incorporates a spatial prior over the parameters. The prior probabilities for the tissue parameters are modelled as neighbourhood dependencies expressed using the total variation function (Rudin et al 1992), TV(θ_{·,p}), as the energy function in a Boltzmann distribution (Gibbs distribution).
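As an illustration of the SPGR model in equation (2), the following Python/NumPy sketch (the study's own implementation is in MATLAB) evaluates the noiseless VFA signal for one voxel at the acquisition settings used later in this work:

```python
import numpy as np

def spgr_signal(rho, t1, alpha, tr):
    """SPGR signal model of equation (2).

    rho   : amplitude proportional to the proton density (arbitrary units)
    t1    : spin-lattice relaxation time (seconds)
    alpha : flip angle(s) in radians
    tr    : repetition time (seconds)
    """
    e1 = np.exp(-tr / t1)
    return rho * np.sin(alpha) * (1.0 - e1) / (1.0 - np.cos(alpha) * e1)

# Noiseless VFA signals for one voxel at the protocol used in this work
alphas = np.deg2rad([2.0, 4.0, 11.0, 13.0, 15.0])
signal = spgr_signal(rho=1000.0, t1=1.2, alpha=alphas, tr=6.8e-3)
```

For these settings the signal peaks near the Ernst angle (about 6° for T_1 = 1.2 s and T_R = 6.8 ms), so the 4° acquisition yields the largest of the five signals.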
Let λ = [λ_1, λ_2]^T be positive hyperparameters that relate to the image gradients of the two parameter maps. The joint spatial prior distribution over the parameters is then given by

  p_TV(θ | λ) = (1/C_TV) exp(−λ_1 TV(θ_{·,1}) − λ_2 TV(θ_{·,2})),

in which C_TV is the normalising factor (partition function) and TV(·) is the common discrete approximation of the total variation function, i.e.

  TV(x) = Σ_{i,j} √((x_{v(i+1,j)} − x_{v(i,j)})² + (x_{v(i,j+1)} − x_{v(i,j)})²),    (4)

for the case of a 2D image with voxels indexed by i and j, such that v(i, j) maps the spatial coordinates to the linear indices of the input vector, x. That is, the total variation function is the sum of the norms of the spatial gradients at each voxel, here approximated by the forward difference. There is a possibility that a voxel indexed by (i + 1, j) or (i, j + 1) does not exist (as is the case along the borders and in the corners). In that case, the quadratic term containing this index is set to zero (i.e. essentially using reflection padding). To make the equations easier to read, these special cases have not been included here but are implicitly assumed. Details on how the form of the total variation prior was derived are given in appendix A. The parameters are also constrained by a uniform prior over a compact feasible region. This is needed to make the total variation-based prior proper and to ensure convergence in the sampling procedure. This prior is given by

  p_U(θ_{v,·}) = U(ρ_v; l_ρ, h_ρ) U(T_{1v}; l_{T_1}, h_{T_1}),    (5)

where l and h denote the lower and upper limits of the uniform prior, U, in terms of ρ and T_1. Hence, the full prior on the parameters is given as

  p(θ | λ) ∝ p_TV(θ | λ) ∏_{v=1}^{V} p_U(θ_{v,·}).    (6)

The hyper-prior for σ was defined using the non-informative prior

  p(σ) ∝ 1/σ,    (7)

i.e. the Jeffreys prior for the standard deviation σ > 0 of Gaussian distributions (Gelman et al 2014b). The prior over λ was set to

  p(λ_i) ∝ λ_i^{−κV},  i ∈ {1, 2},    (8)

which is a non-informative (improper) prior with unknown hyperparameter κ. Combining equations (3), (6), (7), and (8) gives us the full log posterior that was used in this work, namely,

  log p(θ, σ², λ | y) = log p(y | θ, σ²) + log p(θ | λ) + log p(σ) + log p(λ_1) + log p(λ_2) + constant.    (9)

The joint prior of the total variation and uniform priors in equation (6) is proper in this case, since its integral over the support of the uniform prior is finite.
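The discrete total variation of equation (4) can be sketched as follows (Python/NumPy; forward differences, with a zero contribution across the image border as described above):

```python
import numpy as np

def total_variation(x):
    """Isotropic discrete TV of a 2D map (equation (4)) using forward
    differences; differences across the image border are set to zero."""
    dx = np.zeros_like(x)
    dy = np.zeros_like(x)
    dx[:-1, :] = x[1:, :] - x[:-1, :]   # theta(i+1, j) - theta(i, j)
    dy[:, :-1] = x[:, 1:] - x[:, :-1]   # theta(i, j+1) - theta(i, j)
    return np.sum(np.sqrt(dx**2 + dy**2))

# A piecewise-constant map: TV equals the total jump along the single edge
img = np.zeros((4, 4))
img[:, 2:] = 1.0                        # one vertical edge of height 1
print(total_variation(img))             # 4.0
```

Note that TV is positively homogeneous of degree one, i.e. scaling the map scales its TV by the same factor; this property underlies the λ-dependence of the partition function C_TV.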

Parameter estimation
Inference was done by drawing samples from the posterior in equation (9) using MCMC. To sample from the posterior density, we utilised a blocked Gibbs sampler, comprising a Gibbs sampler (Geman and Geman 1984) for the hyperparameters, σ² and λ, and the affine invariant ensemble sampler by Goodman and Weare (2010) for the likelihood parameters, θ.
The conditional densities for the hyperparameters are straightforward to derive from equation (9), and can easily be found to be

  σ² | θ, y ∼ Inv-Gamma(A_σ, B_σ),

i.e. an inverse gamma distribution with shape parameter A_σ = (MV − 1)/2 and scale parameter

  B_σ = (1/2) Σ_{v=1}^{V} Σ_{m=1}^{M} (y_{v,m} − s_m(θ_{v,·}))²,

and

  λ_i | θ, λ_j ∼ Gamma(A_{λ_i}, B_{λ_i}),

i.e. a gamma distribution with shape parameter A_{λ_i} = (1 − κ)V + 1 and scale parameter B_{λ_i} = 1/TV(θ_{·,i}), with i ∈ {1, 2} and j = 3 − i. The conditional density for θ is a bit more involved. We write the TV term in equation (9), using the approximation in equation (4), as

  TV(θ_{·,p}) = Σ_{i,j} √((θ_{v(i+1,j),p} − θ_{v(i,j),p})² + (θ_{v(i,j+1),p} − θ_{v(i,j),p})²).    (10)

We note that all terms associated with a voxel with index (i, j) can be identified by looking at a 3 × 3 neighbourhood around that voxel, where voxel (i, j) is in the centre. This is illustrated in figure 1. Hence, a conditional distribution for the centre voxel only depends on the surrounding 3 × 3 neighbourhood. We exploit this property in order to develop an efficient sampler.
It is clear from figure 1 that only three terms are required in the part associated with the total variation prior for a particular voxel, and hence we can construct a conditional log-posterior density over the parameters in a voxel v = v(i, j) as

  log p(θ_{v,·} | θ_{−v,·}, y, σ², λ) = −(1/(2σ²)) Σ_{m=1}^{M} (y_{v,m} − s_m(θ_{v,·}))² − Σ_{p=1}^{2} λ_p (t_{i,j,p} + t_{i−1,j,p} + t_{i,j−1,p}) + constant,    (11)

where t_{i,j,p} denotes the term of equation (10) centred at voxel (i, j), and θ_{−v,·} denotes all voxels except the one with index v. Hence, all voxels at least three voxels apart, i.e. those that lie in different 3 × 3 neighbourhoods, can be sampled independently (and therefore also in parallel).
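The two hyperparameter updates of the blocked Gibbs sampler can be sketched as follows (Python/NumPy; an illustrative sketch with made-up variable names, not the paper's MATLAB implementation). The inverse gamma draw uses the fact that the reciprocal of a gamma variate is inverse-gamma distributed:

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_sigma2(residuals, rng):
    """Gibbs draw of sigma^2 from its inverse gamma conditional with shape
    A = (M*V - 1)/2 and scale B = 0.5 * sum of squared residuals."""
    a = (residuals.size - 1) / 2.0
    b = 0.5 * np.sum(residuals**2)
    # If X ~ Gamma(a, scale=1/b), then 1/X ~ Inv-Gamma(a, scale=b)
    return 1.0 / rng.gamma(shape=a, scale=1.0 / b)

def draw_lambda(tv_value, n_voxels, kappa, rng):
    """Gibbs draw of lambda_i from its gamma conditional with shape
    A = (1 - kappa)*V + 1 and scale B = 1/TV(theta_i)."""
    a = (1.0 - kappa) * n_voxels + 1.0
    return rng.gamma(shape=a, scale=1.0 / tv_value)

# With V*M residuals of unit variance, draws of sigma^2 concentrate near 1
res = rng.standard_normal((1000, 5))     # V = 1000 voxels, M = 5 flip angles
s2 = np.mean([draw_sigma2(res, rng) for _ in range(200)])
lam = draw_lambda(tv_value=500.0, n_voxels=1000, kappa=0.06, rng=rng)
```

The θ update is the more expensive step; it is handled by the ensemble sampler over the conditional density in equation (11), one 3 × 3-interleaved block of voxels at a time.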

Figure 1.
A zoomed-in view of a 3 × 3 voxel region. Each term in the total variation (equation (10)) has one central voxel (green square) and two neighbouring voxels (green circles). The voxel with index (i, j) will be included in three groups of voxels, illustrated here with black connecting lines holding the voxels in a group together. These three groups correspond to the last three summation terms in equation (11). Since the conditional distribution for the centre voxel only depends on the 3 × 3 neighbourhood, the analysed region with a total of V voxels can be split into V/9 subsets of 3 × 3 regions for parallel sampling.

Implementation
The proposed sampling method is summarised in algorithm 1 and was implemented in MATLAB R2018b (The MathWorks, Inc., Natick, MA, USA). The algorithm returns N_s samples of θ, λ, and σ². In our implementation, the parameters of the uniform prior, see equation (5), were set to l_ρ = 0, h_ρ = 10^5, l_{T_1} = 0 s, and h_{T_1} = 10 s, thus constraining the estimators to exclude physically unreasonable high and low tissue parameter values. The algorithm starts by computing an initial guess for the model parameters using INITIALGUESS, which first spatially averages all signal in the data and fits a single curve to this average using an ML estimator. This yields a single pair of T_1 and proton density values, which is distributed over all V image voxels. To satisfy the requirement that AFFINEINVARIANTSAMPLER needs several (N_w) walkers with non-identical initialisation, randomness is added to each voxel and for each walker; zero-mean uniform noise of 10 % is used for both T_1 and the proton density. A starting guess for σ² is obtained from the residual resulting from applying the parameters of the single pair of T_1 and proton density at each voxel.
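The initialisation step can be sketched as follows (Python/NumPy). Note that the coarse T_1 grid search below, with the amplitude solved in closed form, is our own illustrative substitute for the ML curve fit used in the paper:

```python
import numpy as np

def spgr(rho, t1, alpha, tr):
    """SPGR signal model of equation (2)."""
    e1 = np.exp(-tr / t1)
    return rho * np.sin(alpha) * (1 - e1) / (1 - np.cos(alpha) * e1)

def initial_guess(y, alpha, tr, n_walkers, rng):
    """Sketch of INITIALGUESS: fit a single SPGR curve to the spatially
    averaged signal, distribute the resulting (rho, T1) pair to all voxels,
    and perturb each walker with zero-mean uniform noise of +/- 10 %."""
    n_voxels = y.shape[0]
    y_bar = y.mean(axis=0)                       # spatial average of the signal
    best = (np.inf, 0.0, 1.0)
    for t1 in np.linspace(0.05, 8.0, 400):       # coarse T1 grid (seconds)
        basis = spgr(1.0, t1, alpha, tr)
        rho = (basis @ y_bar) / (basis @ basis)  # least-squares amplitude
        sse = np.sum((y_bar - rho * basis) ** 2)
        if sse < best[0]:
            best = (sse, rho, t1)
    sse, rho0, t10 = best
    theta0 = np.empty((n_walkers, n_voxels, 2))
    theta0[..., 0] = rho0 * (1 + rng.uniform(-0.1, 0.1, (n_walkers, n_voxels)))
    theta0[..., 1] = t10 * (1 + rng.uniform(-0.1, 0.1, (n_walkers, n_voxels)))
    sigma2_0 = sse / y_bar.size                  # residual-based starting guess
    return theta0, sigma2_0

rng = np.random.default_rng(3)
alphas = np.deg2rad([2.0, 4.0, 11.0, 13.0, 15.0])
y = np.tile(spgr(1000.0, 1.2, alphas, 6.8e-3), (100, 1))  # 100 identical voxels
theta0, sigma2_0 = initial_guess(y, alphas, 6.8e-3, n_walkers=10, rng=rng)
```

The per-walker perturbation guarantees the non-identical walker positions that the ensemble sampler requires.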
Before producing the samples, the algorithm generates N_b burn-in samples that are discarded. We adapted the method of Betancourt (2010) to determine when the burn-in phase was over, i.e. to compute the number of samples (N_b) required before the chains have converged to the posterior in a manner sufficient for practical use. The proposed method is presented in appendix B.
Thinning was used to reduce the correlation between samples by computing N_s × N_t samples, from which every N_t-th sample was kept. To determine the amount of thinning (N_t) to use, we estimated the effective sample size by computing the autocorrelation time, R, of the chains, and then stored every ⌈1/R⌉-th sample (Gelman et al 2014b).
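A simple estimator of the required thinning can be sketched as follows (Python/NumPy). Here, as an assumption on our part, R is interpreted as the relative effective sample size N_eff/N, so that ⌈1/R⌉ equals the integrated autocorrelation time, and the autocorrelation sum is truncated at the first non-positive lag:

```python
import numpy as np

def thinning_interval(chain, max_lag=200):
    """Estimate the thinning interval ceil(1/R), where R = N_eff / N is the
    relative effective sample size, by summing sample autocorrelations until
    they first become non-positive."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    var = np.mean(x**2)
    tau = 1.0                                  # integrated autocorr. time = 1/R
    for lag in range(1, min(max_lag, len(x) - 1)):
        rho = np.mean(x[:-lag] * x[lag:]) / var
        if rho <= 0.0:                         # truncate at first non-positive lag
            break
        tau += 2.0 * rho
    return int(np.ceil(tau))

rng = np.random.default_rng(1)
iid = rng.standard_normal(20_000)              # uncorrelated chain
ar = np.empty(20_000)                          # strongly correlated AR(1) chain
ar[0] = 0.0
for t in range(1, ar.size):
    ar[t] = 0.9 * ar[t - 1] + rng.standard_normal()
```

An uncorrelated chain needs essentially no thinning, while the AR(1) chain above (theoretical autocorrelation time (1 + 0.9)/(1 − 0.9) = 19) requires keeping only roughly every twentieth sample.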
The AFFINEINVARIANTSAMPLER performs MCMC sampling using the affine invariant sampler of Goodman and Weare (2010). This sampler has been shown to perform significantly faster than standard MCMC algorithms on highly skewed probability distributions. The sampler was tuned for best performance for T_1 estimation (in terms of the number of independent samples per second) by performing a grid search over relevant sampler settings. A step size of 4.0, thinning of 8, and 10 walkers were found to perform well over a large range of values for κ.

Algorithm 1: A blocked Gibbs sampling algorithm that returns N_s × N_w samples of θ, λ, and σ².
The SAMPLER denotes any generic function that draws samples from the given distribution; we used the default random number generators that are available in MATLAB for the uniform, gamma, and inverse gamma distributions. The arguments to the samplers are the density functions that samples should be drawn from.
To speed up the algorithm, we make use of the fact that the conditional probability, equation (11), for a particular voxel only contains terms associated with voxels in its immediate surrounding. That is, the loop over V voxels is implemented as a loop over 9 subsets of voxels and the AFFINEINVARIANTSAMPLER performs calculations at V/9 voxels in parallel, see figure 1 for more details on these voxel subsets. To keep notation simple and not introduce yet another index, this implementation detail is not illustrated in algorithm 1.
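The decomposition into 9 interleaved voxel subsets can be sketched as follows (Python/NumPy; an illustration of the idea rather than the MATLAB implementation):

```python
import numpy as np

def tv_blocks(n_rows, n_cols):
    """Partition a 2D voxel grid into 9 interleaved subsets such that no two
    voxels in the same subset share a 3 x 3 TV neighbourhood; all voxels in a
    subset can then be Gibbs-updated in parallel."""
    i, j = np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij")
    labels = (i % 3) * 3 + (j % 3)            # subset index 0..8
    return [np.argwhere(labels == k) for k in range(9)]

blocks = tv_blocks(6, 6)
# Any two distinct voxels within a subset are at least 3 apart in i or j
for block in blocks:
    for a in block:
        for b in block:
            if not np.array_equal(a, b):
                assert np.abs(a - b).max() >= 3
```

Sweeping over the 9 subsets in turn visits every voxel once, while each sweep step updates roughly V/9 voxels simultaneously.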
To determine the value of the hyperparameter κ, the only unknown hyperparameter in this work, we propose to use the widely applicable information criterion (WAIC, also known as the Watanabe-Akaike information criterion) (Watanabe 2009). We used the criterion denoted WAIC_1 in Watanabe (2009) to evaluate the relative quality of our model as a function of the hyperparameter κ. The WAIC is a fully Bayesian method to estimate the out-of-sample generalisation. It is a generalisation of the Akaike information criterion (AIC), an improvement of the deviance information criterion (DIC), and is asymptotically equivalent to Bayesian cross-validation (Watanabe 2009, Watanabe 2010, Gelman et al 2014a). The WAIC is described in more detail in appendix C. A grid search for κ in equation (8) was conducted in the range [−0.5, 0.99] in steps of 0.01. The resulting WAIC curve was smoothed using a moving average, in order to obtain a κ_min less affected by noise in the estimations, and the κ corresponding to the minimum smoothed WAIC value in the grid was selected. Code for running the above implementation, complete with a synthetic dataset, is available at the following GitHub repository: https://github.com/MaxHellstrom/Bayesian-VFA-T1-estimation.
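The WAIC computation from posterior samples can be sketched as follows (Python/NumPy; this follows the common form with the mean-log complexity penalty, as described in Gelman et al 2014a, and is an illustration rather than the paper's code):

```python
import numpy as np

def waic1(log_lik):
    """WAIC from pointwise log-likelihoods of posterior samples.

    log_lik : array of shape (n_samples, n_data_points), where entry (s, i)
              is log p(y_i | theta^(s)).
    """
    m = log_lik.max(axis=0)
    # log of the posterior-mean likelihood per data point, computed stably
    log_mean_lik = m + np.log(np.mean(np.exp(log_lik - m), axis=0))
    lppd = np.sum(log_mean_lik)        # log pointwise predictive density
    # effective number of parameters (complexity penalty)
    p_waic = 2.0 * np.sum(log_mean_lik - log_lik.mean(axis=0))
    return -2.0 * (lppd - p_waic)

# Degenerate posterior (identical samples): the penalty is zero, so WAIC
# reduces to -2 times the total log-likelihood
print(waic1(np.full((10, 5), -1.0)))   # 10.0
```

In the grid search, `waic1` would be evaluated once per candidate κ, and the κ with the smallest (smoothed) value selected.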

Data
Two sets of data were used in the evaluation of the proposed method: one synthetic dataset with known ground truth, and one dataset with eight volunteering patients.

Synthetic data
An axial brain slice from the BrainWeb phantom (Kwan et al 1996, Cocosco et al 1997, Collins et al 1998, Kwan et al 1999) was used to define ground-truth proton density and T_1 maps. These tissue parameters were then used to compute spoiled gradient echo (SPGR) data according to equations (1) and (2), with T_R = 6.8 ms, T_E = 2.1 ms, and flip angles α = 2°, 4°, 11°, 13°, and 15°. Complex circular Gaussian noise was used to generate synthetic noise, so that the magnitude SPGR signal follows the Rician distribution. The variance of the noise was tuned by visual comparison until it reached the same level as in the in-vivo dataset. One hundred multi-flip angle images with independent noise were generated this way.
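The noise generation can be sketched as follows (Python/NumPy; the signal level and σ below are arbitrary illustration values, not the levels matched to the in-vivo data):

```python
import numpy as np

def add_rician_noise(signal, sigma, rng):
    """Add complex circular Gaussian noise and take the magnitude, so that
    the resulting synthetic magnitude signal is Rician distributed
    (Gudbjartsson and Patz 1995)."""
    real = signal + sigma * rng.standard_normal(signal.shape)
    imag = sigma * rng.standard_normal(signal.shape)
    return np.hypot(real, imag)

rng = np.random.default_rng(42)
s = np.full(100_000, 50.0)           # noiseless magnitude signal
noisy = add_rician_noise(s, sigma=2.0, rng=rng)
# At this high SNR the Rician distribution is close to Gaussian, with a small
# positive bias of roughly sigma^2 / (2 * signal)
```

At low SNR the magnitude bias becomes substantial, which is why the Gaussian likelihood in equation (3) is only appropriate in the high-SNR regime assumed in this work.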

In-vivo data
A total of eight patients volunteered for this study and were scanned with a 3D SPGR sequence on a GE Signa 3 T PET/MR scanner. The patient group consisted of 7 men and 1 woman, aged 39 to 65 years (mean age 52 years). The scans were conducted with identical T_R, T_E, voxel size, and flip angles as for the synthetic dataset. The acquisition was conducted with a matrix size of 256 × 256 × 8 (8 axial slices) and a pixel bandwidth of 488 Hz/pixel. Bloch-Siegert based B_1 mapping was performed to enable correction of the flip angles. This study was conducted in accordance with the principles embodied in the Declaration of Helsinki and was ethically approved by the regional research ethics committee (dnr: 2019-02666). Informed consent was obtained from all patients.

Analysis
Three different estimators, referred to as ML, B_unif, and B_TV, were evaluated on both the synthetic and the in-vivo datasets. The first estimator (ML) was a conventional maximum likelihood estimator, where the parameter maps were obtained by minimising the negative log-likelihood independently in each voxel, v, i.e. solving the program

  minimise_{θ_{v,·} ∈ R} Σ_{m=1}^{M} (y_{v,m} − s_m(θ_{v,·}))²,    (12)

and assigning the result to the parameter estimates ρ̂_v and T̂_{1v}, for v = 1, …, V, and where R ⊂ R² is the compact feasible region for the parameter values as defined by the uniform prior. This program was solved using the trust-region-reflective algorithm as implemented in MATLAB R2018b. The estimators B_unif and B_TV correspond to Bayesian approaches with different priors. The difference between these two estimators is that B_unif only uses the uniform prior on the parameters, while B_TV combines the uniform and the total variation priors. Both estimators used the blocked Gibbs sampling algorithm (see algorithm 1) and collected N_s = 1280 samples (after thinning) from the posterior distribution. Point estimates for B_unif and B_TV were chosen to be the sample means, i.e.

  ρ̂_v = (1/N_s) Σ_{n=1}^{N_s} ρ_v^{(n)}  and  T̂_{1v} = (1/N_s) Σ_{n=1}^{N_s} T_{1v}^{(n)},    (13)

where ρ_v^{(n)} and T_{1v}^{(n)} denote the nth posterior sample at voxel v. The synthetic dataset with 100 generated multi-flip angle images was used to investigate the performance of the point estimates of the different estimators. The normalised bias and normalised standard error, i.e. the coefficient of variation, CV, were computed as

  bias(ρ̂_v) = (ρ̄_v − ρ^ref_v)/ρ^ref_v,    (14)
  bias(T̂_{1v}) = (T̄_{1v} − T^ref_{1v})/T^ref_{1v},    (15)
  CV(ρ̂_v) = s(ρ̂_v)/ρ̄_v,    (16)
  CV(T̂_{1v}) = s(T̂_{1v})/T̄_{1v},    (17)

where s(·) denotes the sample standard deviation over the generated images. In the equations above, b is an index over the N_B = 100 generated multi-flip angle images, and T̂_{1v,b} and ρ̂_{v,b} are the estimates of T_1 and ρ at voxel v for image b. The average estimates are defined as

  ρ̄_v = (1/N_B) Σ_{b=1}^{N_B} ρ̂_{v,b}  and  T̄_{1v} = (1/N_B) Σ_{b=1}^{N_B} T̂_{1v,b}.

The reference values T^ref_{1v} and ρ^ref_v correspond to the ground truth images from which the synthetic data was generated. To conduct a quantitative evaluation of the Bayesian estimators B_unif and B_TV, we adapted the method of Sjölund et al (2018) and constructed probability-probability (P-P) plots from the posterior samples.
To illustrate this concept, consider a set of samples from the posterior distribution with a known ground-truth reference in each voxel. A P-P plot can then be created by calculating the frequency, across all voxels, with which the reference value is smaller than the pth percentile of the posterior samples. A one-to-one P-P ratio is the ideal case, i.e. when the reference value is smaller than the pth percentile in p % of the cases. This P-P ratio can act as a sanity check of our choice of priors. This analysis was conducted exclusively for the synthetic data, where ground truth is available.
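This construction can be sketched as follows (Python/NumPy; an illustration, not the paper's code). For a perfectly calibrated posterior, the observed frequencies track the theoretical probabilities:

```python
import numpy as np

def pp_curve(samples, reference, percentiles=None):
    """samples   : (n_samples, V) posterior samples, one column per voxel
    reference : (V,) ground-truth values
    Returns theoretical probabilities and the observed fraction of voxels
    whose reference lies below each per-voxel posterior percentile."""
    if percentiles is None:
        percentiles = np.arange(1, 100)
    qs = np.percentile(samples, percentiles, axis=0)   # shape (99, V)
    observed = np.mean(reference[None, :] < qs, axis=1)
    return percentiles / 100.0, observed

rng = np.random.default_rng(7)
samples = rng.standard_normal((1000, 5000))   # calibrated toy posterior
reference = rng.standard_normal(5000)         # references from the same law
theory, observed = pp_curve(samples, reference)
```

Deviations below the one-to-one line, as observed for B_TV, indicate that the true values fall above the stated percentiles more often than they should.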

Tissue parameter estimation
Parameter maps computed using all three estimators are shown in figure 2. These maps are estimated from the data of one of the patients in the in-vivo dataset (patient 1). Prior to the imaging occasion, the patient underwent surgical removal of a grade 2 astrocytoma, which is clearly visible in all images. In figure 2, the voxel intensity equals the parameter estimates ρ̂_v (in 2(a)-(c)) and T̂_{1v} (in 2(d)-(f)). The same parameter estimates are presented as normalised histograms for ρ̂_v (in 2(g)-(i)) and T̂_{1v} (in 2(j)-(l)). These histograms present the same results as in 2(a)-(f), but sorted in order of increasing parameter value instead of spatial location. The difference between the three estimators is shown more clearly in figure 3, which shows a zoomed-in version of the T_1 estimates in figures 2(d)-(f). The degree to which fine-grained details are affected by the priors is shown in the top row of figure 3, which shows a region with several fine structures. By comparing 3(e) and 3(f), i.e. the introduction of the TV term, we see that fine structures are still clearly visible after the smoothing has been applied. The B_unif estimator delivers estimates similar to those of the conventional estimator (ML). Further, it is clear that B_TV delivers a distinguishable amount of noise reduction, as well as an overall T_1 reduction in homogeneous regions of high T_1 value. When comparing the T_1 histograms (2(j)-(l)), we see that the distribution produced with B_TV (2(l)) differs significantly from the ones produced with ML and B_unif (2(j)-(k)) in the sense that it is more tightly concentrated at three clearly visible peaks, located at approximately 0.6 s, 1.2 s and 4.7 s. Parameter maps computed for the other volunteers in the in-vivo dataset were very similar and are presented in appendix D.
Parameter estimates with ML, B_unif, and B_TV were also computed for the synthetic dataset of 100 generated multi-flip angle images. For the in-vivo dataset, the mean computation times, including burn-in, were 6.0 (ML), 6.1 (B_unif), and 31 (B_TV) minutes per patient (one axial slice). The corresponding numbers for the synthetic dataset were about 6.1, 5.6, and 28 minutes. For both datasets, the burn-in phase took about 45 % of the total computation time. The difference in computation time between the two datasets is likely explained by the number of pixels in the analysed region. Please note that the hyperparameter search for κ is not included in the above stated estimation times; it is presented in detail in section 3.2. All computations were performed on a 3.7 GHz Intel Core i7-8700K processor.

Figure 4 illustrates the utility of obtaining distributions of likely values rather than only a single point estimate. A ground truth synthetic T_1 map is shown together with histograms of the samples acquired with B_unif and B_TV using one of the 100 synthetically generated multi-flip angle images. The total variation prior in B_TV produces narrower histograms in all four locations. This is most evident in regions with large T_1 values. Further, the magnitude of the estimation bias varies between the different locations and tends to be larger in regions with a large T_1 gradient. By observing e.g. figure 4(d), we see that B_unif produces a positive estimation bias, while B_TV produces a negative bias. Please note that the histograms in figure 2 present the distribution of ρ̂_v and T̂_{1v} estimates in all voxels, while the histograms in figure 4 present the distribution of T_1 samples in a voxel at four different locations. P-P plots of observed versus theoretical probabilities, as well as the normalised standard deviation in the posteriors, are presented in figure 5.
A slight deviation from the ideal one-to-one P-P ratio is visible for both estimators, most evidently for B_TV at higher probabilities.
The normalised bias and standard deviations, as defined in equations (14)-(17), are shown in figure 6. The left 3 × 3 array plot corresponds to the ρ_v estimate, and the right one corresponds to the T_1 estimate. Comparing the coefficients of variation in 6(a)-(c) shows an increase in precision when applying B_TV. This is most evident in regions with high parameter values and low signal, such as in the ventricles. In 6(d)-(f), we see that B_unif and B_TV introduce some level of estimation bias. The effect of the TV term when using B_TV is clearly visible in regions with sharp edges, where it introduces a negative estimation bias (see 6(f)). This bias is also presented in 6(g)-(i), but in terms of parameter values without spatial information.
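The bias and CV maps of figure 6 are computed from the repeated estimations on the 100 synthetic noise realisations; the computation can be sketched as follows (Python/NumPy; as an assumption on our part, the CV here is normalised by the mean estimate, which is one common convention):

```python
import numpy as np

def normalised_bias_and_cv(estimates, reference):
    """estimates : (N_B, V) parameter estimates over repeated noisy datasets
    reference : (V,) ground truth
    Returns the per-voxel normalised bias and coefficient of variation."""
    mean_est = estimates.mean(axis=0)
    bias = (mean_est - reference) / reference
    cv = estimates.std(axis=0, ddof=1) / mean_est
    return bias, cv

ref = np.array([1.0, 2.0])
est = np.array([[1.1, 2.2],
                [0.9, 1.8]])     # two repetitions, unbiased on average
bias, cv = normalised_bias_and_cv(est, ref)
```

For the toy input above, the estimator is unbiased in both voxels, while the spread of the repeated estimates shows up in the CV.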

Sampling parameters
The hyperparameter search for κ was conducted for both datasets prior to parameter estimation. For the synthetic dataset, a single search was conducted for all 100 images, which resulted in κ_min = 0.06. In addition to this, the effect of a difference in κ value was investigated by conducting parameter estimation with κ = κ_min ± {0.05, 0.1}, i.e. comparing the parameter estimation results for slightly different κ values.

Figure 6. Normalised bias and coefficient of variation for ρ_v (left 3 × 3 array plots) and T_{1v} (right 3 × 3 array plots), estimated from the synthetic and in-vivo datasets. The estimation bias is also presented in (g)-(i), where it is expressed as a function of the reference parameter value, grouped in the percentile ranges 1-99 %, 5-95 %, and 25-75 %. The dashed red line is the ideal 1:1 ratio.

The largest voxelwise difference was observed at κ = κ_min − 0.1 (+6.1 % in ρ_v and +4.0 % in T_{1v}). This is presented in figure 8.
Maps of the required thinning, calculated from the autocorrelation, were investigated for both datasets, and it was evident that the amount of required thinning varies spatially. Further, it peaks at about 4 when using B_unif and 8 when using B_TV. Therefore, the peak values 4 and 8 were used when conducting parameter estimation with B_unif and B_TV, respectively, in order to ensure a sufficient number of uncorrelated samples in all voxels.

Discussion
The purpose of this work was to develop a method that can reduce and estimate uncertainty in quantitative MR parameter maps without the need for hand-tuning of hyperparameters. To this end, a Bayesian hierarchical non-linear regression model with a total variation prior was used to enable noise reduction. Uncertainties were obtained through sampling from the corresponding posterior distribution, and WAIC was used to automate the selection of hyperparameters. The method was evaluated for parameter estimation of proton density and T_1 relaxation time on VFA data. The main finding of this work is that noise reduction, uncertainty estimation, and automatic hyperparameter tuning can be combined, and that the method is applicable to VFA-based T_1 estimation.
An important consideration when applying Bayesian methods for parameter estimation is the choice of prior. This choice entails both selecting the type of prior to use and deciding to what degree one should trust the prior or the data. With sufficient amounts of data, uninformative priors can be used, e.g. Orton et al (2014). However, in this work, we used the WAIC to compare models and, based on that, select the hyperparameters for the prior. This approach is novel in the context of qMRI and is much more objective than varying the hyperparameters until the images look good, or finding an optimal hyperparameter using e.g. synthetic data and using that value for subsequent estimations. A TV-based prior was used in this work due to its previous success in image denoising, including in the context of T_1 estimation using VFA data (Wang and Cao 2012). As can be observed in figures 2, 3 and 6, the TV prior is quite successful in reducing uncertainties in T_1 and proton density values, and from a qualitative point of view, sharp borders at tissue interfaces, e.g. in figure 3(f), and the fine-grained details in figure 3(c) seem to be preserved. However, when looking more closely at the results in figure 6, it is clear that bias is introduced by the TV-based prior, in particular at interfaces between different types of tissue. This causes slightly incorrect tissue parameter values in interface regions and fine-grained details. This bias is expected, since the prior makes assumptions about differences between neighbouring voxels. This is also a sign that there likely is room for improvement. For instance, one could imagine that using two separate hyperparameters for the strength of the prior (one for each parameter), or using a weighted TV-based prior (e.g. such that the data is trusted more near borders), could result in further improvements. Using other priors, e.g. exploiting wavelet compressibility (Ji et al 2008), could potentially improve the results.
However, the MCMC algorithm used in this work would likely perform poorly in that particular case, since the speed of the developed algorithm relies on each voxel being conditionally dependent only on its immediate neighbours.
It is important to mention that this estimation bias does not include contributions from inaccuracies caused by errors in the B1 map, insufficient signal spoiling, and other physical factors from the data acquisition itself. Although beyond the scope of this study, these factors could be included in a Bayesian model to incorporate their effects on the estimated uncertainty.
Obtaining the uncertainty of the estimated parameters was a key goal of this work. Several important statistical questions may be posed when uncertainty information is provided, such as what the probability is that a parameter is above or below a certain value of clinical relevance. One particular strength of Bayesian statistics and sampling of parameter maps is that the spatially correlated noise will be properly captured, which is very important in order to obtain correct uncertainty estimates in, for instance, ROI analyses.
In this work, we utilised P-P plots for a quantitative evaluation of the parameter uncertainty estimation. Looking at the P-P plots in figures 5(a) and (d), we see a near-ideal match between observed and theoretical probability when using B_unif. When using B_TV, we see that the observed probability is slightly below the ideal one-to-one ratio, e.g. in figure 5(d), where only 91 % of the true T_1 values fall below the 95th percentile. The cause of this can be either a biased posterior distribution or one with too narrow tails.
Although very powerful and flexible, Bayesian methods also have some downsides. Using MCMC for inference can be very computationally demanding, and may introduce complexity in terms of parameters that must be tuned for good performance, e.g. the burn-in length and the amount of thinning. To reduce this complexity, we implemented a method that automatically determines the burn-in length. To reduce the computation time, we exploited the fact that parameters at different voxels can be updated in parallel in the Gibbs sampling scheme when the structure of the posterior allows for it (as was the case here). In this work, we used a TV-based prior, which assumes that voxels are conditionally dependent only on their immediate neighbours. Hence, the voxels could be updated in blocks, as depicted in figure 1. This results in a substantial speed-up when using languages such as MATLAB that rely heavily on vectorisation for speed. However, even after vectorising the computations, the method cannot be considered fast. Obtaining a single slice (approximately 27 000 non-masked voxels) requires 18 minutes of burn-in and an additional 20 minutes of sampling to obtain 1 280 uncorrelated samples when using the TV-based prior. If several slices are to be computed, the estimation may take hours, which can be prohibitive, especially in clinical applications.
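The block-update idea can be sketched as follows (in Python/NumPy rather than the authors' MATLAB code), with a simple Gaussian Markov random field standing in for the actual conditional distributions of the model: voxels of one checkerboard colour are conditionally independent given the other colour, so each half of the slice can be updated in a single vectorised step instead of voxel by voxel.

```python
import numpy as np

rng = np.random.default_rng(3)

def checkerboard_gibbs_sweep(field, tau=1.0):
    """One Gibbs sweep over a 2D Gaussian MRF (a stand-in for the real model).

    With only nearest-neighbour dependencies, the two checkerboard colours
    are conditionally independent of each other, so each colour is updated
    in one vectorised draw.
    """
    rows, cols = np.indices(field.shape)
    for colour in (0, 1):
        mask = (rows + cols) % 2 == colour
        # Sum of the 4-neighbours for every voxel (edges handled by replication).
        padded = np.pad(field, 1, mode="edge")
        neigh_sum = (padded[:-2, 1:-1] + padded[2:, 1:-1]
                     + padded[1:-1, :-2] + padded[1:-1, 2:])
        cond_mean = neigh_sum / 4.0
        field[mask] = rng.normal(cond_mean[mask], 1.0 / np.sqrt(tau))
    return field

field = rng.standard_normal((16, 16))
field = checkerboard_gibbs_sweep(field)
assert field.shape == (16, 16)
```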
In addition to the estimation time, the time for the hyperparameter search must also be added. A very fine grid was used in this work, requiring about 9 hours to find the hyperparameter κ for a single slice. This is a clear limiting factor that can be improved in several ways. Judging from the results in figure 7, the most straightforward approach to increase the efficiency of the hyperparameter search is to empirically narrow the search window. The maximum spread in κ min we observed was 0.11, i.e. very small in comparison to the entire range tested in this work. Simply narrowing the search grid to the range [0, 0.25] would decrease the computational time to about 1.5 hours per slice. Further, a coarser grid could likely be used, since a difference in κ of less than 0.1 had a relatively small effect on the produced images (see figure 8). This small difference also implies that one likely only needs to perform the search for a single slice in a volume. Another promising approach is to investigate whether a fixed κ value can be used for all patients imaged with the same protocol. Our results in figures 7(b) and 8 suggest that this might be possible due to the low patient variability in κ min . This would drastically improve the practical efficiency of our method.
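A narrowed, coarser grid search of this kind could be organised as below; `waic_for_kappa` is a hypothetical placeholder for running the full MCMC estimation at a given κ and computing the WAIC from the samples, mocked here by a toy convex curve with a minimum near κ = 0.1.

```python
import numpy as np

def waic_for_kappa(kappa):
    # Placeholder: in the real method, this would run the sampler at this
    # kappa and return the WAIC computed from the posterior samples.
    # A convex toy curve with a minimum near kappa = 0.1 stands in here.
    return (kappa - 0.1) ** 2

# Narrowed search window [0, 0.25] with a coarse grid, as suggested above.
grid = np.linspace(0.0, 0.25, 11)
scores = [waic_for_kappa(k) for k in grid]
kappa_min = grid[int(np.argmin(scores))]
assert abs(kappa_min - 0.1) < 0.025
```

With 11 grid points instead of a very fine grid, the number of full sampler runs (and hence the search time) scales down proportionally.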
To further speed up the estimation and hyperparameter search, the most obvious next step would be to move to more powerful hardware, e.g. GPUs instead of CPUs. Based on our experience of applying GPUs for computations, we expect such a migration to give improvements of at least one order of magnitude. Although this would greatly facilitate the practical applicability of the method in many situations, the computation time may still be a problem, in particular for clinical applications. Thus, there is a need for improvements of the sampling algorithm, perhaps using other MCMC algorithms such as Hamiltonian Monte Carlo (Duane et al 1987). For the hyperparameter search, the grid search employed in this work is quite inefficient, and better methods are needed, in particular if more than one hyperparameter is used. Bayesian optimisation (Snoek et al 2012), for instance, is commonly used for efficient hyperparameter search in other machine learning applications and would likely be a valuable tool here as well.
This work focused on T 1 estimation using VFA data. The proposed method is likely also applicable to other types of qMRI, e.g. diffusion and T 2 /T 2 * quantification, and possibly also DCE-MRI, since those estimations also involve finding a few parameters at each voxel location. Other possible generalisations and improvements include using other priors, as mentioned above, utilising spatial information in 3D (e.g. 3D total variation), improving the gradient approximation method, and using a larger family of priors from which the optimal one is selected using the WAIC, e.g. selecting between different functional forms for the prior, or using a prior with more than a single hyperparameter, for instance a separate κ value for T 1 and the proton density. Another possible improvement would be to model the noise as spatially varying, in order to properly describe the noise distribution when using e.g. parallel imaging techniques.

Conclusion
We have presented a novel framework for parameter estimation with associated error estimates in qMRI, applied to the estimation of the T 1 relaxation time. Due to automated hyperparameter selection, our method can deliver noise reduction without incorporating end-user subjectivity. Key features to address in future developments are to refine the design of the spatial priors and to improve the computational efficiency, so as to enable more widespread use.

Conflict of interest
Tommy Löfstedt and Anders Garpebring are co-owners of NONPI Medical AB, the developer of MICE Toolkit, a software package for medical image analysis that was used in this work.
The evidences, equations (B3) and (B4), can easily be computed analytically (using the error function). The posterior evidence (the denominator in equation (B2)) was computed numerically using Simpson's rule.
The chain was deemed to have converged to the posterior when the mode of the posterior of the mixture coefficient was larger than 0.99. In that case, the model has found that the overwhelming majority of the voxels in the 'mean images' are better described by a single Gaussian than by two independent Gaussians.
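A minimal sketch of this convergence check, assuming scalar MCMC draws of the mixture coefficient are available (the draws below are synthetic, not from the actual model): estimate the posterior mode with a histogram and compare it against the 0.99 threshold.

```python
import numpy as np

def posterior_mode(draws, bins=50):
    """Histogram-based mode estimate of a scalar posterior from MCMC draws."""
    counts, edges = np.histogram(draws, bins=bins)
    i = int(np.argmax(counts))
    return 0.5 * (edges[i] + edges[i + 1])

# Hypothetical draws of the mixture coefficient, concentrated near 1,
# emulating a chain that has reached the posterior.
rng = np.random.default_rng(4)
draws = 1.0 - rng.beta(1.0, 200.0, size=5000)
converged = posterior_mode(draws) > 0.99
assert converged
```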

Appendix C. The widely applicable information criterion (WAIC)
We used the criterion denoted WAIC 1 in Watanabe (2009), which is defined through the Bayesian and Gibbs training losses as

WAIC 1 = BL t + 2β(GL t − BL t) = BL t + (β/N) Σ_{i=1}^{N} V[log p(x i | θ)] + o p (1/N),

where BL t and GL t are the Bayesian and Gibbs training losses, respectively, V is the empirical variance, o p denotes convergence (to zero) in probability, and β > 0 is an inverse temperature parameter that was set to β = 1 in our estimations-corresponding to Bayesian estimation. The criterion is related to the Bayes generalization loss as

E[BL g] = E[WAIC 1] + o(1/N),

where BL g is the Bayes generalization loss, N is the number of data samples, and o is the little-o notation. The WAIC 1 can thus be estimated by

WAIC 1 ≈ BL t + (β/N) Σ_{i=1}^{N} V[log p(x i | θ)],

where the variance V is taken over the posterior samples of θ.
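Given a matrix of pointwise log-likelihoods log p(x i | θ s) evaluated at S posterior draws, the WAIC 1 estimate based on the posterior variance of log p(x i | θ) can be sketched as follows (β = 1; the log-likelihood values below are synthetic placeholders, not from the actual model).

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical pointwise log-likelihoods log p(x_i | theta_s):
# S posterior draws (rows) by N data points (columns).
S, N = 400, 1000
log_lik = rng.normal(-1.0, 0.1, size=(S, N))

# Bayes training loss: BL_t = -(1/N) sum_i log E_theta[p(x_i | theta)],
# with the posterior expectation computed stably in log-space.
m = log_lik.max(axis=0)
log_mean_lik = m + np.log(np.mean(np.exp(log_lik - m), axis=0))
bl_t = -np.mean(log_mean_lik)

# Penalty term: (beta/N) * sum_i Var_theta[log p(x_i | theta)], with beta = 1,
# where the variance is taken over the posterior draws.
penalty = np.sum(np.var(log_lik, axis=0, ddof=1)) / N

waic1 = bl_t + penalty
assert penalty > 0.0
assert waic1 > bl_t
```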

Appendix D. In-vivo dataset
Tissue parameter estimates, calculated with the three estimation models ML, B unif , and B TV , are presented in figure D1. Each row in figure D1 presents the estimated parameters for a different patient in the in-vivo dataset (patients 1 to 8).