Fast and robust quantification of uncertainty in non-linear diffusion MRI models

Diffusion MRI (dMRI) allows for non-invasive investigation of brain tissue microstructure. By fitting a model to the dMRI signal, various quantitative measures can be derived from the data, such as fractional anisotropy, neurite density and axonal radii maps. We investigate the Fisher Information Matrix (FIM) and uncertainty propagation as a generally applicable method for quantifying the parameter uncertainties in linear and non-linear diffusion MRI models. In direct comparison with Markov Chain Monte Carlo sampling, the FIM produces similar uncertainty estimates at much lower computational cost. Using acquired and simulated data, we then list several characteristics that influence the parameter variances, including data complexity and signal-to-noise ratio. For practical purposes we investigate a possible use of uncertainty estimates in decreasing intra-group variance in group statistics by uncertainty-weighted group estimates. This has potential use cases for detection and suppression of imaging artifacts.


Introduction
Diffusion Magnetic Resonance Imaging (dMRI) allows for non-invasive investigation of brain tissue microstructure. By fitting a dMRI model to all data of each voxel, various quantitative measures can be derived, such as fractional anisotropy (Basser et al., 1994; Pierpaoli & Basser, 1996), neurite density (Assaf et al., 2004; Assaf & Basser, 2005; Zhang et al., 2012) and axonal radii maps (Assaf & Pasternak, 2008; Alexander et al., 2010). These parameters require specific parametric models such as Bingham-NODDI (Tariq et al., 2016) and CHARMED (Assaf et al., 2004). After model fitting, these quantitative measures can be used in statistical group analysis (e.g. Smith et al., 2006). Since the quantitative microstructure measures result from a model fitting process, there is inherent uncertainty in the estimated parameters. That is, the parameter estimates have only a finite precision (Jones, 2010). Despite early focus on quantifying uncertainty in diffusion MRI (e.g. Jones, 2003; Behrens et al., 2003; Pajevic & Basser, 2003), the majority of diffusion microstructure studies currently do not attempt to quantify parameter uncertainty and therefore do not evaluate the precision of their estimates on either the individual subject or the group level.
The main reason for this omission is the absence of a method for uncertainty quantification that is generally applicable, robust, and fast to compute. With generally applicable we mean that the method should be applicable to the majority of current diffusion microstructure models without model-specific adjustments for non-linear models or models requiring data with multiple b-values (i.e. multi-shell data). With robust we mean providing a value close to the ground truth value (for simulations), or to a gold standard reference method (for measured data with unknown ground truth), in the majority of cases where the assumptions are clearly met. With fast to compute we mean that the method should be computationally efficient, to the degree that the additional time required for the calculation of uncertainty is short and proportional to the computation time of the point estimate itself, making it easily applicable in large group or population studies. Previous work on quantifying parameter uncertainties in diffusion MRI modeling has included bootstrapping approaches, MCMC sampling-based inference and the analytical FIM approach. Each has its advantages and disadvantages, summarized below.

Bootstrapping
The bootstrap or bootstrapping method (Efron, 1992) is a non-parametric statistical technique based on repeated resampling which can, in principle, be applied to arbitrary models and estimation methods. The bootstrap has been applied in diffusion MRI modeling in several different forms.
The repetition bootstrap can estimate uncertainty in diffusion tensor imaging (DTI; Jones, 2003; Chung et al., 2006), but requires repeated acquisition of each of the gradient directions for resampling. This has several practical disadvantages. For instance, repeated acquisition means recording more data, costing more time and money. In addition, it risks introducing more noise as the subject is more likely to move during long acquisitions. Post-processing time also increases, as bootstrapping requires many model fitting procedures. Finally, for older data without repeated acquisitions the repetition bootstrap is not applicable.
A second bootstrapping method is the residual bootstrap (Chung et al., 2006). Instead of imposing additional acquisition requirements, the residual bootstrap resamples the model residuals. Unfortunately, in heteroscedastic data the covariance estimates become biased and inconsistent (Flachaire, 2005). Since most of the current diffusion microstructure models require multi-shell data and the signal-to-noise ratio decreases in the higher b-value shells, multi-shell data is heteroscedastic by nature.
The wild bootstrap (Chung et al., 2006; Whitcher et al., 2008; Polders et al., 2011; Vorburger et al., 2012, 2016) is a successor to the residual bootstrap and uses a heteroscedasticity-consistent covariance matrix estimator, specific to the model and estimation method, to handle heteroscedastic models (Flachaire, 2005). This means that, similar to the residual bootstrap, the wild bootstrap may not always be available, as it has not yet been developed for the majority of diffusion microstructure models.
Bootstrap methods have been applied primarily in DTI, a linear model supported by single-shell data. Recently, the wild bootstrap was formulated and applied to the mean apparent propagator (MAP) (d)MRI model, a model supported by multi-shell data, showing good uncertainty estimates; however, runtimes for these estimates ranged from 40 minutes for a simple phantom dataset to 40 minutes per slice for an extensive multi-shell human dataset (Gu et al., 2019).

MCMC sampling
Bayesian sampling-based inference, particularly Markov Chain Monte Carlo (MCMC) methods, can be applied to all diffusion microstructure models, including non-linear models requiring multi-shell data (Behrens et al., 2003, 2007; Harms & Roebroeck, 2018; Wegmann et al., 2017; Gu et al., 2017). Besides their broad applicability, MCMC methods have the advantage that they are compatible with improper priors, such as priors with sparsity-induced automatic relevance determination, which can help with model selection (Behrens et al., 2007). Furthermore, sampling-based inference has the property that for each sample xᵢ obtained for the model parameters, a derived value can also be obtained for any function f(xᵢ), by simply applying f(·) to the drawn samples, allowing inference on complex functions of the model parameters (Fonteijn et al., 2007). In essence, a well-converged MCMC sampling chain, applied for a sufficient amount of time, will characterize the entire shape of the parameter distribution, including parameter correlations and multiple peaks indicating non-unique solutions. These properties have made MCMC a reference method for uncertainty quantification of non-linear diffusion microstructure models that require multi-shell data. Unfortunately, the nature of the sampling process generally requires long computation times, even with sampling strategies optimized for efficiency and highly parallelized implementations on graphical processing units (Harms & Roebroeck, 2018).

Journal Pre-proof

Fisher Information Matrix
A third avenue for diffusion MRI uncertainty estimation is to formulate analytic expressions of uncertainty around a point estimate, as shown before in MRI analysis (Andersson, 2008; Alexander, 2008; Sjölund et al., 2018).
Various techniques are available; for instance, under certain assumptions on the priors and the noise model, the Bayesian posterior becomes analytically tractable and assessing uncertainty amounts to implementing and evaluating the corresponding function (Broemeling, 2017). For linear diffusion models (e.g. Diffusion Tensor Imaging), a method for computing and using uncertainty estimates has been suggested (Sjölund et al., 2018), but this method is not applicable to non-linear diffusion models (e.g. Bingham-NODDI, CHARMED).
For non-linear diffusion MRI models, the Fisher Information Matrix (FIM), related to the Cramér-Rao Lower Bound (CRLB; Rao, 1945; Cramér, 1946), can be used as a fast, accurate and generally applicable method for quantifying the parameter uncertainties. The FIM allows for estimating the variance, or uncertainty, around the maximum likelihood point estimate, which is the point estimate typically used in individual diffusion MRI model analysis and in subsequent group statistics. When applied to a known ground truth parameter vector (rather than an estimate), the FIM can be used to bound the variance of any unbiased estimator. This lower bound is known as the CRLB.
The relevance of the FIM has been suggested before in the context of diffusion MRI. In Andersson (2008), the FIM is used as an uncertainty measure for the linear DTI model under a Rician noise model, and it is demonstrated how the FIM can be used as a generic tool for designing optimal diffusion experiments. In Alexander (2008), the FIM for non-linear diffusion models under a Rician log-likelihood is derived to provide a generally applicable CRLB for non-linear model experiment design optimization, which is used for a wider range of non-linear models in Drobnjak et al. (2010). The FIM and CRLB have been used more widely in the context of MRI sequence optimization (e.g. Wyatt et al., 2012; Lankford & Does, 2018; Brihuega-Moreno et al., 2003; Gras et al., 2017; Zhao et al., 2018; Teixeira et al., 2018; Lee et al., 2019).
Despite its use as a sequence optimization tool and its recognition as an essential parameter uncertainty estimate in fields such as astrophysics (Vallisneri, 2008; Rodriguez et al., 2013), the usage of the FIM for uncertainty quantification of diffusion microstructure parameters is scarce, except in specific cases (e.g. Versteeg et al., 2018). Furthermore, the use of uncertainty propagation remains relatively unexplored as a tool for uncertainty quantification of arbitrary functions of diffusion model parameters, extending uncertainty quantification to derived diffusion parameters such as fractional anisotropy (FA) or the fraction of restricted compartment (FR).
In this work we aim to evaluate the Maximum Likelihood Estimator (MLE) together with the FIM and uncertainty propagation, as a fast, robust, and generally applicable method for quantifying the parameter uncertainties in linear and non-linear diffusion MRI models for single-shell and multi-shell data.

Research aims
We first investigate the robustness of the FIM, concretely, whether the uncertainty estimates agree with MCMC estimates as a reference method, using multiple datasets and multiple dMRI microstructure models. We then investigate several data and model characteristics that can influence the parameter uncertainty, such as model and data complexity and Signal-to-Noise Ratio (SNR). Finally, we investigate a possible use of uncertainty estimates in decreasing intra-group variance in group statistics by uncertainty-weighted group estimates, which may be used in the detection and suppression of imaging artifacts.

Methods
The methods section is subdivided into five subsections. We start with details on the FIM and MCMC methods to obtain parameter uncertainties in 2.1, followed by a theoretical comparison of these methods in 2.2. Details on variance weighted averaging are presented in 2.3, followed by a summary of the used diffusion MRI models in 2.4. The datasets and simulations are detailed in subsection 2.5.

Computing parameter distributions, MLE+FIM and MCMC
Models such as diffusion Tensor, Ball&Stick, Bingham-NODDI and CHARMED have parameters like diffusivity, diffusion direction and axonal radii. Some of these parameters may be fixed to known values (fixed parameters), while other parameters need to be estimated from the dMRI data (free parameters). After estimating the free parameters we have a distribution of parameter values, which is called the (full) posterior. The estimated distributions per parameter are called the marginal posteriors. Together, these posteriors are referred to as the parameter distributions.
We compare two different methods for the estimation of parameter point estimates and corresponding standard deviations of a diffusion MRI model in a single voxel: MLE+FIM and MCMC. The first method is based on estimating the model parameters most likely to explain the measured data using Maximum Likelihood Estimation (MLE), and estimating the corresponding standard deviations using the Fisher Information Matrix (FIM).
The second method is based on estimating the posterior distribution of the model parameters using Markov Chain Monte Carlo (MCMC) sampling.

In parameter modeling we can distinguish two cases based on Gaussianity: either a marginal posterior distribution is unimodally Gaussian distributed, or it is not. If it is unimodally Gaussian distributed, the mean and mode are the same, and MLE+FIM theoretically provide the same point estimate and variance as MCMC, provided uniform priors are used, as in this work. If the distribution is not unimodally Gaussian, one could either use the MLE with the variance from the FIM, accepting that only part of the distribution is captured (typically one mode), or use both the mean and variance from MCMC. The Gaussian assumption is also supported by two theoretical arguments. First, if the model is suitable to describe the data (e.g. if model selection, such as selecting the correct number of unique fiber directions in a voxel, was successfully applied), the posterior asymptotically approaches a Gaussian distribution (Gelman et al., 2013).
Second, for a finite sample size n and a set of parameters x, determining the marginal distribution of a component of x is equivalent to averaging over all the other components of x, which, by the same logic underlying the central limit theorem, generally brings them closer to normality (Gelman et al., 2013). An important and relevant exception is convergence to the edge of the parameter space, which is relevant for angular representations of fiber orientation. In summary this means that, theoretically, the posterior or likelihood distribution of important microstructural parameters such as diffusivities, fractional anisotropy or volume fractions (of intra-cellular or extra-cellular compartments) can often be very well approximated by a Gaussian, as we set out to test in this work.
A final difference between MLE+FIM and MCMC is runtime. MCMC methods typically require more computational power than MLE and the FIM, resulting in a longer runtime. In addition, most dMRI software tools already provide an MLE point estimate; computing the FIM is then much faster than additionally running MCMC.

Fisher Information Matrix
The observed Fisher Information Matrix is defined as the negative Hessian of the log-likelihood function evaluated at the maximum likelihood estimate (MLE; Pawitan, 2013; Gelman et al., 2013). The inverse of the FIM is an asymptotic estimator of the covariance matrix around a parameter vector (Pawitan, 2013; Gelman et al., 2013). When the inverse of the FIM is evaluated at the (typically unknown) true parameter vector, we can use it as the Cramér-Rao Lower Bound (CRLB) (Kay, 1993). For example, in simulation studies, the CRLB can function as a ground truth lower bound

Figure 2: Simulation examples of the behaviour of the MLE+FIM and MCMC pipelines on theoretical distributions. In each of the five plots we have drawn samples from a theoretical distribution, computed the MLE of those samples using the Nelder-Mead optimization method and computed the FIM at the thus obtained mode. We took the mean and variance of the samples as a representation of the MCMC procedure. Results are shown for A) a normal distribution X ∼ N(0, 1), B) a multi-modal normal distribution with X ∼ (5/8)N(0, 1) + (3/8)N(3, 0.8), C) a skewed normal distribution X ∼ SN(0, 2) with skew factor S = 10 (Azzalini & Capitanio, 1999), D) a truncated normal distribution X ∼ N(0, 1) truncated at zero, and E) a normal distribution modulus π, X ∼ N(0, 1) mod π.
on the estimable variances, thereby indirectly evaluating the performance of the maximum likelihood routines (Kay, 1993). Likewise, in MRI pulse sequence optimization studies, sequence parameters are adjusted to minimize the true CRLB at known or assumed parameter values. However, we deal here with the situation of parameter estimates. Therefore, we follow the results in astrophysics and interpret the inverse of the FIM, evaluated at the MLE, as a measure of uncertainty around the estimated parameters (Vallisneri, 2008), and refer to it as a FIM uncertainty estimate, or an MLE+FIM parameter and uncertainty estimate.

In more detail, let l(x) be a log-likelihood function with maximum likelihood estimate x̂. A second-order Taylor approximation of l(x) centered at x̂ is then given by:

l(x) ≈ l(x̂) + ½ (x − x̂)ᵀ H(x̂) (x − x̂),

ignoring the higher-order terms and having dropped the linear term, since the first derivative of a function is zero at the mode. Considering the first term, l(x̂), a constant, and matching the second term to the logarithm of a Gaussian density N(x̂, I(x̂)⁻¹), we get the approximation:

x ∼ N(x̂, I(x̂)⁻¹),

where I(x̂) is the observed Fisher Information Matrix, equal to the negative of the Hessian matrix H:

I(x̂) = −H(x̂).

For the Hessian to be positive definite (and thus invertible), this theory requires x̂ to lie within the boundaries of the parameter space (Gelman et al., 2013). We compute the Hessian numerically (see Appendix A) and its inverse using a (Moore-Penrose) pseudo-inverse, using the RS routine from a C version of the EISPACK library (Johannesgerer, 2015).

Uncertainty propagation

To quantify the uncertainty of a derived parameter y = f(x), we use first-order uncertainty propagation (Arras, 1998), which states that if x is Gaussian distributed with mean µₓ and covariance matrix Σₓ, the distribution of y can be approximated as:

y ∼ N(f(µₓ), J_f Σₓ J_fᵀ),

with J_f the Jacobian matrix of f. More succinctly, the covariance matrix of y = f(x) is given by:

Σ_y = J_f Σₓ J_fᵀ,

which holds as a generally applicable formula for linear propagation of covariances (Arras, 1998). In the case of a univariate output y = f(x), the Jacobian can be formulated as a gradient vector ∇f, leading to the following expression for the variance in y:

σ²_y = ∇fᵀ Σₓ ∇f.

Variance weighted averaging

For n data points zᵢ, the regular mean is computed as z̄ = (1/n) Σᵢ zᵢ and the regular standard deviation as:

s = √( (1/(n − 1)) Σᵢ (zᵢ − z̄)² ).

If each data point zᵢ has a corresponding weight wᵢ, we can compute a weighted mean as:

z̄_w = (Σᵢ wᵢ zᵢ) / (Σᵢ wᵢ),

and a weighted standard deviation as:

s_w = √( Σᵢ wᵢ (zᵢ − z̄_w)² / ( ((m − 1)/m) Σᵢ wᵢ ) ),

with m the number of non-zero weights, included here to allow for non-normalized weights. It has been shown that the weights that minimize the variance of the weighted average are the reciprocals of the variances of each of the data points zᵢ (Shahar, 2017). That is, given the variances σᵢ² for each zᵢ, the weights that minimize Var(Σᵢ wᵢ zᵢ) are given by:

wᵢ = 1/σᵢ².

Incidentally, these weights are also the maximum likelihood estimator of the weighted mean and variance under the assumption that the data points zᵢ are independent and Gaussian distributed with the same mean.
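As a minimal numerical sketch of the FIM-based variance estimate (not the paper's GPU/EISPACK implementation; the likelihood and data values here are illustrative), the observed Fisher information of a one-parameter Gaussian log-likelihood can be computed with a central finite difference at the MLE, and its reciprocal recovers the analytic variance σ²/n of the sample mean:

```python
def log_likelihood(mu, data, sigma=1.0):
    # Gaussian log-likelihood in the mean mu with known sigma
    # (constant terms dropped, as they vanish under differentiation).
    return -sum((z - mu) ** 2 for z in data) / (2.0 * sigma ** 2)

def observed_fim(loglik, x_hat, h=1e-4):
    # Observed Fisher information: the negative second derivative of the
    # log-likelihood, evaluated at the MLE via a central finite difference.
    second_deriv = (loglik(x_hat + h) - 2.0 * loglik(x_hat)
                    + loglik(x_hat - h)) / h ** 2
    return -second_deriv

data = [0.8, 1.1, 0.9, 1.2, 1.0]
mu_hat = sum(data) / len(data)          # MLE of the mean of a Gaussian
info = observed_fim(lambda m: log_likelihood(m, data), mu_hat)
variance = 1.0 / info                   # FIM-based variance estimate
# For this likelihood the analytic variance is sigma^2 / n = 1/5 = 0.2
```

In the multi-parameter case, the scalar second derivative becomes the full Hessian matrix and the reciprocal becomes a (pseudo-)inverse, as described above.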

Datasets
In this study we used simulated data and imaging data from two population studies. To illustrate the methods on a dataset with a clinically feasible, fast-to-acquire acquisition scheme, we used data from the diffusion protocol pilot phase of the Rhineland Study (www.rheinland-studie.de).
We refer to these datasets and acquisition schemes as RLS-pilot. To illustrate the methods on a dataset with a high-end acquisition scheme with a long acquisition time, we used data from the Human Connectome Project MGH-USC Young Adult study. We refer to these datasets and acquisition schemes as HCP MGH. For simulated data, we used a single representative acquisition scheme from both the RLS-pilot and HCP MGH studies.

Imaging data and preprocessing
The RLS-pilot datasets were acquired on a Siemens Magnetom Prisma (Siemens, Erlangen, Germany) with the Center for Magnetic Resonance Research (CMRR) multi-band (MB) diffusion sequence (Moeller et al., 2010; Xu et al., 2013). For all datasets we created a white matter (WM) mask from the Tensor FA estimates and, using BET from FSL (Smith, 2002), a whole-brain mask. The whole-brain mask is used for MLE and MCMC sampling, whereas averages over the WM mask are used in model and data comparisons. For each dataset, the voxel-wise signal-to-noise ratio (SNR) is estimated using only the b = 0 volumes, by dividing a voxel signal's mean by its standard deviation.
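The voxel-wise SNR estimate above reduces to a mean-over-standard-deviation ratio across the repeated b = 0 measurements of a voxel; a minimal sketch (the signal values are made up for illustration):

```python
def voxel_snr(b0_signals):
    # SNR of one voxel: mean of the repeated b=0 signals divided by
    # their sample standard deviation.
    n = len(b0_signals)
    mean = sum(b0_signals) / n
    var = sum((s - mean) ** 2 for s in b0_signals) / (n - 1)
    return mean / var ** 0.5

# Hypothetical b=0 signals for a single voxel:
snr = voxel_snr([100.0, 102.0, 98.0, 101.0, 99.0])
```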

Ground truth simulations
To illustrate the effects of SNR on the variance of parameter estimates we used both imaging data and simulated data. In imaging data we can only estimate the true SNR; in simulations we can control it exactly. We used a single representative acquisition scheme from both the RLS-pilot and HCP MGH datasets (the acquisition schemes of subjects v3a 1 data-ms20 and hcp 1003), and simulated data for all of the dMRI models used in this study. For each acquisition scheme and each model, we simulated 10000 voxels with random volume fractions in [0.2, 0.8], random diffusivities in [5e-11, 5e-9] mm²/s, and random orientations in [0, π] for each orientation. For each model we simulated data by evaluating the model at multiple random parameter configurations for that specific model. For models supporting multiple fiber orientations, each orientation direction is selected independently of the other orientation directions, randomly leading to parallel and crossing fiber voxels. From the resulting simulated data, we created multiple copies with Rician noise (Gudbjartsson & Patz, 1995) at SNRs of 5, 10, 20, 30, 40 and 50. We then fit and sample each model ten times on these simulated datasets and estimate the standard deviation using both the MLE+FIM and MCMC approaches, as described above.
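Rician noise arises as the magnitude of a complex-valued signal with independent Gaussian noise on both channels (Gudbjartsson & Patz, 1995). A minimal sketch, assuming a unit b = 0 signal so that the noise level is sigma = 1/SNR (function and parameter names are illustrative, not the paper's simulation code):

```python
import math
import random

def add_rician_noise(signal, snr, rng):
    # Rician noise: magnitude of the signal after adding independent
    # Gaussian noise to the real and imaginary channels.
    sigma = 1.0 / snr   # noise level relative to an assumed unit b=0 signal
    real = signal + rng.gauss(0.0, sigma)
    imag = rng.gauss(0.0, sigma)
    return math.hypot(real, imag)

rng = random.Random(42)
noisy = [add_rician_noise(0.5, 20, rng) for _ in range(1000)]
```

Note that the magnitude operation makes the noise signal-dependent (and slightly biased upwards at low SNR), which is the heteroscedasticity discussed in the bootstrapping section.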
Per SNR we summarize the results of these ten trials as a mean standard deviation and its corresponding standard error of the mean (SEM).

Parameter distribution analysis
To quantify the correspondence between the MLE+FIM and MCMC uncertainty estimates we computed scatter plots and Pearson correlation coefficients for multiple models and model parameters. We used both the MLE+FIM and MCMC methodologies to estimate parameter maps and parameter standard deviation maps over a white matter mask, for a single subject from both the HCP MGH and RLS-pilot datasets. For each voxel in the mask we can then compare the estimated mean and estimated standard deviation from both the MLE+FIM and MCMC methodologies. This comparison can be represented visually using a scatter plot and statistically using the Pearson correlation coefficient ρ (which ranges from −1 to 1). For the scatter plots, points closer to the diagonal represent better agreement; for the ρ values, higher is better.
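The per-voxel comparison can be sketched as follows (the standard deviation values below are hypothetical; the paper compares full voxel-wise maps):

```python
import math

def pearson_r(xs, ys):
    # Pearson correlation coefficient between two equal-length sequences.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-voxel std. estimates from the two methods:
fim_std  = [0.010, 0.020, 0.015, 0.030, 0.025]
mcmc_std = [0.011, 0.019, 0.016, 0.029, 0.026]
rho = pearson_r(fim_std, mcmc_std)
```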


We applied variance weighted averaging using the uncertainty estimates to compute group averages, comparing these against regular (unweighted) group averages.

Results
We begin by comparing the parameter estimates and parameter uncertainty estimates of MLE+FIM to the corresponding estimates from MCMC. Next, we investigate the effect of SNR on the parameter standard deviations using both simulated and imaging data. We conclude with a comparison of regular versus weighted averaging in group statistics.

Parameter distribution estimates
Model parameter      HCP MGH mean   HCP MGH std.   RLS-pilot mean   RLS-pilot std.
Ball&Stick in1 FS    0.99           0.99           0.99             0.97
Ball&Stick in2 FS    0.99           0.99           0.99             0.97
Ball&Stick in3 FS    0.99           0.99           0.99             0.96
Bingham-NODDI FR     0.99           0.95           0.99             0.86
CHARMED in1 FR       0.98           0.85           0.92             0.69
Tensor FA            0.99           0.99           0.97             0.96

Table 1: Correlation comparison between the MLE+FIM and MCMC methodologies over the volume fraction estimates of multiple diffusion MRI models. Each number is a ρ correlation coefficient between the estimates of the MLE+FIM and MCMC methodologies, averaged over an entire HCP MGH and RLS-pilot dataset. We included the volumetric parameters of the Ball&Stick, Bingham-NODDI and CHARMED in1 models. For a visual illustration of these coefficients see Supp.

[Second table caption; table body missing in source:] Run time comparison between MLE+FIM and Markov Chain Monte Carlo (MCMC) sampling, for six different models and using a single representative subject from both the HCP MGH and the RLS-pilot datasets. Reported run times are over the entire brain mask and are in units of (h:m:s), with next to it the relative speed advantage of the MLE+FIM over MCMC.


SNR and parameter variances
Fig. 5 compares an estimate of SNR, its reciprocal, and the parameter standard deviation estimates of multiple white matter models on a single HCP MGH dataset. We observe a decreased SNR in the center of the brain and an increase of SNR towards the periphery. A very similar gradient can be observed in the standard deviation maps, with a decrease in parameter standard deviations towards the periphery. We further note a small asymmetry in SNR and variances between the hemispheres, which is specific to this dataset. As in the previous results, we observe an increase in standard deviations for an increased number of Sticks in the Ball&Stick in{1, 2, 3} models, and Tensor FA standard deviations are about a factor of two higher than the other standard deviation estimates.
Fig. 6 plots the quantitative analysis of the parameter standard deviations against SNR. In general, we observe an inverse relationship between SNR and standard deviation, where an increase in SNR leads to a decrease in the parameter standard deviation estimates. Standard deviations on RLS-pilot data are always higher than the corresponding estimates on HCP MGH data. For lower SNR (< 10), MLE+FIM standard deviation estimates are slightly higher than the MCMC estimates.
For higher SNR (> 10), the MLE+FIM and MCMC standard deviation estimates quickly converge, except for the Ball&Stick in2, Ball&Stick in3 and Tensor estimates on the RLS-pilot dataset, where MLE+FIM standard deviations stay higher than those from MCMC. Since the CHARMED in1 model requires relatively high b-values (≳ 6000 s/mm²), which are not present in the RLS-pilot datasets, we did not use the RLS-pilot dataset when showing CHARMED in1 results. For the HCP MGH protocol, results are consistent between simulated and imaging data, with differences within the Standard Error of the Mean (SEM). For the RLS-pilot data, we observe higher variances on the imaging data than in the simulated data, particularly at higher SNRs.
We finally observe that the standard error of the mean is generally higher for the simulated data compared to the imaging data, especially for lower SNR.

The standard deviation maps can play a role in detecting biased estimates resulting from imaging artifacts, measurement noise or poor model fits, without necessarily distinguishing between these; in particular, alterations which may not always be detectable in the parameter maps themselves. For the volume fraction, diffusivity and fractional anisotropy estimates we found robust empirical support for the assumption that the posteriors are unimodally Gaussian distributed. For the angular parameters this assumption is not valid and different assumptions on the posterior are in order, i.e. different post-processing of the MCMC samples is required. For instance, one could estimate the maximum a posteriori from the samples as a region of high likelihood. Alternatively, one can use regression with a suitable model like a bimodal normal, a truncated normal, or a normal mod n. This would then need to be applied on each of the (possibly millions of) voxels of the dataset, further complicating the modeling procedure and adding additional runtime.
Besides a few disadvantages, MCMC also has a few advantages over MLE+FIM. For instance, after using MCMC to obtain a sample β for the model parameters, a random sample can be obtained for any function g(β) by applying g(·) to the drawn samples, allowing inference on arbitrarily complex functions of the samples (Gelman et al., 2013). This has for example been used in q-ball imaging (Fonteijn et al., 2007). As an additional advantage, MCMC can also work with not-well-behaved (i.e. improper) priors, as for example shown when using the Ball&Stick model with sparsity-induced automatic relevance detection (Behrens et al., 2007).
In general, for the maps typically of interest in dMRI microstructure modeling, the volume estimate maps, we find that the FIM and MCMC give highly comparable standard deviation estimates. The only essential difference is one of computation time: computing a maximum likelihood point estimate together with the FIM is on average about 38x faster than using MCMC. This was expected, since MCMC is generally known to be a time-consuming process, even when run on a GPU with optimized hyperparameters (Harms & Roebroeck, 2018). MLE, on the other hand, can be applied very efficiently using a GPU (Harms et al., 2017) and computing the FIM requires only a few extra function evaluations (dependent on the number of parameters, see Appendix A).

Factors affecting uncertainty
There are several model and data characteristics that affect the uncertainty of parameter estimates, such as the amount of data, the complexity of derived parameter maps and the signal-to-noise ratio. In general, we have confirmed and further quantified these effects in this study.
Concerning data dependency, as expected, standard deviation estimates on the RLS-pilot dataset are generally higher than those on the HCP MGH dataset, reflecting a decrease in point estimate uncertainties with more data points. The same holds for the relatively large standard deviations in the Tensor Fractional Anisotropy (FA) estimates, since for the Tensor model we only used the data volumes with a low b-value.
A higher variance can additionally be observed for parameter maps which are not estimated directly but derived from the estimated parameters, such as FA. The variance of such a derived parameter is a function of multiple parameter variances through uncertainty propagation, often leading to a higher total variance (Appendix B). This can be observed in the estimated uncertainty of the Tensor FA measure. The same propagation effect applies to the variance of the Fraction of Stick (FS) of the Ball&Stick models. For an increasing number of Sticks, the variance in FS (i.e. the sum of all Stick fractions) is a parameter-value-weighted sum of the single-parameter variances and covariances, which mostly increases the total variance of the derived parameter FS (Appendix B).
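This propagation effect can be sketched numerically (the covariance values below are hypothetical): for a sum of fractions FS = f1 + f2, the gradient with respect to (f1, f2) is (1, 1), and the propagated variance var(y) = ∇fᵀ Σ ∇f picks up both single-parameter variances and the covariance term:

```python
def propagate_variance(grad, cov):
    # Scalar uncertainty propagation: var(y) = grad^T * Sigma * grad.
    n = len(grad)
    return sum(grad[i] * cov[i][j] * grad[j]
               for i in range(n) for j in range(n))

# Hypothetical covariance matrix of two Stick fractions (f1, f2):
cov = [[0.004, 0.001],
       [0.001, 0.003]]
# FS = f1 + f2, so the gradient w.r.t. (f1, f2) is (1, 1):
var_fs = propagate_variance([1.0, 1.0], cov)
# var(FS) = var(f1) + var(f2) + 2*cov(f1, f2) = 0.009
```

With a positive covariance, the derived-parameter variance exceeds the sum of the individual variances, matching the observation above.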
For all models, parameter standard deviations are influenced by the signal-to-noise ratio (SNR) of the data, with a low SNR (< 10) leading to a large increase in standard deviations. Both in real and simulated data, the effect of SNR on the standard deviation estimates seems to become more gradual for SNR ≥ 20. In general, as expected, we find parameter uncertainty to be highly correlated to the local SNR of the data (cf. Fig. 5). This correlation to SNR shows that the uncertainty of diffusion microstructure estimates varies over space in the brain. Higher uncertainties in the deep white matter and deep brain nuclei could be caused by the lower SNR of phased-array RF receiver coils in these regions. This common and unavoidable effect can be accounted for by estimating uncertainties and using weighted averaging, as explained in this work.

Empirical Coverage Statistics
We investigated Empirical Coverage Statistics (ECS) using the RLS-pilot dataset, as a function of model and SNR. The results were very comparable for MLE+FIM and MCMC: both are slightly under-covered, implying slightly too narrow confidence intervals and slightly too low estimated uncertainty, by at most 10% at an SNR of 10. However, with increasing SNR, the ECS results mostly tend asymptotically towards the nominal confidence level of 95%. This implies accuracy of the advocated MLE+FIM methodology in commonly occurring cases, with a caution for slight overconfidence, particularly at low SNRs of 10 and below.
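Empirical coverage can be sketched as the fraction of confidence intervals, built from each estimate and its standard deviation, that contain the simulated ground truth (the values below are hypothetical, not the paper's ECS results):

```python
def empirical_coverage(truths, estimates, stds, z=1.96):
    # Fraction of confidence intervals (estimate +/- z * std) that
    # contain the corresponding ground-truth value; z = 1.96 gives a
    # nominal 95% interval under the Gaussian assumption.
    hits = sum(1 for t, e, s in zip(truths, estimates, stds)
               if e - z * s <= t <= e + z * s)
    return hits / len(truths)

# Hypothetical simulation results: the third interval misses the truth.
coverage = empirical_coverage(
    truths=[0.5, 0.5, 0.5],
    estimates=[0.52, 0.48, 0.70],
    stds=[0.02, 0.02, 0.02],
)
```

Under-coverage (coverage below the nominal 95%) indicates intervals that are too narrow, i.e. uncertainty estimates that are slightly overconfident.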

Residual local uncertainty
The computed parameter standard deviations can be used as a tool to detect poor model fits or acquisition artifacts in individual datasets. A future development could be to include these standard deviation maps in quality control frameworks (Bastiani et al., 2019; Liu et al., 2010; Oguz et al., 2014).

Uncertainty weighted group statistics
By down-weighting voxels with a high standard deviation, weighted averaging can reduce the effect of artifacts, poor model fits, and high noise levels, potentially providing more accurate estimates of group means and variances and increasing the power of group statistics. In theory, if the within-group datapoints are distributed with the same mean, variance weighted averaging provides the lowest possible variance in the group mean. We observe this in large parts of the white matter, where weighted averaging lowers the variance in the group average as expected, thereby potentially increasing power in group comparisons.
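A minimal sketch of such inverse-variance weighted averaging (the function name and numbers are illustrative, not from the paper):

```python
import numpy as np

def weighted_group_stats(estimates, stds):
    """Inverse-variance weighted group mean and its standard deviation.

    estimates, stds : per-subject parameter estimates and their
    uncertainties (standard deviations) in one voxel.
    """
    w = 1.0 / np.asarray(stds, dtype=float) ** 2   # inverse-variance weights
    mean = np.sum(w * np.asarray(estimates)) / np.sum(w)
    std = np.sqrt(1.0 / np.sum(w))                 # std of the weighted mean
    return mean, std

# A subject with a large uncertainty (e.g. an artifact) barely moves the mean:
m, s = weighted_group_stats([0.50, 0.52, 0.90], [0.02, 0.02, 0.20])
```

The third "subject" deviates strongly but carries a hundred-fold smaller weight, so the weighted mean stays close to the consistent pair of estimates.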
We have shown that some white matter artifacts are visible in the parameter standard deviation maps as patches of relatively large standard deviations, even when the data was preprocessed with FSL EDDY with data outlier detection. Since variance weighted averaging automatically reduces the effect of outliers whenever, and wherever, they have a large variance, it also automatically reduces the effect of artifacts on group results. Even after removing a few outlier subjects with a similar artifact, variance weighted averaging still reduces the presence of what appears to be a weaker expression of the artifact in the remaining subjects (see Suppl. Fig. 6). Therefore, variance weighted averaging is a more robust method to calculate group means and variances, increasing power and reducing bias from imaging artifacts. The increased power could in the future potentially be leveraged in complex multi-factorial group comparisons by incorporating uncertainty estimates in generalized linear statistical modeling (Chen et al., 2012; Woolrich et al., 2004). Additionally, excluding (relatively few) outliers based on high individual uncertainties can be advisable.

The variance (width of the colored distributions) of both the MLE+FIM and MCMC results is practically the same. In illustrations B, C, D and E, the variances differ, with MCMC sometimes having a wider (2B, 2C, 2E) and sometimes a narrower (2D) variance.
The FIM only provides the uncertainties of the model parameters; if we want to know the uncertainty of derived parameters, we can use uncertainty propagation, otherwise known as error propagation. For example, we can use error propagation to estimate the standard deviation of a Tensor Fractional Anisotropy (FA) estimate by propagating the standard deviation estimates of the Tensor diffusivities. Note that uncertainty propagation computes the uncertainties of derived parameters from the elementary parameters of the model (i.e. the model parameters) and their corresponding uncertainties. Here it does not matter how the elementary parameters and uncertainties have been derived from the data, for instance by a linear regression or a non-linear optimization process. In this paper we focus on non-linear MLE optimization methods for obtaining the elementary parameters, and on the Fisher Information Matrix for the corresponding uncertainties as input to the uncertainty propagation. Given a function y = f(x) where f(·) is a known function, uncertainty propagation provides the probability distribution of y given the probability distribution of x. We use a first-order Taylor expansion linear approximation.
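As a concrete sketch of this first-order propagation (illustrative code, not the implementation used in the paper; the eigenvalues and covariance are hypothetical, and the gradient is taken numerically rather than analytically):

```python
import numpy as np

def fa(lams):
    """Fractional anisotropy derived from the three Tensor eigenvalues."""
    lams = np.asarray(lams, dtype=float)
    md = lams.mean()  # mean diffusivity
    return np.sqrt(1.5 * np.sum((lams - md) ** 2) / np.sum(lams ** 2))

def propagate_std(f, x, cov, eps=1e-8):
    """First-order Taylor uncertainty propagation: the std of y = f(x)
    given the covariance matrix of x, using a central-difference gradient."""
    x = np.asarray(x, dtype=float)
    g = np.empty_like(x)
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        g[i] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return np.sqrt(g @ cov @ g)

# Hypothetical eigenvalue estimates (mm^2/s) and a diagonal covariance;
# in practice the covariance would come from the inverse FIM and would
# generally contain off-diagonal terms.
lams = np.array([1.7e-3, 0.4e-3, 0.3e-3])
cov = np.diag([1e-9, 1e-9, 1e-9])
fa_std = propagate_std(fa, lams, cov)
```

The same `propagate_std` pattern works for any derived parameter, since only the function f and the covariance of its inputs are required.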

These datasets had a resolution of 2.0 mm isotropic with diffusion parameters ∆ = 45.8 ms, δ = 16.3 ms, TE = 90 ms and TR = 4500 ms, with Partial Fourier = 6/8, MB factor 3 and no in-plane acceleration, with 3 shells of b = 1000, 2000, 3000 s/mm², with respectively 30, 40 and 50 directions, to which 14 interleaved b0 volumes were added, leading to 134 volumes in total per subject. Additional b0 volumes were acquired with a reversed phase encoding direction, which were used to correct susceptibility-related distortion (in addition to bulk subject motion) with the topup and eddy tools in FSL version 5.0.9 (Andersson & Sotiropoulos, 2016). The total acquisition time is 10 min 21 sec. These three-shell datasets represent a relatively short acquisition time protocol that still allows many models to be fitted. From this dataset we used a single representative subject (v3a 1 data ms20). The HCP MGH datasets come from the freely available fully preprocessed dMRI data from the USC-Harvard consortium of the Human Connectome Project. Data used in the preparation of this work were obtained from the MGH-USC Human Connectome Project (HCP) database (https://ida.loni.usc.edu/login.jsp). The data were acquired on a specialized Siemens Magnetom Connectom with a 300 mT/m gradient set (Siemens, Erlangen, Germany). These datasets were acquired at a resolution of 1.5 mm isotropic with diffusion parameters ∆ = 21.8 ms, δ = 12.9 ms, TE = 57 ms, TR = 8800 ms, Partial Fourier = 6/8, MB factor 1 (i.e. no simultaneous multi-slice) and in-plane GRAPPA acceleration factor 3, with 4 shells of b = 1000, 3000, 5000, 10,000 s/mm², with respectively 64, 64, 128 and 393 directions, to which 40 interleaved b0 volumes were added, leading to 552 volumes in total per subject, with an acquisition time of 89 minutes. These four-shell, high number of directions, very high maximum b-value datasets allow a wide range of models to be fitted. From these datasets we used a single
representative subject (hcp 1003) in single subject illustrations, and we used all 35 subjects in the group comparisons. To assess the utility of uncertainty estimation in the context of current movement and distortion correction preprocessing strategies, which include data outlier detection (Andersson & Sotiropoulos, 2016), we revisited the complete preprocessing of all datasets with FSL EDDY (FSL version v6.0.1). The preprocessed HCP MGH datasets as released are corrected for movement and distortion, but this preprocessing does not involve outlier detection and replacement. For all 35 datasets, the original unprocessed datasets (diff.nii) were submitted to eddy_cuda9.1, running on an NVIDIA GTX 1080Ti GPU, using the --repol option for outlier detection and replacement. Since the Tensor model is only valid for b-values up to about 1200 s/mm², we only use the b = 1000 s/mm² shell and the b0 volumes in maximum likelihood estimation and MCMC sampling of the Tensor model. All other models use all the data volumes.

Variance weighted averaging down-weights voxels with a high standard deviation. To illustrate, we computed a baseline unweighted statistic using a simple mean and standard deviation over the Bingham-NODDI FR results of all 35 subjects. We then computed a variance weighted average using the Bingham-NODDI FR standard deviation maps as weights. To compare regular and weighted averaging we computed (µ_weighted − µ_regular)/µ_regular and (σ_weighted − σ_regular)/σ_regular as difference measures for the mean and standard deviation estimates.
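The comparison above can be sketched on synthetic data (a hedged stand-in for the 35-subject FR maps; all shapes and values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in for 35 subjects x N voxels of FR estimates and stds.
n_sub, n_vox = 35, 1000
fr = 0.5 + 0.02 * rng.standard_normal((n_sub, n_vox))
std = np.full((n_sub, n_vox), 0.02)
# One "subject" carries an artifact: biased values with a large uncertainty.
fr[0] += 0.3
std[0] = 0.2

# Baseline: simple mean and standard deviation over subjects.
mu_reg, sd_reg = fr.mean(axis=0), fr.std(axis=0)

# Variance weighted averaging using the std maps as weights.
w = 1.0 / std ** 2
mu_w = (w * fr).sum(axis=0) / w.sum(axis=0)
sd_w = np.sqrt(1.0 / w.sum(axis=0))

# Difference measures between regular and weighted averaging.
d_mu = (mu_w - mu_reg) / mu_reg
d_sd = (sd_w - sd_reg) / sd_reg
```

The artifact subject pulls the regular mean upward, whereas the weighted mean and its standard deviation are almost unaffected, which is the behavior exploited in the group comparison.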

Fig. 3
Fig. 3 demonstrates the results of MLE+FIM and those of MCMC for the Bingham-NODDI Fraction of Restricted (FR) parameter on a single subject from the RLS-pilot dataset. Results for a single transverse slice show high qualitative correspondence between the MLE and MCMC methods (Fig. 3A), with both the point estimates and corresponding standard deviations (stds.) in close resemblance. Similarly, a single-voxel illustration of the estimated Gaussian distributions (Fig. 3B) also shows a high degree of similarity, with both Gaussian fits capturing the characteristics of the MCMC sample distribution. Fig. 4 shows the comparison scatter plots for the Bingham-NODDI FR mean and standard deviation. The FR point estimates are tightly confined to the identity line, illustrating a high degree of correspondence between the point estimates from MCMC and MLE. This is confirmed by the Pearson's correlation coefficient of ρ ≈ 0.99 for both datasets. The std. estimates between the MLE and MCMC methodologies also show a high correspondence, with ρ ≈ 0.93 for the HCP MGH dataset and ρ ≈ 0.82 for the RLS-pilot dataset.

Figure 3: A) Visual comparison of parameter and standard deviation uncertainty maps between the Maximum Likelihood Estimation with Fisher Information Matrix (MLE+FIM) and Markov Chain Monte Carlo (MCMC) methodologies for the Bingham-NODDI Fraction of Restricted (FR) on an RLS-pilot dataset. B) Histogram of the 20 thousand MCMC samples of the highlighted voxel in A, with in red and blue the fitted Gaussian distributions of, respectively, the MLE+FIM and MCMC methodologies. For additional comparisons see Suppl. Fig. 1.

Figure 4:
Figure 4: Scatter plots comparing Maximum Likelihood Estimation (MLE) and Markov Chain Monte Carlo (MCMC) point estimates (left column) and standard deviations (right column) for the Bingham-NODDI Fraction of Restricted (FR) values over a white matter mask, for both a complex, long acquisition time HCP MGH dataset and a clinically feasible RLS-pilot dataset. Plots are color coded using a kernel density estimate (a.u.) from purple (low density) to red (high density). The upper left corner of each plot shows the ρ correlation coefficient between the MLE and MCMC data.

Fig. 7
Fig. 7 shows the Empirical Coverage Statistics (ECS) for the RLS-pilot protocol simulated dataset as a function of SNR for both MLE+FIM and MCMC. All ECS, for both methods, are slightly undercovered, implying slightly too narrow CIs and slightly too low estimated uncertainty, by at most 10% at an SNR of 10. However, with increasing SNR, the ECS results (e.g. BallStick and NODDI) asymptotically tend towards the nominal confidence interval of 95%, indicating a high coverage probability resulting from the computed parameter variance estimates. For Tensor FA, the coverage of both methods is uniformly close to nominal over all investigated SNRs, implying robustness against low SNR in this case. In general we observe a close correspondence between the MLE+FIM and MCMC methodologies across the compared models, with the exception of CHARMED.

Fig. 9
Fig. 9 shows the two types of group statistic estimates: a regular mean and standard deviation (baseline) and a variance weighted mean and standard deviation. Between regular and weighted averaging we computed a percentage difference map over a white matter mask to highlight the differences in estimates of both the group mean and the group standard deviations. The variance weighted mean is lower across an artifact above the corpus callosum and, to a lesser degree, over the left internal and external capsules, with nearly equal values in most of the white matter. Standard deviation estimates vary more between regular and weighted averaging, with a lower weighted average across the white matter artifact, equal values in most of the white matter, and higher estimates near the border with gray matter.

Figure 8:
Figure 8: Illustration of residual localized model fit uncertainties in the HCP MGH datasets after preprocessing of the HCP MGH data using FSL EDDY and the --repol option for data outlier detection. We then computed the Bingham-NODDI Fraction of Restricted (FR) mean and standard deviation (std.) using the MLE+FIM methodology. Top row: estimates for HCP MGH subject 1017, with an artifact across the corpus callosum. Middle row: estimates for HCP MGH subject 1016, with increased standard deviation estimates near a ventricle. Bottom row: estimates for HCP MGH subject 1016 with no artifacts visible in the mean or standard deviation map.

Figure 9:
Figure 9: Comparison of the regular group mean and standard deviation with the variance weighted group mean and standard deviation for summarizing group study results. For this illustration we used 35 co-registered HCP MGH datasets, preprocessed using FSL EDDY and the --repol option. Results show group averages of the Bingham-NODDI Fraction of Restricted (FR), computed using the MLE+FIM methodology. First row: the regular mean and standard deviation; second row: the variance weighted mean and standard deviation; final row: percentage difference between regular and weighted averages.
Parameter standard deviation maps can reveal poor model fits or acquisition artifacts in individual datasets as localized residual uncertainty after appropriate preprocessing and robust model fitting. In one provided example (Fig. 8, middle row), a patch of high intensity voxels was visible in the standard deviation estimate but not in the parameter estimate itself, which might point to an acquisition artifact such as incomplete fat suppression or ghosting. As such, standard deviation maps have the potential, beyond signaling poor model fits, to be more sensitive in detecting artifacts than point estimate maps alone. Unfortunately, it is fundamentally not possible, without further information, to distinguish elevated uncertainty due to a poor model fit from that due to an acquisition artifact in the data. Similarly, there is no fundamental way to distinguish increased uncertainty due to a poor model choice from that due to increased noise in the data. For our purposes this does not matter, since we aimed at detecting and removing all of these cases and/or down-weighting them in group statistics. Nevertheless, the parameter standard deviations do have the potential to flag such problems.

Table 2:
Correlation comparison between the MLE+FIM and MCMC methodologies for all parameters of the Tensor model. Each number is a ρ correlation coefficient between the estimates of the MLE+FIM and MCMC methodologies, computed over an entire HCP MGH and RLS-pilot dataset. We included all parameters of the Tensor model: the diffusivities and the angular parameters. For a visual illustration of these coefficients see Suppl. Figs. 4 and 5.

Table 3:
Runtime comparison between the two methodologies for computing parameter statistics: Maximum Likelihood Estimation (MLE) with the Fisher Information Matrix (FIM) and Markov Chain Monte Carlo (MCMC) sampling.