Model-based Bayesian inference of brain oxygenation using quantitative BOLD

Streamlined Quantitative BOLD (sqBOLD) is an MR technique that can non-invasively measure physiological parameters including Oxygen Extraction Fraction (OEF) and deoxygenated blood volume (DBV) in the brain. Current sqBOLD methodology rely on fitting a linear model to log-transformed data acquired using an Asymmetric Spin Echo (ASE) pulse sequence. In this paper, a non-linear model implemented in a Bayesian framework was used to fit physiological parameters to ASE data. This model makes use of the full range of available ASE data, and incorporates the signal contribution from venous blood, which was ignored in previous analyses. Simulated data are used to demonstrate the intrinsic difficulty in estimating OEF and DBV simultaneously, and the benefits of the proposed non-linear model are shown. In vivo data are used to show that this model improves parameter estimation when compared with literature values. The model and analysis framework can be extended in a number of ways, and can incorporate prior information from external sources, so it has the potential to further improve OEF estimation using sqBOLD.


Introduction
Quantitative measurements of the BOLD signal can be used to noninvasively construct maps of parameters related to brain metabolism, which have been shown to be useful in clinical assessment of stroke (Seiler et al., 2017), as well as for understanding baseline healthy brain function (Rodgers et al., 2016). Oxygen extraction fraction (OEF) is of particular relevance as a measure of activity, and can be combined with measurements of cerebral blood flow to directly quantify metabolism. The quantitative BOLD (qBOLD) model (Yablonskiy and Haacke, 1994) relates oxygen extraction fraction and deoxygenated blood volume (DBV) to the measured reversible transverse relaxation rate R ' 2 (where R ' 2 ¼ R * 2 À R 2 ). Streamlined quantitative BOLD (sqBOLD - Stone and Blockley, 2017) uses a simplified version of this model with R ' 2 -weighted measurements made using an asymmetric spin echo pulse sequence (ASE -Wismer et al., 1988) to provide fast and robust quantitative BOLD measurements. Simplification of the qBOLD model is made possible by minimising the signal contribution of confounding factors during data acquisition, such as macroscopic magnetic field gradients using GESEPI (Yang et al., 1998) and cerebrospinal fluid (CSF) using FLAIR (Hajnal et al., 1992). sqBOLD uses the following stepwise process: first, R ' 2 is measured from ASE volumes with a large spin echo displacement τ; then, DBV is calculated from the mismatch between a linear exponential model fit and the measured spin echo data; finally, OEF is calculated as a function of the ratio of R ' 2 and DBV. This procedure could propagate noise resulting in poor signal-to-noise ratio (SNR), and could introduce bias into the estimates. A curve-fitting approach in which multiple parameters are fitted simultaneously to a more complete qBOLD model (He and Yablonskiy, 2007;Simon et al., 2016) could overcome the limitations imposed by fitting the linear model required in the existing analysis. Non-linear model fitting, as is required for the qBOLD model, is notoriously challenging on data with limited SNR. Bayesian methods provide a valuable framework for approaching this problem and have been used in similarly challenging applications such as for perfusion estimation from Arterial Spin Labelling . In particular, Bayesian methods allow for the incorporation of prior knowledge about physiological parameters (such as might be obtained from both imaging and non-imaging sources), which are used to regularise the fitting. Bayesian methods also give an estimate of the uncertainty in model parameters, which could provide extra information about the validity or interpretability of the results.
In this paper, simulated ASE data are used to evaluate different versions of the qBOLD model in a Bayesian framework with the primary aim of selecting a model that could provide the most reliable estimates of OEF. In vivo data are then used to compare a fully model-based parameter estimation procedure against the existing method.

Quantitative BOLD model
The qBOLD model, proposed by Yablonskiy and Haacke (1994) describes how transverse magnetisation in bulk tissue changes in the presence of susceptibility altering objects. In this case, those objects are blood vessels whose susceptibility difference is proportional to OEF, and the protons in the surrounding tissue are assumed to be relatively stationary with respect to the vessels (the static dephasing regime). When modelled as randomly oriented, infinitely long cylinders with uniform magnetic susceptibility, the signal originating from the surrounding tissue, as a function of time, is given by (An and Lin, 2003): where R t 2 is the irreversible transverse relaxation rate of bulk tissue, J 0 ðxÞ is the zero-order Bessel function of x, and δω is the characteristic frequency, given by: where γ is the proton gyromagnetic ratio, Δχ 0 is the susceptibility difference between the tissue and deoxyhaemoglobin contained within blood vessels, and Hct is the fractional haematocrit. This model assumes that deoxyhaemoglobin is the dominant source of magnetic susceptibility. Therefore, the presence of an additional source of susceptibility, such as myelin in white matter or iron deposited in deep grey matter structures, will confound estimates of OEF and DBV. This has previously been shown to result in the overestimation of R ' 2 and DBV (Stone and Blockley, 2017). Hence the following analyses are restricted to grey matter.

One compartment model
Under ASE, a refocussing pulse is applied at time ðTE À τÞ =2, leading to an effective echo time of TE À τ (Wismer et al., 1988). The signal is read-out at time TE (which is constant across all echoes), and so can be expressed as a function of τ. The integral in Equation (1) can be replaced by two asymptotic forms (Yablonskiy and Haacke, 1994): where the characteristic time t C depends on δω, and the factor S 0 controls for constant terms, including R 2 decay and inter-voxel differences in signal. Given that R ' 2 ¼ DBV Á δω, this equation can also be written in terms of R ' 2 , removing explicit dependence on OEF: These equations are two forms of a normalized one-compartment (1C) qBOLD model. The latter can be used to calculate OEF by rearranging Equation (2):

Log-linear model
The sqBOLD technique uses the long-τ, R ' 2 -dependent model (Equation (4b)) to fit R ' 2 and DBV (Stone and Blockley, 2017). R ' 2 -weighted images with τ > t c can be fit to a log-transformed model (L-model): Here R ' 2 can be calculated as the gradient of lnS t ðτÞ, and DBV can be obtained by subtraction of ln S t SE from the intercept, where S t SE is the signal measured at the spin echo (τ ¼ 0). OEF can then be calculated using Equation (5).

Two compartment model
The qBOLD model can be extended to include the signal which originates from venous blood. Several models have been proposed which describe this intravascular signal. The first assumes that the blood signal S b decays monoexponentially, and is reversible with respect to the spin echo (Simon et al., 2016): where R b 2 and R b* 2 are functions of OEF (and Hct). This does not properly consider the dephasing effects inside the blood vessel, which consists of a bulk (plasma) containing susceptibility altering objects (red blood cells). The intravascular dephasing can be parametrised in terms of Fresnel functions (Sukstanskii and Yablonskiy, 2001;Yablonskiy et al., 2012): where CðηÞ and SðηÞ are Fresnel functions, and A recently proposed analytical model describes the blood signal under the motional narrowing regime. This is valid because the size of a red blood cell is significantly smaller than the distance a spin in the plasma will diffuse during time TE. The signal in this regime is (Berman et al., 2018): Here characteristic diffusion time t D ¼ R rbc =D b where R rbc ¼ 2:6 μm is the characteristic size of red blood cells and D b ¼ 2 μm 2 ms À1 is their rate of diffusion, R b;true 2 is the intrinsic R 2 of fully oxygenated blood and G 0 is the mean square field inhomogeneity in blood (Berman et al., 2018): The total signal measured from a voxel in this two-compartment model (2C) is the sum of the signal from each compartment, weighted by their apparent volume fractions. Apparent DBV, denoted ζ ' , is given by: where m b is the steady-state magnetisation of the blood (which depends M.T. Cherukara et al. NeuroImage 202 (2019) 116106 on its T 1 and the sequence parameters TR, TE, and TI), and n b is the relative spin density of blood (He and Yablonskiy, 2007). Total 2C signal is therefore given by: where S 0 is the signal without any transverse relaxation (at t ¼ 0). This can be used as a forward model in a Bayesian framework to infer parameter distributions for either OEF and DBV, or R ' 2 and DBV (with OEF calculated from those parameters afterward). A model such as this expected to have most value in voxels which have a relatively small blood volume and where this additional compartment can correct for the presence of intravascular signal. In the presence of larger blood volume fractions, whilst the intravascular signal would be appropriate, the assumption of the extravascular signal model (Eq. (1)) would be invalidated (Yablonskiy and Haacke, 1994).

Bayesian inference
Bayesian inference is built upon Bayes' theorem, which defines how, for a model M with parameters Θ, a posterior probability distribution of those parameters can be formed by combining prior beliefs about the parameter with some observed data Y: where PðΘjY; M Þ is the posterior probability of the parameters, PðΘjM Þ is the prior, PðYjΘ; M Þ is the likelihood of the data, given the parameters and the model, and PðYjM Þ is the evidence for the data, given the model. Often, analytically evaluating the posterior is impossible. It can, however, be easily sampled in a uniform pattern using a grid search. Though exact, this method is wasteful and prohibitively time-consuming in high dimensional space. An alternative to sampling the true posterior is to approximate it, then analytically solve the approximation. A variational Bayesian (VB) inference scheme has been developed for non-linear signal models similar to those considered here and applied to imaging data to analytically calculate approximate posterior parameter distributions (Attias, 2000;Chappell et al., 2009). This is significantly faster than sampling based methods, and still allows prior information to be used to generate more physically reasonable estimates from noisy data . The priors take the form of Gaussian distributions, with specified means μ 0 and standard deviations σ 0 . These values being chosen based on prior knowledge of typical realistic parameter values (including the possibility of pathological variation), but not so as to bias the results away from what the underlying data would predict. Priors can also be used to enforce physically reasonable conditions such as spatial smoothness by means of a spatial prior distribution (Groves et al., 2009).

Simulations
Simulated qBOLD data were used to examine whether a Bayesian model-based approach was able to estimate OEF more accurately than least-squares fitting to a simplified log-linear model. In MATLAB (MathWorks, Natick, MA), simulated two-compartment ASE qBOLD signals were generated using Equations (1), (10) and (13). Signals were simulated with 50 OEF values from 20% to 70% (which includes the expected range of healthy values (Marchal et al., 1992)) and 50 DBV values from 0.3% to 15% (based on the expected range of total blood volume (Roland et al., 1987)), for a total of 2500 artificial ASE signals. Gaussian noise was added to simulate 7 SNR values between 5 and 500. Each signal consisted of 24 R ' 2 -weighted images with spin-echo offsets τ in steps of 4 ms between À28 ms and þ64 ms. This optimised range of τ values were chosen to maximise the SNR of the R ' 2 parameter estimate (Stone and Blockley, 2017). Fractional haematocrit Hct was set at 0.40, the literature value for general circulation (Nicoll et al., 2012), which is similar to what has been used in other related studies (Simon et al., 2016;Berman and Pike, 2017;Lee et al., 2018). Previous studies have used lower values such as 0.34 (He and Yablonskiy, 2007;Stone and Blockley, 2017), in accordance with the Fahraeus effect (Fahraeus and Lindqvist, 1931), which suggests that Hct in small vessels (including capillaries and venules) is lower than in general circulation. This effect has been quantified as a reduction of between 14% (Calamante et al., 2016) and 25% (Sakai et al., 1985). It is not clear to what extent the range of Hcts present throughout the brain affect the qBOLD signal, or whether an average value can be assumed throughout. Other physiological and sequence parameters were held to constant values, which are given in Table 1.
In order to assess the differences between the OEF-based and R ' 2 -based models (Equations (3) and (4) respectively), a single simulated dataset (OEF ¼ 40%, DBV ¼ 3%) was analysed in a grid search scheme. The posterior probability of each pair of values in the intervals OEF 2 f20%; 70%g and DBV 2 f0:3%; 15%g, given the 2C model, was evaluated. The OEF-based and R ' 2 -based models were compared in their accuracy of DBV estimation by marginalizing over OEF and R ' 2 respectively. Whilst the grid search technique is prohibitive for analysis of in vivo data, it enables direct comparison of parameter distributions to determine separability. Grid searches were used to determine which of the OEFbased or R ' 2 -based models were more suitable, based on whether OEF-DBV or R ' 2 -DBV distributions were more separable. The full set of simulated data were analysed in a VB scheme, using the Fast ASL and BOLD Bayesian Estimation Routine (FABBER) Groves et al., 2009, Woolrich andBehrens, 2006). The 1C and 2C models were used, with parameters DBV and either OEF or R ' 2 .
Prior means μ 0 were taken from the literature for healthy subjects (Derdeyn et al., 2001;Roland et al., 1987) and are presented in Table 2.
The effect of different μ 0 and σ 0 were investigated in order to choose values that would not bias the results in the case of very different (e.g. pathological) true values.
VB inference was performed using the 2C model with a range of σ 0 (between 10 À1/2 and 1000) for each parameter at the highest simulated SNR (SNR ¼ 500), with μ 0 fixed to the values in Table 2. The absolute difference between the estimate of each parameter and its true value was then computed and averaged over all simulated pairs of OEF and DBV values. In order to minimise the risk of biasing the parameter estimates, the least precise prior which still resulted in a low error was chosen for each parameter, and used in all later inference. Once appropriate precisions were chosen, extreme values of μ 0 were tested to ensure that the chosen prior mean did not significantly affect the results. μ 0 ðR ' 2 Þ ¼ f3 s À1 ; 7 s À1 ; 15 s À1 g and μ 0 ðDBVÞ ¼ f0:5%; 3:6%; 10%g were tested, and errors in estimates of DBV and R ' 2 calculated as above. Least-squares regression (LS), and VB inference, using each model (L, 1C, and 2C), was performed on the same data, for all OEF-DBV pairs and SNR values. The absolute error between true and estimated OEF and DBV values were averaged across all OEF and DBV values for each SNR, in order to assess the utility of a more complete model, and the effect of noise on estimates. These were compared at the voxel level using twoway ANOVA and Tukey-Kramer (honestly significant difference) pairwise comparisons.
Another aspect of the qBOLD model was tested using simulated data. The point at which the asymptotic models transition, t C (Equations (3) and (4)) was defined originally as t C ¼ 1:5=δω (Yablonskiy and Haacke, 1994), whereas, more recently, Lee et al. (2018) used t C ¼ 1= δω. Since the model itself is phenomenological, parameters such as t C should be chosen so that the asymptotic model most closely matches the analytical version. For all OEF-DBV pairs, in the absence of noise, the analytical qBOLD model (Equation (1)) was compared with the asymptotic model (Equation (3)), and an optimal value for the transition constant τ Á δω was found using a golden section search (MATLAB's fminbnd).
In light of recent efforts to reduce the scan duration for application in clinical research (Stone et al., 2019), a supplementary analysis was performed using alternative acquisition parameters. The acquisition consisted of 11 R ' 2 -weighted images with spin-echo offsets τ in steps of 8 ms between À16 ms and þ64 ms, which represents a scan time reduction of 54%. This data was formed by retroactively removing τ values from already-simulated data.

Data collection
All imaging was performed on a 3 T S Verio system (Siemens Healthineers, Erlangen, Germany) using the body transmit coil and the vendors 32-channel receive coil. Subjects were scanned under a technical development protocol agreed with local ethics and institutional committees.
Scan data from seven healthy participants (aged 24-32, 4 female), which was collected as part of a previous study (Stone and Blockley, 2017), were used to compare non-linear model inference using VB against a linear-least-squares fitting procedure in vivo.
For each subject, a GESEPI ASE (GASE) scan  with FLAIR preparation (TI FLAIR ¼ 1210 ms) was acquired, with TR/TE ¼ 3000/74 ms, 64 Â 64 matrix, ten slabs, 3.75 Â 3.75 Â 5 mm resolution, and 24 τ values ranging from À28 ms to 64 ms in steps of 4 ms. Each 5 mm slab was constructed by averaging four 1.25 mm slices (acquired as 8 k-space partitions including 100% partition oversampling to reduce aliasing) to correct for macroscopic field gradients. Total acquisition time was 9 min 36 s. Data were acquired as part of a previous study  -see Appendix: Data Access Statement). The GESEPI acquisition compensates for the effect of macroscopic magnetic field gradients by effectively applying multiple z-shim levels and integrating them using a Fourier transform, which is equivalent to the acquisition of several thin partitions in a 3D acquisition (Yang et al., 1998).

Image analysis
Data pre-processing was performed in FSL (Jenkinson et al., 2012). GASE data were motion corrected using MCFLIRT (Jenkinson et al., 2002), and smoothed with a Gaussian kernel (σ¼0.85 mm). This sub-voxel smoothing was chosen to reduce the impact of noisy voxels, while maintaining the effective spatial resolution of the parameter estimates. The spin-echo volumes (τ ¼ 0) were segmented using FAST (Zhang et al., 2001) to create grey matter masks that were used to define the region of interest. Grey matter masks were obtained by thresholding the segmented ASE data at 60%, to exclude voxels where partial volumes of white matter or nulled CSF, where the assumptions of the qBOLD model are violated, may severely affect the signal. This approach was chosen over segmenting the MPRAGE images because it produced more consistent results, and does not require registration from MPRAGE-space to GASE-space.
Grey matter SNR was calculated for each subject across all τ values, in order to determine the approximate range of values from simulation that would be most appropriate. This was done by dividing the mean voxel intensity values within the grey matter mask by the standard deviation of values outside the head.
In vivo ASE data were analysed using the L model, implemented in MATLAB, and the R ' 2 -DBV 1C and 2C models in VB both with and without spatial regularization. The 1C and 2C models were fit to data for all 24 τ values; whereas the L model only applies to τ ¼ 0 and τ > t C (for a total of 14 data points). In addition to the motional narrowing model for intravascular signal (Equation (10)), a supplementary analysis of the linear and powder models (Equations (7) and (8)) was also performed. Inference was performed using a 2C model incorporating each different intravascular signal for all subjects (without spatial regularization) in order to compare their impact on final parameter estimates. Priors and initial posteriors were defined using literature mean values and the standard deviations determined from testing simulated data (see Section 4.1). These are given in Table 2. Mean grey matter parameter estimates were compared for each analysis method, for each subject and across the group. Under spatial regularization, each voxel's prior was defined by aggregating the posterior distributions from the 6 adjacent voxels, across 10 iterations. This imposes a degree of spatial smoothness in the parameter distributions.
To determine similarity, estimates were subjected to a two-way ANOVA test to determine whether the group means for each method were equal. In the case of significantly different group means, the Tukey-Kramer (honestly significant difference) method was used to perform pairwise comparisons.
In a Bayesian framework, it is also possible to assess goodness of fit using the model evidence term (PðYjM Þ in Equation (14)). The VB routine aims to minimise the Kullback Liebler divergence between the (intractable) true posterior and a (tractable) approximate posterior. Since PðYjM Þ is constant for a given model, it can be shown (by rearranging Equation (14)) that this minimization is equivalent to maximizing the free energy of the approximate posterior (Attias, 2000;Chappell et al., 2009). A model that results in a higher average free energy across grey matter therefore provides a better explanation of the data. The median grey matter free energy values for each method were analysed across subjects with the same two-way ANOVA test, and Tukey-Kramer pairwise comparisons. This was to ensure that applying additional constraints, such as spatial smoothness in parameters, did not negatively impact the fitting.
A supplementary comparison of L, 1C, and 2C models (using VB inference) was also carried out on the same data, analysing only 11 of the originally acquired 24 τ values, from À16 ms to þ64 ms in steps of 8 ms.
The resulting grey matter mean parameter values were compared between models, and between the 24-τ and 11-τ datasets.
To determine the impact of the assumed value of Hct ¼ 0:40 (Nicoll et al., 2012), a further supplementary analysis was performed in which the same data (with 24 τ values) were analysed with all models with Hct ¼ 0:34 (as in Yablonskiy, 2007, andBlockley, 2017). It is expected that in the L and 1C models, Hct will act purely as a scaling constant on OEF (Equation (5)); but since it occurs in the intravascular compartment (Equation (11)) its effect on R ' 2 and DBV estimates from the 2C model should also be investigated.

Simulated data
Asymptotic qBOLD models given in terms of OEF and R ' 2 were compared using grid search posterior sampling of simulated data. The results of this are shown in Fig. 1. In OEF-DBV space, there is a large region of collinearity between the two parameters, which is likely to prevent accurate estimation of both parameters. In contrast, the R ' 2 -DBV space has a posterior distribution in which the two parameters are largely separable and also which can more readily be approximated as a multivariate normal distribution within the VB inference method. Marginalizing over these results with respect to OEF and R ' 2 yielded DBV likelihood distributions that are directly comparable. The standard deviation in DBV was 1.05% for the OEF-based model, and 1.14% for the R ' 2 model, suggesting that there is very little difference between the two in their estimation of DBV. The same analysis of OEF estimates derived from each method yielded a standard deviation of 14.5% for the OEF-based model, and 15.8% for the R ' 2 model. Simulated ASE signals were used to compare the L model with the more complicated 1C and 2C models analysed in VB. First, the optimal parameters for VB inference were determined by testing a range of standard deviations for the priors on R ' 2 (between 10 5/2 and 1000) and DBV (between 10 À5 and 1). The results of these are shown in Fig. 2. They suggest that σ 0 ðR ' 2 Þ ¼ 10 3=2 and σ 0 ðDBVÞ ¼ 10 3=2 are the least precise values that consistently produce reasonable results, although using an even broader σ 0 ðR ' 2 Þ is not detrimental. Using these σ 0 , the choice of prior mean μ 0 was also investigated by testing extreme values. The error in R ' 2 and DBV estimates was calculated using each combination of μ 0 ðR ' 2 Þ and μ 0 ðDBVÞ, and the standard deviation of these was 0.76 s À1 for R ' 2 , and 0.47% for DBV, showing that the choice of μ 0 does not significantly bias the results.
Simulated data were also used to determine the optimal transition point between the two asymptotic regimes, as a function of δω. This was calculated for each simulated OEF and DBV value, in intervals OEF 2 f20%; 70%g and DBV 2 f0:3%; 15%g. Averaging across the entire parameter space, the optimal transition point was found to be 1:76 =δω. This value was used in all further analysis of simulated and in vivo data, for all models.
Using optimised parameters, the simulated ASE signals were analysed using each of the three models (L, 1C, and 2C) at 7 SNR values. Each model was evaluated with both least squares regression (LS) and VB. For each signal, the absolute error between the true value of each parameter and the estimate was calculated. The average errors across the whole parameter space, at each SNR and for each model, are shown in Fig. 3. The 2C model in LS performed better than any other at estimating R ' 2 . Two-way ANOVA with pairwise comparisons showed that there were not significant differences in estimates of R ' 2 between all the other methods. The 1C and 2C models in LS performed significantly worse in estimating DBV, and there were no significant differences between the others. At all tested SNRs except 100, the 1C and 2C models in VB performed significantly (p < 0:001) better than the L model, or any LS implementations, at estimating OEF. Apart from at very low SNR (SNR 10) and at SNR ¼ 500 there were not significant differences in parameter estimates between the LS and VB implementations of the L model.
The results of the analysis using alternative acquisition protocol are presented in Supplementary Information. Fig. S1 parallels Fig. 3 displaying the error in estimations of R ' 2 , DBV, and OEF, as a function of SNR. The error in OEF estimation across all SNRs are higher in the case of fewer τ values, but in this case the 2C model offers an even greater improvement over the L model than 1C, especially at lower SNR (SNR 50).

In vivo data
The analysis methods (L model in least squares fitting, and 1C and 2C models in VB with and without spatial regularization) were tested on GASE data from 7 healthy subjects. Computation time for the L model analysis averaged 2.5 s per subject, for the entire volume, and for both 1C and 2C models in VB was 10.3 s each per subject. VB with spatial regularization took, on average, 11.8 s per subject (for 1C and 2C models).
Grey matter SNR was calculated by dividing the mean GASE voxel intensity within grey matter by the standard deviation of the noise in empty space outside the head. Across all subjects and τ values, SNR was 89 AE 16. 2 -DBV pairs, using the same data. In the OEF-DBV model, there is a large area of collinearity, and the posterior density distribution does not have a Gaussian-like form. By contrast, the R ' 2 -DBV model has more separable parameters, and a distribution shape that can more easily be approximated by a multivariate normal distribution, which is a requirement for VB inference as implemented here.
Grey matter mean parameter values for R ' 2 , DBV, and OEF for each subject are presented in Table 3, as well as inter-voxel standard deviations of parameter estimates, which account for physiological variation as well as uncertainty in model fitting. Maps from an example subject are shown in Fig. 4, and a group-level comparison is shown in Fig. 5. The group-average grey matter OEF estimate from the L model was 21 AE 2%, the 1C model estimated 17 AE 2% without spatial regularization, and 19 AE 2% with spatial regularization. The 2C model estimated OEF at 18 AE 2% and 20 AE 2% (without and with spatial regularization, respectively). Group-average grey matter DBV was 5.38 AE 0.62%, 5.99 AE 0.43%, and 6.58 AE 0.26% for L, 1C, and 2C models respectively. At the inter-subject level, the variance in R ' 2 improved significantly (based on an F-test with α ¼ 0:05) with the 1C model (both with and without spatial regularization) compared to the L model. Group-average grey matter standard deviation of R ' 2 was 2.7s À1 in least-squares fitting, compared to 1.21s À1 under VB. Variances in OEF and DBV were lower in spatially-regularized VB, but were not statistically significant. At the intra-subject level, the improvements of VB were statistically significant in all subjects for R ' 2 , and in 4 of 7 subjects for DBV and OEF (based on an F-test with 100 degrees of freedom). Spatially regularized VB made a significant improvement in the variances of all parameters across all subjects. This effect can be seen in the relative number of voxels which contain values that are not physically plausible, such as those with OEF or DBVs greater than 100%. Across the group, the proportion of grey matter voxels with OEF estimates greater than 100% was 33.7% under VB without spatial regularization, compared to 8.6% with spatial regularization (under the L model, the proportion was 18.8%). Fig. 4 illustrates this improvement. Using the 2C model also resulted in a decrease in the proportion of voxels with OEFs or DBVs above 100% (17.7% and 8.1% without and with spatial regularization, respectively).
Two-way ANOVA showed statistically significant differences in average grey matter estimates of R ' 2 between L model and the 1C and 2C models inferred under VB. There was a significant difference between L model and 1C model estimates of DBV, but not between L and 2C. There were significant differences between the L model and all others (except non-spatially regularized 1C model) in estimates of OEF. For both 1C and 2C, across all parameters, there were no group-level statistically significant differences between estimates made with and without spatial regularization.
Additional analysis of the 2C model was performed using different models of the intravascular signal, which is shown in the Supplementary Material. Fig. S2 parallels Fig. 5, and shows that there is no significant difference in R ' 2 or OEF estimation between the three intravascular signal models. There is significant difference in DBV estimation, and the motional narrowing model results in DBV estimates that are closest to what is expected from the literature.
Another supplementary comparison of the L, 1C, and 2C models was carried out on the same data with only 11 τ values. Inter-subject grey matter average parameter estimates are shown in Supplementary Fig. S3. There a statistically significant difference (p < 0:001) between OEF 2 error, which is not strongly affected by σ 0 , except at σ 0 ðR ' 2 Þ 1. b) The effect of priors on DBV error, which diverges quickly at low σ 0 ðR ' 2 Þ, but is consistently at σðR ' 2 Þ ! 10. c) The effect of priors on OEF error, which was lowest for σ 0 ðR ' 2 Þ > 1. estimates made with the L model on 11-τ versus 24-τ data, but no significant difference between estimates made with the 1C or 2C models, suggesting that these more complete models are more robust to changes introduced by undersampling in τ.
Supplementary Table S1 (which parallels Table 3) shows the results of inference using the 2C model, with an assumed Hct of 0.34. DBV estimates were higher under the spatially-regularized 2C model with Hct ¼ 0:34, which is to be expected given the dependence on Hct in the intravascular compartment model's field inhomogeneity parameter G 0 (Equation (11)). OEF estimates were also higher with a lower Hct for the 2C model, both with and without spatial regularization. As expected, there was no difference in R ' 2 or DBV estimates based on Hct using the L or 1C models (not shown). Therefore, the OEF estimates from the L or 1C model were scaled as the ratio of the Hct values i.e.~17% increase in OEF with Hct ¼ 0:34.
Median grey matter free energy was used to compare the goodness of fit with each regularization method. Group-average free energy was À235 AE 53 and À208 AE 51 for 1C model (without and with spatial regularization, respectively), and À230 AE 58 and À207 AE 50 for 2C. Both these differences were statistically significant with p ! 10 À3 . There were no significant differences in free energy between the two models when the same regularization was used.

Improvements to model fitting using simulated data
Previously published work (Christen et al., 2014;Lee et al., 2018) has shown that the qBOLD model is not optimally suited for simultaneous estimation of OEF and DBV, which are important parameters in research or clinical assessment of brain metabolism. In this study, simulated data was used to demonstrate the collinearity between OEF and DBV that makes accurate estimation difficult. It also showed that the likelihood distribution in R ' 2 -DBV space (Fig. 1b) has visibly lower correlation between the parameters, providing the opportunity to accurately estimate them simultaneously. Furthermore, the distribution is relatively smooth and symmetrical in both the R ' 2 and DBV dimensions, so the assumption of a multi-variate normal distribution is reasonable, making these parameters suitable for VB analysis.
Simulated ASE-qBOLD data were used to optimise parameters for VB inference. In particular, prior standard deviations were chosen that resulted in the most accurate estimation across a broad range of OEF values (from 20% to 70%, encompassing both healthy and pathological values (Marchal et al., 1992)) and DBV values (from 0.3% to 15% (Roland et al., 1987)). The prior mean values were chosen based on relevant prior work (Stone and Blockley, 2017). It would also be possible to use values from other imaging modalities to define prior means. For example, a hyperoxia experiment could be used to estimate venous cerebral blood volume in grey matter (Blockley et al., 2013), which could be used to inform the prior on DBV. It was, however, shown that significantly changing the prior means does not have a detrimental effect on the accuracy of parameter estimation. This is to be expected given the deliberate choice of broad standard deviations for the priors.
Simulated data with a range of SNRs were also used to compare least-squares fitting of a log-linear model with VB inference on the asymptotic qBOLD model with one or two compartments. The model used for the second (intravascular) compartment is one that accounts for the fact that spins inside blood vessels diffuse over a significantly greater distance than the size of a red blood cell in TE, so they are in the motional narrowing regime (Berman and Pike, 2017). Fig. 3 showed that VB inference using the 1C or 2C models produced OEF estimates that were significantly more accurate than the L model at SNRs below 100, and estimated both OEF and DBV more accurately than the 1C and 2C models fit using least squares regression. Though Table 3 Parameter estimates of R ' 2 , DBV, and OEF using three qBOLD models: linear model (L) considering only tissue signal at long-τ and τ ¼ 0 data-points; a one-compartment model (1C) considering tissue signal across all τ; and a two-compartment model (2C) that includes intravascular signal. Both 1C and 2C models include spatial regularization. The table shows grey matter mean AE grey matter inter-voxel standard deviation, for 7 healthy subjects, and group mean AE group standard deviation. LS regression of the 2C model estimated R ' 2 more accurately than other methods, it also performed worse in DBV estimation, leading to poorer OEF estimates overall, compared with the VB implementation of the same model. Across all parameters, there was not a substantial difference in performance between the 1C and 2C models. This is likely to be because in voxels with normal DBV, the intravascular compartment contributes a very small amount to the overall signal, and so could potentially be ignored. Similarly, additional analysis showed that the 1C and 2C models were more accurate at estimating R ' 2 and OEF on sparser data, particularly at low SNR (see Supplementary Fig. S1). The difference in OEF error between L and 2C models in Fig. S1 is greater than in Fig. 3., suggesting that the use of this analysis framework is even more important when used with shorter acquisition times, such as in a clinical protocol.
One of the limitations of this study is the use of the analytical qBOLD model as the ground truth. Generation of synthetic qBOLD signals using a more detailed model incorporating different vascular compartments and vessel sizes would be more representative of in vivo data. Future work will concentrate on the integration of such models to validate and optimise new sqBOLD implementations.

In vivo validation
The same analyses were applied to GASE data from 7 healthy subjects, and grey matter mean parameter estimates were compared at the group level. Statistically significant differences were found between the L model and 1C and 2C models in R ' 2 and OEF estimation (but not between 1C and 2C), and between the L model and 1C model (but not 2C) in DBV estimation. The grey matter inter-voxel standard deviations of all parameters were significantly lower in VB estimation than the L model. This suggests that VB inference with a more complete model is less susceptible to fitting extreme values. Inter-voxel standard deviations are higher than the errors reported for simulated data of the same SNR because they also account for physiological variation across grey matter. The number of voxels with un-physiological values was reduced when using the 2C model compared with the 1C model, suggesting that modelling the blood signal provides an improvement to overall model fitting. The choice of intravascular signal model within the 2C model affected estimates of DBV, but not of R ' 2 or OEF, at the group level. The motional narrowing model was preferred, because it has been shown from a theoretical perspective to accurately model signal evolution in the blood following a refocussing pulse (Berman and Pike, 2017).  2 , b) DBV, and c) OEF, with error bars indicating inter-subject standard deviation, for the L model, and 1C and 2C models; (s) indicates spatial regularization. Two-way ANOVA and pair-wise comparisons show that the 1C and 2C models produce similar R ' 2 estimates as the L model, although the DBV estimates are different except in the non-regularized 1C model. The 1C and 2C models with spatial regularization produce estimates of OEF that are not statistically significantly different from those of the L model.
The expectation of spatial homogeneity of parameters can be incorporated into VB analysis using spatial regularization, in which the estimated posterior distributions of adjoining voxels are used as priors in an iterative process. The effect of spatial regularization was tested in both the 1C and 2C models. In both cases, though there was not a significant change in group-level parameter averages, the median free energy increased significantly, suggesting a better fit of the model to the signal. The resulting parameter maps (exemplified in Fig. 4) were also more homogeneous, and had fewer voxels where parameters had been estimated as having un-physiological values (such as OEFs above 100%).
The parameter estimates obtained from all methods are susceptible to partial volume effects from infiltrating white matter (where the assumptions of the qBOLD model do not hold), especially given the large voxel size used. By defining the grey matter segmentation threshold more strictly, it is possible to reduce this effect and to obtain a more accurate grey matter average, although the whole-brain maps still contain mixed voxels where estimates are not accurate. A model that describes the qBOLD signal in white matter could be used alongside a partial volume correction paradigm, as used in the analysis of arterial spin labelling data (Chappell et al., 2011), to correct for this effect.

Comparison of parameter estimates with literature values
Parameter estimates compared with literature values can be found in Table 4. The average grey matter R ' 2 estimate of 3.7 AE 0.4s À1 was higher than the 2.9 AE 0.4s À1 reported by He and Yablonskiy (2007) and the 3.1 AE 0.3s À1 reported by Simon et al. (2016). However, both of those apply the qBOLD model to GESSE data, which requires simultaneous quantification of R 2 and R ' 2 , whereas the GASE data used here is not sensitive to R 2 .
Grey matter DBV was here estimated to be 7.00 AE 0.55%, which is significantly higher than reported elsewhere, although there is considerable variation in the literature, with DBV ranging from 1.75 AE 0.13% (He and Yablonskiy, 2007) to 4.9 AE 2.0% (Lee et al., 2018) or up to 5% (An and Lin, 2003). This may be due to uncertainty introduced by assuming values of parameters used to calculate actual DBV from apparent DBV (Equation (12)), or due to the signal transient often observed at τ ¼ 0 (He and Yablonskiy, 2007, see, in particular, Fig. 1C). Here, signal is consistently higher at τ ¼ 0 than would be expected, which may lead to underestimation of apparent DBV in the linear model. The 1C and 2C models, which are not dependent on a single data point for their DBV estimation, should be more robust against this error than the L model. The 2C model, had less inter-subject variation in DBV estimates than the 1C model.
OEF estimates obtained using sqBOLD (with any analysis method) are significantly lower than those obtained using GESSE-qBOLD (He and Yablonskiy, 2007;Domsch et al., 2018) or non-qBOLD methods such as TRUST (Lu and Ge, 2008) and oxygen-15 tracer based PET (Derdeyn et al., 2001). This is in part due to the over-estimation of DBV that is consistently reported in ASE-based qBOLD studies. It may also be due to the assumed value of Hct ¼ 0:40. The supplementary analysis with Hct ¼ 0:34 led to an increase in estimated OEF to 21% (with the 2C model), which is closer to literature values, but was accompanied by an increase in estimated DBV to 8.4%, which is even further from what would be expected. The exact value of average Hct in GM is not known, although a reduction of around 15% from arterial Hct (around 0.40) is supported by Calamante et al. (2016). It is possible that larger veins (with Hct ¼ 0:40) may contribute more to the ASE qBOLD signal than smaller vessels with lower Hct. Similarly, this study assumed Δχ 0 ¼ 0:264 ppm (Spees et al., 2001) as opposed to Δχ 0 ¼ 0:18 ppm used in older studies (An and Lin, 2003). However, despite this systematic offset it has been demonstrated elsewhere that OEF estimates using this experimental technique are modulated by pathology (Stone et al., 2019).
OEF overestimation may also be due to error in R ' 2 estimation, which could be caused by diffusion in the tissue compartment, which is ignored in the static dephasing model (Yablonskiy and Haacke, 1994). Simulation and in vivo studies have shown that diffusion of water in brain parenchyma affects the BOLD signal measured under the GESSE sequence (Kiselev and Posse, 1999;Dickson et al., 2010) and ASE (Ni et al., 2015). A qBOLD model which accounts for diffusion in ASE, like that constructed by Dickson et al. (2010) may enable more accurate estimation of R ' 2 and DBV, although it requires quantification (or assumption) of the rate of diffusion, and the distribution of vessel sizes in grey matter. The GESEPI acquisition used (Yang et al., 1998;Blockley and Stone, 2016) reduces the effect of magnetic field inhomogeneities, which improves the physiological specificity of R ' 2 estimation, but does not perfectly remove them, which may explain why DBV estimates are higher than those obtained after retrospective magnetic field gradient correction (Yablonskiy, 1998).
The clinical utility of OEF estimation is primarily in quantitative inter-subject and inter-regional comparisons. The group level standard deviation of OEF among young healthy subjects reported here was similar to, or smaller than, those of other methods. Normalized variance, accounting for differences in mean OEF, is lower under spatially regularized VB than other methods. This suggests that the 2C, spatially regularized method proposed here may be useful in identifying differences across a population. Similarly, OEF is expected to be uniform across the healthy brain, and the spatially regularized 1C and 2C models produced smooth maps. These methods may, therefore, be useful in identifying regions of altered OEF, such as the ischaemic penumbra, by comparison with healthy tissue.
Other literature methods required precise initialization of parameter values in order to obtain reasonable results after model fitting. This is not required in a Bayesian framework, as broad, minimally-informative priors can be used to produce accurate results across a range of parameter values, as was shown in simulated data. Model-based fitting of sqBOLD data does, therefore, have considerable potential utility, although there are significant limitations which must be investigated further.

Conclusions
This study has shown that a Bayesian framework can be used to estimate parameters in the qBOLD model reliably and consistently, when compared with previous analysis methods. It provides additional information in the form of parameter uncertainties, allows for the possibility of using priors to inform estimates, and can incorporate spatial regularization which improves parameter distributions. The non-linear model used can be extended in a number of ways, such as including contributions to the signal from the intravascular compartment. The resulting parameter maps of R ' 2 and DBV can be used to quantify OEF, whose spatial distribution signifies clinically useful information, especially in the case of acute stroke, where it could be used to distinguish the  (He and Yablonskiy, 2007) 2.9AE0.4 1.75AE0.13 38.3AE5.3 qBOLD (Simon et al., 2016) 3.1AE0.4 --GESFIDE (Ni et al., 2015) 2.7AE0.4 --GESSE (Ni et al., 2015) 2.7AE0.3 --Neural network qBOLD (Domsch et al., 2018) -4.2AE0.1 40AE1 Conventional qBOLD (Lee et al., 2018) -4.9AE2.0 31.8AE6.0 Interleaved qBOLD (Lee et al., 2018) -3.1AE0.5 39.9AE3.3 TRUST (Lu and Ge, 2008) --35AE6 PET (Derdeyn et al., 2001) --41AE9 ischemic penumbra from the ischemic core (Hossmann, 1994;Liu and Li, 2016). Grey matter average parameter estimates obtained using VB have significantly lower variation than those of the simpler model, leading to more homogeneous distributions and more precise average values. The distributions are even more homogeneous, and the average values more precise, when using a model that includes the intravascular signal contribution. It has also been shown that ASE-qBOLD consistently estimates parameter values that are different from those obtained through other methods. In particular, DBV estimates tend to be higher than literature values, leading to lower OEF estimates. These differences could be the result of motional narrowing effects in the tissue compartment. Further work is required to determine whether these measurements can provide clinically relevant information without further correction. In addition, alternative qBOLD techniques, including those that use data acquired by other modalities such as GESSE, could be analysed in the same VB framework, in order to produce a comparison between the various methods for measuring R ' 2 (and hence, OEF) in vivo. Comparing other methods of OEF estimation (such as TRUST), or DBV estimation (using hyperoxia BOLD), in the same subjects, would also be very useful for validation.
The method presented here estimates parameters more accurately from simulated data, and leads to significantly less variance at the intrasubject level in vivo. Its utility could be reinforced by being applied to GASE data from clinical populations such as acute stroke patients, in order to distinguish between regions of the brain with varying OEF.