Bayesian Pharmacokinetic Modeling of Dynamic Contrast-Enhanced Magnetic Resonance Imaging: Validation and Application

Tracer-kinetic analysis of dynamic contrast-enhanced magnetic resonance imaging data is commonly performed with the well-known Tofts model and nonlinear least squares (NLLS) regression. This approach yields point estimates of model parameters, uncertainty of these estimates can be assessed e.g. by an additional bootstrapping analysis. Here, we present a Bayesian probabilistic modeling approach for tracer-kinetic analysis with a Tofts model, which yields posterior probability distributions of perfusion parameters and therefore promises a robust and information-enriched alternative based on a framework of probability distributions. In this manuscript, we use the Quantitative Imaging Biomarkers Alliance (QIBA) Tofts phantom to evaluate the Bayesian Tofts Model (BTM) against a bootstrapped NLLS approach. Furthermore, we demonstrate how Bayesian posterior probability distributions can be employed to assess treatment response in a breast cancer DCE-MRI dataset using Cohen's d. Accuracy and precision of the BTM posterior distributions were validated and found to be in good agreement with the NLLS approaches, and assessment of therapy response with respect to uncertainty in parameter estimates was found to be excellent. In conclusion, the Bayesian modeling approach provides an elegant means to determine uncertainty via posterior distributions within a single step and provides honest information about changes in parameter estimates.


Introduction
Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is a noninvasive imaging technique used to quantify microvascular tissue perfusion with the help of a contrast agent (CA) (Ingrisch and Sourbron, 2013). In MRI, a gadolinium-based CA is used most commonly and injected intravenously after the acquisition of pre-contrast baseline scans. The CA increases T1 and T2 relaxation rates of surrounding water protons and causes signal enhancement in a T1-weighted acquisition. By measuring multiple T 1 -weighted images during the passage of the CA through the tissue of interest, a time-dependent CA concentration can be extracted from the signal-time course of each voxel. Besides determining semi-quantitative and descriptive parameters from the concentration curves, e.g. time to peak, area under curve, or maximum, quantitative perfusion parameters can be obtained by fitting pharmacokinetic (PK) models to the data Buckley, 2012, 2013;Roberts et al., 2006). Popular PK models that characterize CA transport from DCE-MRI data are the classical Tofts model (TM) (Tofts, 1997), the extended Tofts model and the two compartment exchange model (Sourbron and Buckley, 2011).
The standard approach for estimating PK parameters from DCE-MRI data is using non-linear regression to determine a maximum likelihood estimator by non-linear least squares (NLLS) analysis (Seber and Wild, 2003). For this purpose, an optimizing algorithm aims to minimize the sum of squared residuals between model and data and yields, if successful, a point estimate of model parameters. The NLLS approach is widely used, and a number of software packages provide non-linear regression implementation of a range of PK models (Huang et al., 2014a;Beuzit et al., 2016). Bayesian probabilistic modeling, on the other hand, offers an alternative modeling approach within a framework of probability distributions. Briefly, a prior belief about model parameters is formulated as a probability distribution; this allows to incorporate domain expertise, e.g. physical constraints. With dedicated algorithms, this prior belief is then updated with the measured data and yields the posterior probability distributions of the parameters given the data (McElreath, 2015). Through recent algorithmic developments (Hoffman and Gelman, 2011) and the increasing availability of computational power, the use of Bayesian modeling approaches is spreading in various disciplines and has already shown to be a robust and accurate alternative for the analysis of MR imaging data (Schmid et al., 2006;Orton et al., 2007;Woolrich et al., 2009;Dikaios et al., 2017;Tietze et al., 2018;Hansen et al., 2019). The posterior probability distributions that result from Bayesian modeling greatly increase the interpretability of analysis results. Compared to simple point estimates, entire parameter probability distributions allow a straightforward assessment of, e.g., whether a parameter has truly changed in the course of a therapy, or whether the parameter change has only occurred within the uncertainty of the estimation (Shukla-Dave et al., 2018).
In the present manuscript, we investigated Bayesian tracer-kinetic modeling in the context of DCE-MRI. To this end, we implemented a Bayesian TM (BTM) with the purpose to i) evaluate accuracy against a NLLS approach using a digital reference object, ii) validate uncertainty estimates against a bootstrapped NLLS approach to assess the precision and iii) demonstrate how Bayesian posterior probability distributions can be used to assess treatment response in a breast cancer DCE-MRI dataset.

Signal Conversion and Pharmacokinetic Models
In a typical DCE-MRI experiment, time-resolved signal intensity curves S(t) are extracted voxel-wise from multiple T 1 -weighted images. To derive quantitative information, the measured signal intensities need to be converted to CA concentration curves. For this purpose, the signal equation for the spoiled gradient echo (SPGR) sequence in steady state can be used with the baseline signal S 0 (t), flip angle α, repetition time T R and relaxation rate R 1 (t) as: (1) One can solve Eq. (1) for the time-dependent relaxation rate R 1 (t): with the auxiliary variable A time-dependent concentration can then be calculated from the linear relation to the change in relaxation rates during and before administration of CA, R 1 (t) and R 10 , respectively: with the specific relaxivity of the gadolinium-based CA r 1 (Pintaske et al., 2006). A standard approach for the analysis of concentration-time curves in DCE-MRI data is the TM (Tofts and Kermode, 1991;Tofts, 1997;Sourbron and Buckley, 2011) which assumes a negligible amount of intravascular tracer and describes CA transportation as: Here, c t (t) is the time-dependent concentrations of CA in the tissue of interest; c p (t) is the concentration in the blood plasma of the tissue-feeding artery, often referred to as arterial input function (AIF). c t (t) and c p (t) are connected with a convolution, expressed as " * ". The parameter v e is the volume fraction of the interstitium, the extravascular extracellular space (EES). K trans is defined as the transfer constant of CA between blood plasma and EES. The rate constant k ep = K trans /v e is the ratio of the transfer constant to the EES (Tofts et al., 1999;Sourbron and Buckley, 2011).
The tissue concentration c t (t) can be calculated from the measured signal S(t) with Eq. (1-4) using the relaxation time T 10 in tissue via R 10 = 1/T 10 . Plasma concentration c p (t) in the AIF can be calculated likewise using the relaxation time T 10 of blood and the additional transformation from blood to plasma concentration via the hematocrit hct: The standard TM is used in the following within a classical NLLS likelihood framework and a Bayesian framework to quantify perfusion in simulated and measured DCE-MRI data. To account for noise in any observed data, an error term is added to the TM from Eq. (5) and an observation i is given as where the model parameters K trans and v e are summarized in the vector θ.

Validation: QIBA DCE-MRI Phantom
To evaluate accuracy of estimates and compare results of different fitting approaches, a simulated phantom with known PK parameters was investigated first. The Quantitative Imaging Biomarkers Alliance (QIBA) 1 provides several freely available test images for DCE-MRI analysis, known as digital reference objects (DRO). These have been used previously to validate various fitting algorithms and analysis toolkits (Ortuño et al., 2013;Smith et al., 2015;Debus et al., 2019 (1-4)). For a more realistic setting, complex Gaussian noise with standard deviation σ = 0.2 relative to the pre-contrast baseline signal S 0 was added to the original noise-free test data. No noise was added to the AIF for simplicity and to be able to reliably relate our results to published work of Smith et al. (2015) and Ortuño et al. (2013). Fig. 1 shows a snapshot of the DRO signal intensities at t = 100s, the AIF and an exemplary voxel with parameters K trans = 0.2 min −1 and v e = 0.2, respectively.

Application: Breast Cancer DCE-MRI Data
The Quantitative Imaging Network (QIN) aims at improving quantitative imaging and does so by sharing data which was acquired as part of various QIN studies, collected in The Cancer Imaging Archive (TCIA) 4 (Clark et al., 2013). A set of breast cancer DCE-MRI data (Huang et al., 2014b) in DICOM format acquired from 10 patients was used to demonstrate the performance of the BTM on clinical data. The dataset contains DCE-MRI measurements acquired before (visit 1) and during (visit 2) preoperative neoadjuvant chemotherapy (NACT), respectively. For three patients, pathologic complete response (pCR) was reported, the remaining seven patients were classified as non-pCR. In addition, the dataset includes a region of interest (ROI) per patient, drawn by an experienced breast radiologist. A sample-averaged AIF is provided as blood concentration c b (t) and was converted to plasma concentration c p (t) using Eq. (6). Signal intensities within the ROI were converted to tissue concentrations using Eq. (1-4). Parameters for the conversion are specified in Table 1, further details can be found in the original work by Huang et al. (2014a).

Non-linear Least Squares Approach with Bootstrapping
The standard evaluation of DCE-MRI data is performed in a likelihood framework by fitting a non-linear regression model to the concentration-time curve in every voxel. The NLLS approach minimizes the sum of squared errors between measured data y i at timepoint t i for i = 0, 1, ..., N and the model function to infer the best guess parameterθ. Assuming normally distributed noise i , the least-squares estimatorθ equals the maximum-likelihood estimator (Seber and Wild, 2003). An implementation of the Broyden-Fletcher-Goldberg-Shano (L-BFGS) algorithm (Byrd et al., 1994;Zhu et al., 1997) in SciPy 5 (Jones et al., 2001) was used for inference of the parameters via the optimize.minimize function. Initial values for K trans and v e were set to 0.001. The concentration curves of the DRO were then fitted and parameter maps were constructed for K trans and v e . By comparing them to the true parameter maps, percentage error maps were calculated as θ %err = (θ − θ true )/θ true .
A bootstrap method was implemented to assess the uncertainty ofθ (Kershaw and Buckley, 2006). For that, the residuals, i.e. the difference between the fitted and the measured curve were calculated. In a next step, the residuals were resampled by randomly drawing samples with replacement. Subsequently, the resampled residuals were added to the fitted curve and the TM was used to determine another set of estimates, equivalent to inferring the original best guess. The number of iterations was set to 1000.
Uncertainty maps were then calculated from the bootstrap samples for the NLLS approach. Denoted as σ, half the width between 17th and 83rd percentile was considered a more robust measure for the precision than the standard deviation and is used throughout this work. For samples following a Gaussian normal distribution, σ would be equal to the standard deviation.

Bayesian Inference and Implementation
The alternative evaluation is performed in a Bayesian framework which infers a full posterior distribution P (θ | y) of the model parameters θ given an observation of data y. The observational error i for each measurement y i at timepoint t i for i = 0, 1, ..., N in Eq. (7) is assumed to be Gaussian with standard deviation σ. Hence, the joint observations of CA concentration in each voxel, conditional on the parameters, are modeled in the likelihood as with N representing a normal distribution and c t (t i , θ) the CA tissue concentration evaluated with the TM in Eq. (5). Information about the parameters prior to the observation of data are specified in the prior distribution P (θ), enforcing physical or biological constraints. The likelihood of the data P (y | θ) and the product of the prior probability densities P (θ) are combined with the observed data to infer the joint posterior distribution via Bayes' theorem: The denominator in Eq. (10) is referred to as model evidence and calculates as P (y) = P (θ)P (y | θ)dθ. If the complexity of the model allows no analytical solution to this integral, Markov Chain Monte Carlo (MCMC) methods (Gilks et al., 1995) offer a means to determine the posterior probability distribution. Briefly, a MCMC algorithm draws samples from a target distribution, which equals the desired posterior distribution. The accepted parameter proposals are stored in a chain or trace of estimates (Kruschke, 2014).
The BTM was implemented in Stan (Carpenter et al., 2017), an open-source software package, using pystan 6 . In the present analysis, weakly informative priors were chosen. In particular, for the volume fraction v e ∈ [0, 1] a beta prior v e ∼ Beta(α = 2, β = 2) was chosen and for K trans ∈ R + a gamma prior was specified K trans (min −1 ) ∼ Gamma(α = 1.1, β = 1/0.002). The prior for the standard deviation of the observational error was set to σ(mmol/L) ∼ LogNormal(µ = 0, σ = 1). The appendix A provides a prior predictive check on these prior distributions. MCMC samples were drawn from the posterior distribution with the No-U-Turn (NUTS) algorithm (Hoffman and Gelman, 2011). The number of iterations was set to 1000, sampled in two chains simultaneously, following a warm-up period of 500 iterations. Stan also reports divergences of the sampling algorithm and indicates the need to update the default settings of NUTS, e.g. initial step size and target acceptance rate.
To monitor the convergence of the MCMC chains to the target distribution, different diagnostics are automatically run alongside in Stan. The potential scale reduction statistic,R, by Gelman and Rubin (1992) compares the sample variance within and across chains, and indicates if chains have not converged to a common distribution (R > 1.1). The effective sample size N eff indicates the degree of uncertainty in estimates due to autocorrelation of samples (Geyer, 2011).
All concentration curves of the DRO were then fitted with the BTM to obtain posterior probability distributions of the parameters θ. To be able to compare the distributions to point estimates and to generate parameter maps, two hallmarks of the posterior distributions were determined: the median and, as for the bootstrap samples, half the distance between the 17th and 83rd percentile, denoted as σ. By comparing the median parameter maps to the true parameter maps, a map of the percentage error was calculated as above to assess the accuracy of estimates.
To evaluate the breast cancer DCE-MRI datasets, the mean tissue concentration curve over the ROI c t,ROI (t) was calculated for each patient and both visits. Subsequently, all concentration-time curves were fitted with the BTM to infer posterior distributions for the model parameters θ. To ensure that the model adequately captured the underlying data generating process, a posterior predictive check (PPC) was performed. Briefly, we used the BTM to generate new predictive dataŷ and checked if it resembled the observed data. The full posterior distribution is exploited in this way to generate a posterior predictive distribution P (ŷ|y) = P (ŷ|θ)P (θ|y)dθ, 6 Python 3.6.6, pystan 2.18.0, https://pystan.readthedocs.io/ which propagates the uncertainty in the parameter estimates to uncertainty about prediction (Betancourt, 2015;McElreath, 2015;Gabry et al., 2017). In this way, PPCs allow to detect systematic modeling errors and violations of model assumptions. Subsequently, the posterior distributions of K trans were compared across visits for all patients with the objective to discriminate between patients with pCR and non-pCR.

Statistical Analysis
A quantitative statistical measure for signal fidelity is the structural similarity index (SSIM) (Wang et al., 2004). It gives an average value over similarities of three key elements of an image: luminance, contrast and structure (Zhou Wang and Bovik, 2009). To assess the accuracy of parameter estimates for the DRO, the SSIM was calculated between the estimated and the true parameter maps. As a comparison, the rootmean-squared error (RMSE) was calculated alongside. In order to get reasonable values for RMSE, outliers in K trans -estimates obtained from NLLS fitting needed to be restricted to one. In addition, the SSIM was calculated between the σ-uncertainty maps determined with the BTM and the bootstrapping method to assess similarities in the precision of estimates.
To compare the K trans posterior distributions between visits for the breast cancer DCE-MRI dataset, Cohen's d was calculated for each of the ten patients as: x represents the average K trans value per visit, σ its standard deviation. In this way, the width of the posterior distributions are incorporated into a single value. Compared to just reporting the percentage change of K trans mean values, the uncertainty in parameter estimation is accounted for. An univariate logistic regression (ULR) model, implemented in scikit-learn 7 (Pedregosa et al., 2011), was fitted to the Cohen's d values. The receiver operating characteristic (ROC) area under curve (AUC) was calculated in order to obtain a quantitative measure for the assessment of response.

Validation: QIBA DCE-MRI Phantom
Concentration-time curves of the DRO were evaluated within a Bayesian and likelihood framework. The resulting parameter estimates for K trans are exemplarily shown in Fig. 2 for the Bayesian approach (a) and the NLLS reference (d). Note that the voxels in the Bayesian framework show median values of their respective posterior distributions while voxels in the likelihood framework represent point estimates. In general, the parameter maps show high accordance with the true values. The corresponding percentage error maps in the middle column (b) and (e) display relatively low errors for all regions with v e > 0.01 for both methods. Low accuracy, hence high percentage errors are observed for regions where v e = 0.01. SSIM between estimated and true K trans -maps is higher for the BTM than for the NLLS approach. Furthermore, RMSE is lower for the BTM for both PK parameter maps. Details are provided in Table 2. The right column of Fig. 2 displays the precision of the parameter estimates evaluated with the BTM (c) and a bootstrapping method applied to the fit results of the NLLS approach (f). The visual analysis of the uncertainty maps reveals very similar patterns for both approaches, supported by a SSIM of 91%. The highest uncertainty occurs in regions with the highest percentage error for the fitting parameter estimates. The remaining parameter combinations have much greater precision. Information about divergences (BTM) and pixels where the NLLS algorithm did not find a solution can be found in Table 3, together with computational times for fitting all 3000 pixels with BTM and NLLS approaches and the additional bootstrap analysis.   Fig. 3 shows representative signal intensity-time curves with the associated PPCs (a-c) and their corresponding K trans posterior distributions (d). Here, the dark line illustrates the median and the increasingly lighter bands are the 20%, 40%, 60% and 80% highest density intervals (HDI) between the corresponding (0.4,0.6), (0.3,0.7), (0.2,0.8) and (0.1,0.9) percentiles of the posterior predictive distribution. The PPC in (a) indicates a good fit of the model to the data, the corresponding posterior distribution (green) for K trans is narrow. The PPC in (b) suggests that the chosen model provides a good fit to the data, the high noise level in the data is associated with a broader posterior distribution (orange). In (c), the noise level of the data is comparable to (a), however the PPC indicates a modeling error. Fig. 4 shows the posterior distributions of K trans for all patients for visit 1 (blue) and visit 2 (orange), before and during NACT, respectively. With one exception, a general decrease in K trans is observed. The degree of change, dependent on the width of the posterior distributions, is summarized in Cohen's d values and visualized in Fig. 5; light-gray represents non-pCR, dark-gray pCR. The ULR analysis revealed a ROC AUC of 0.952. Computational time for fitting all 20 ROI-averaged concentration curves was ∼ 20 s for the BTM.   Figure 3: Observed signal intensity-time courses (dots) with posterior predictive distribution median (line) and the 20%, 40%, 60% and 80% highest density intervals (HDI); (a) for a good fit to data with low noise level, (b) for a good fit to data with high noise level, and (c) for a bad fit to data with low noise level. (d) Posterior distributions for K trans estimated from the respective signal intensity-time curves (color-coded).

Discussion
In this study, we assessed posterior probability distributions of tracer-kinetic parameters obtained with a BTM against a standard NLLS approach. Validation with a DRO revealed high accuracy of BTM and NLLS approaches, indicated by strong similarity between estimated and ground truth maps. In addition, precision of estimates, assessed via the width of the posterior probability distributions and bootstrapping, respectively, was in very good agreement between both approaches. Analysis of the breast cancer DCE-MRI dataset with the BTM revealed that the degree of decrease in K trans gives information about the pathologic  response to NACT. The response in dependence of the uncertainty of parameter estimates was quantified with Cohen's d, calculated from the posterior distributions between visit 1 and 2. ULR modeling indicated excellent prediction of response.
Concerning the analysis of the DRO with the BTM, median parameter estimates were compared to the ground truth to assess the accuracy, otherwise not available with measured data. It was found that the Bayesian estimates generally have a very strong similarity with the ground truth, validating the accuracy of our BTM. The recovered parameters also have complementary regions of high and low percentage errors compared to the established NLLS fitting routine. RMS errors were lower for both implementations in the present work compared to similar DRO analysis by Smith et al. (2015) and Ortuño et al. (2013). Albeit, the results are in good comparison. Caution is still required for voxels with low v e . Concentration curves with these parameter combinations have very limited intensity changes which practically vanish in the added background noise.
The variance of estimates inherent in the Bayesian posterior distribution was compared to a bootstrapping error analysis, performed likewise to the work of Kershaw and Buckley (2006). It was demonstrated that the uncertainty maps of the BTM resemble those calculated with the bootstrap analysis, validating the precision of parameters recovered with the BTM. To the best of our knowledge, only Schmid et al. (2006) implemented a Bayesian PK model with the objective to make use of the posterior probability distribution. They used it to state the probability of a pixel to be greater than a certain threshold for tumor masking. Their approach was applied to patient data, whereas in the present work, accuracy and precision of estimates were validated with a digital phantom first.
Furthermore, we applied the BTM to the breast cancer DCE-MRI data, performed PPCs and investigated the posterior distributions. For a PPC, the observed data was compared to the posterior predictive distribution, illustrated as percentile intervals of highest density. A good fit to the data results in a posterior distribution which reflects the noise level in the data; low noise corresponds to a narrow posterior and vice versa. However, a bad fit to the data results in a broad posterior distribution despite a low noise level. This indicates a systematic modeling error which influences the information we gained about uncertainty. More complex PK models which incorporate additional assumptions about CA transport, e.g. the extended Tofts model, could be able to produce a better fit to certain data. Hence, assessing posterior distributions requires to check the corresponding data and fit before drawing any conclusions from it. While feasible for ROI-based analysis with only a handful of concentration curves, visual assessment is not possible in a pixelwise analysis. An automated Bayesian model selection step as proposed in the work of Duan et al. (2017) could be an effective means to reduce systematic modeling error but is beyond the scope of this study.
In order to assess therapy response for the patients in the breast cancer DCE-MRI dataset, Huang et al. (2014a) showed in their original work that using visit 2 K trans or the percentage change of K trans between visits as metrics yields good to excellent results. However, the uncertainty in estimating PK parameters with tracer-kinetic models is not accounted for. For this purpose, we calculated Cohen's d as a means of quantitative change in parameter estimates which depends on the precision of estimates. Using Cohen's d metric, the assessment of response was found to be excellent by means of an ULR analysis. Considering the findings of the PPCs, including a model selection step as explained above could decrease the influence of systematic modeling errors on posterior distributions and hence Cohen's d values which may further improve assessment of therapy response.
Limitations of the present work include large computational time when fitting the BTM to the DRO-data. On the one hand, the MCMC sampling is time and memory consuming but necessary to avoid divergences. On the other hand, it yields a full posterior probability distribution with information about the uncertainty, and obtaining the same information with a bootstrap analysis of a NLLS fit requires even more computation time. Furthermore, the simulated DRO curves have a much higher time-resolution compared to measured data. Evaluating real DCE-MRI data increases the speed of the analysis greatly. Moreover, the influence of the chosen prior distributions on the results was not assessed in the present study.
In conclusion, we evaluated a BTM with a DRO, assessed accuracy and precision against the standard NLLS approach and showed how posterior distributions are used to assess therapy response. We demonstrated that Bayesian modeling provides an elegant means to assess posterior probability distributions, which are in good agreement with established approaches.

Acknowledgments
Funding: This work was supported by the research training group GRK 2274 of the DFG, Deutsche Forschungsgemeinschaft. To assess if the choice of prior distributions for the model parameters covers a reasonable range of concentration-time curves, it is useful to perform a prior predictive check. For this purpose, we generated 100,000 MCMC samples from the prior predictive distribution,

A Prior Predictive Check
only considering the prior distributions without any actual data. This quantifies the range of possible observationsŷ, predicted by our model. In a prior predictive check, the predicted data is compared to real observations and the extent of extreme observations indicates the level of disagreement between domain expertise and model assumptions. Fig. A.1 shows the probability density functions of the chosen priors (a-c) and the prior predictive check (d). The black dots are actual observed data from the QIBA phantom, one curve for each parameter combination of K trans and v e , to assess the scope of possible phantom curves. The increasingly lighter green bands represent the 20%, 40%, 60% and 80% highest density intervals between the corresponding percentiles of the prior predictive distribution; the green line is the median thereof. We find that the model predicts observations that are more extreme than the phantom data but not too extreme to be unrealistic given the assumed observational error. Hence, we conclude that the chosen prior distributions are reasonable.