Magnetic resonance-based computed tomography metal artifact reduction using Bayesian modelling

Metal artifact reduction (MAR) algorithms reduce the errors caused by metal implants in x-ray computed tomography (CT) images and are an important part of error management in radiotherapy. A promising MAR approach is to leverage the information in magnetic resonance (MR) images that can be acquired for organ or tumor delineation. This is however complicated by the ambiguous relationship between CT values and conventional-sequence MR intensities as well as potential co-registration issues. In order to address these issues, this paper proposes a self-tuning Bayesian model for MR-based MAR that combines knowledge of the MR image intensities in local spatial neighborhoods with the information in an initial, corrupted CT reconstructed using filtered back projection. We demonstrate the potential of the resulting model in three widely-used MAR scenarios: image inpainting, sinogram inpainting and model-based iterative reconstruction. Compared to conventional alternatives in a retrospective study on nine head-and-neck patients with CT and T1-weighted MR scans, we find improvements in terms of image quality and quantitative CT value accuracy within each scenario. We conclude that the proposed model provides a versatile way to use the anatomical information in a co-acquired MR scan to boost the performance of MAR algorithms.

. Other contributions are more model-independent, such as the photon starvation in the metal projections that leads to noise artifacts (Nuyts et al 1998, Gjesteby et al 2016.
MAR algorithms may be categorized into three overall approaches. Image inpainting algorithms replace corrupted CT values with better estimates by post-processing already reconstructed images (Lell et al 2013). Sinogram inpainting algorithms, on the other hand, replace metal projections by estimates that fit the reconstruction model to a higher degree, which may be particularly effective in dealing with photon starvation (Kalender et al 1987, Meyer et al 2009, Zhang et al 2011, Lell et al 2012, Meyer et al 2012. Finally, model based iterative reconstruction (MBIR) algorithms change the CT reconstruction model itself to a more complex probabilistic forward model that better accounts for the artifact sources, at the cost of having to optimize a generally non-linear image functional in a slow, iterative algorithm (De Man et al 2000, Elbakri and Fessler 2002, Thibault et al 2007, Hamelin et al 2008, Stayman et al 2012, Van Slambrouck and Nuyts 2012, Nuyts et al 2013, Tilley et al 2016, Jin et al 2015, Stille et al 2016, Fu et al 2017. An important part of many MAR algorithms is the inclusion of prior information about the image that is being estimated. While this is especially true for image inpainting algorithms, which impose such information directly in image space, prior information is also used in some of the most successful sinogram inpainting algorithms. An example is the 'normalized MAR' (nMAR) method (Meyer et al 2009(Meyer et al , 2012, which replaces metal projections by (scaled) projection estimates simulated from a template image. MBIR models also include prior information as they consist of two parts: a sinogram data likelihood that addresses, e.g. noise and beam hardening artifacts by modelling the detector noise, the x-ray source spectrum and the implant material; and an image prior distribution that may be used to guide the reconstruction with statistical knowledge about the image being reconstructed , Elbakri and Fessler 2002, Hamelin et al 2008, Van Slambrouck and Nuyts 2012, Nuyts et al 2013, Tilley et al 2016, Fu et al 2017.
In a general CT setting, the quality of the available prior information is rather limited. In sinogram inpainting, for example, the required template is typically generated by post-processing a CT image reconstruction that is heavily corrupted by metal artifacts (Meyer et al 2009, Lell et al 2012, Meyer et al 2012. Being generated from corrupted data, the resulting prior information may itself be compromised, thereby introducing new artifacts in complex, highly corrupted regions such as the head and neck near the teeth and oral cavity. Similarly, in MBIR the limited availability of accurate prior information often motivates the use of relatively simple functional priors that merely impose mathematical regularities in the reconstructed image (De Man et al 2000, Webster Stayman and Fessler 2001, Pan et al 2010, Makeev and Glick 2013.

Contributions of this paper
In the specific context of RT, a potential source of prior information for improved MAR is the magnetic resonance (MR) image that is commonly co-acquired for tumor-and normal tissue delineation. Since metal artifacts are often more localized in MR compared to CT (Park et al 2015, Bell andBashir 2018), the MR scan can provide superior anatomical prior information in regions that are heavily corrupted in CT (Jonsson et al 2010, Bell andBashir 2018). Furthermore, since co-registration and acquisition of the MR and CT scans in the same patient fixation is already part of the tumor-delineation process, this can be done with minimal interruption to the existing clinical workflow.
Despite the obvious potential, to the best of our knowledge only a few prior attempts have been made to use MR for reducing CT metal artifacts, most notably the image inpainting algorithms described in Anderla et al (2013), Delso et al (2013), Park et al (2015). The principal difficulty faced by such MR-based approaches to CT MAR lies in the image contrast disparities between the two modalities, especially between bone and air, which are easily distinguishable in the CT but not in the MR scan unless dedicated sequences are used (Delso et al 2013). Additional difficulties arise from co-registration issues, mainly due to to inter-acquisitional motion, which limit the accuracy of MR-based predictions.
In order to overcome these difficulties, we propose a novel Bayesian algorithm to predict uncorrupted CT values from a combination of corrupted CT measurements and corresponding MR intensities. The contributions of this work can be summarized as follows:  Delso et al 2013, Park et al 2015, our method automatically blends MRbased predictions with the original, corrupted CT values. As we demonstrate experimentally, this way of retaining the information of the original CT helps to discern bone from air as well as to address coregistration issues.
3. Rather than directly targetting only one particular MAR strategy such as image inpainting (Anderla et al 2013, Delso et al 2013, Park et al 2015, the predictive model developed here can be used flexibly within several widely-used MAR scenarios. In particular, we demonstrate experimentally the benefit of using the proposed model to directly calculate blended CT value replacements and thus perform image inpainting; to generate a template for sinogram inpainting; and to define an image reconstruction prior for MBIR. 4. In order to facilitate clinical adoption, our model uses data from only the patient under consideration and automatically tunes its parameters to fit this specific patient. We demonstrate how the model thereby accounts for variations in patient-specific features, such as the severity of the artifact corruption as well as variations in the MR sequence parameters. It also removes the need for external training data of wellaligned MR and CT image pairs that is commonly required for MR-based CT prediction in the literature (Dowling et al 2012, Andreasen et al 2015, Torrado-Carvajal et al 2015, Han 2017, Torrado-Carvajal et al 2019. The paper is structured as follows. We first derive and discuss the proposed predictive model, cover the automatic parameter estimation and present three MAR algorithms that are based on this model (section 2 'Methods').
Next, (in section 3 'Experiments') we describe the practical implementation of the three MAR algorithms, and benchmark their performance in comparison to conventional alternatives; the results are afterwards presented in sequence for the three MAR algorithms (section 4 'Results'). We end with a final summary of our findings and main conclusions, followed by a discussion of, in particular, the potential of clinical implementation of our methods (section 5 'Conclusion and Discussion'). An early version of our approach, with only qualitative (visual) results on a small set of patients, was previously presented (Nielsen et al 2018). The current manuscript provides substantially more technical derivations and analysis, in particular putting more focus on the self-tuning of the model parameters; proposes a novel sinogram inpainting variant as well as an improved MBIR algorithm; and contains detailed quantitative results on a larger set of patients.

Predictive model
Let a set of voxels covering a patient volume be assigned indices i from the index set T (i ∈ T ), and denote the voxel center locations as {x i } i∈T . To avoid cluttered notation, we will suppress the index set and use {·} ≡ {·} i∈T in the remaining. Consider now a CT volume reconstructed using FBP (Buzug 2008), with per-voxel FBP values {t i }. Assuming the availability of an MR image that is co-registered with the FBP reconstruction, we extract a set of small cuboidal MR volumes ('patches') centered at {x i }, with intensities stacked in the vectors {m i }. The values in {t i } are potentially corrupted by metal artifacts, and our task is to estimate the underlying true CT values, {y i }, given the observation of {t i , m i }.
To accomplish this, we infer the posterior predictive distribution p({y i }|{t i , m i }, σ), where σ denotes the set of model parameters, from a generative, probabilistic model of the corrupted CT values contingent on the observed MR scan. We specifically model the two-step process by which the uncorrupted CT values are first sampled from a prior distribution, assumedly in an independent fashion for each voxel: and then corrupted by sampling from a noise model p({t i }|{y i }), again independently for each voxel: Under this generative model, applying Bayes' rule yields: The predictive distribution of equation (1) depends on an FBP-dependent likelihood function p(t i |y i , σ) and an MR-determined prior distribution p(y i |m i , σ), which we detail below.

Likelihood
The likelihood p(t i |y i , σ) models the distribution of the corrupted CT value given the underlying true CT value. Introducing the maximal noise variance σ * t 2 , we model the artifact corruption as additive Gaussian noise with variance σ 2 t smoothly decreasing with the distance to the implants, such that it is equal to σ * t 2 within the implants but 0 far away where the FBP is free of artifacts. In particular: Here, D ⊥ (x i ) is the perpendicular spatial distance to a segmentation of the metal implants; N (·|ψ, ν 2 ) denotes a Gaussian with mean ψ and variance ν 2 ; and κ sets the decrease rate of the sigmoidal function f (x i ). As an illustration, consider the two voxels at different distances to the metal in figure 1(a), for which the noise models (figure 1(b)) have different widths due to the modulation with f (x i ) (figure 1(c)). The metal segmentation and κ parameter must be defined by the user; section 3.2 will present the definitions used for our experiments. The maximal artifact noise variance σ * t 2 will however be chosen automatically given the observed data, as described in section 2.3.

Prior
Letting T u ⊆ T denote an assumed uncorrupted part of the patient volume (see figure 2(a)), we learn the prior distribution p(y i |m i , σ) from all matched CT/MR pairs {y n , m n } n∈Tu in T u using kernel regression (Bishop 2006). In particular, we estimate the joint distribution p(y i , m i |σ) with kernel density estimation (Bishop 2006) using Gaussian kernels with diagonal covariance matrices: and obtain the prior distribution p(y i |m i , σ) accordingly: Here, N (·|ψ, Σ) denotes a multivariate Gaussian with mean ψ and covariance matrix Σ, I M is the identity matrix of dimension M (the number of voxels in a patch) and | · | denotes set cardinality. σ 2 y and σ 2 m are the kernel variances, which, together with the artifact noise variance σ * t 2 (see section 2.1), will be automatically estimated given the observed data as described in section 2.3. We obtain the required uncorrupted part of the patient volume T u (as well as its complement T c ) by thresholding f (x i ) as shown in figure 1(c), leading to the following sets: which are illustrated in figure 2(c). The kernel regression process is illustrated in figure 2(b) for the special case where 1x1x1 patches are used. After estimating p(y i , m i |σ) using uncorrupted CT/MR samples {y n , m n }, the prior distribution p(y i |m i , σ) corresponds to drawing a trace at the observed m i and normalizing, leading to a one-dimensional Gaussian mixture model that displays several peaks along the CT-axis. The mixture contains a large number of Gaussians, i.e. one for every sample in {y n , m n } n∈Tu . Since the samples in practice are clustered corresponding to tissue types, as illustrated in figure 2(b), the effective number of modes is however automatically reduced. In this way, kernel regression automatically performs an implicit tissue classification and reflects it in the prior model.
The expectation value of p(y i |m i , σ) presents a way to calculate a (Bayesian) prior estimate of the uncorrupted CT:ŷ shown are a few particularly good matches that would have a large weight in the prior model. (b) On the regression point set, kernel density estimation is used to define the joint distribution p(y i , m i |σ) (shown as a surface for 1 × 1 × 1 patches). The red curve corresponds to p(y i |m i , σ), results from kernel regression and is a normalized trace on the kernel density estimate surface at a specific m i . (c) Illustration of the corrupted voxel set T c (equation (9)), with the uncorrupted set T u as its complement.
This straighforward pseudo-CT (Andreasen et al 2015, Hofmann et al 2008) (pCT) estimate, which is dependent only on the MR scan, will serve as a baseline for our experiments.

Posterior predictive distribution
Using the prior and likelihood that we have now defined, the posterior in equation (1) may be calculated. Defining the parameter set σ ≡ {σ * t , σ y , σ m } and plugging in equations (7) and (4): Using that: with the normalizing factor (equation (3)) becomes: and we get: with weights: Equation (14) is our main result. We will later use it to calculate a Bayesian estimate of {y i } via its expectation: which will be used in the proposed MAR methods.

Interpreting the posterior
The means and weights {µ i n , v i n } n∈Tu in the Gaussian mixture of equation (14) are similar to the ones in the prior model of equation (7), but are now determined by both the potentially corrupted FBP value t i and the regression points {y n , m n } n∈Tu . How the two are blended is determined by the exact values of the parameters σ, which are therefore critical for the shape of the posterior.

Kernel regression parameters
The influence of the kernel regression variances σ 2 y and σ 2 m is isolated to the prior p(y i |m i , σ), and is as illustrated in figure 3(a).
The σ 2 y parameter controls the width of the Gaussians in the mixture of equation (7). This affects the modelled CT value variance within the implicit tissue classes defined by the kernel regression, which is reflected by the width of the peaks in the prior.
The σ 2 m parameter scales the squared MR image patch differences in the Gaussian weights w i n , such that decreasing it translates to more weight on a smaller set of Gaussians with smaller m i − m n . Compared to larger settings (figure 3(a), right), a small value (figure 3(a), left) means that only patches in uncorrupted areas that have very similar intensity profiles contribute to the prediction, which shows as an amplification of the corresponding peaks in the prior.

Artifact noise variance
The influence of σ * t 2 is isolated to the likelihood p(t i |y i , σ) and governs the expected level of corruption near the metal voxels (near the metal, . For a given voxel, σ 2 t directly determines the width of the likelihood and thus how the prior is modified to yield the posterior. Typical values of σ 2 t lead to posteriors such as those in figure 3(b), where peaks near the observed FBP value t i are amplified.
In the special case where σ 2 t is very large, however, the likelihood becomes effectively a constant, so that the prior dominates the estimate and µ i n ≈ y i n , v i n ≈ w i n , and the estimate in equation (16) therefore approaches the pCT in equation (10) ). The purely MR-based pCT thus arises as a special case of equation (16). In the other extreme where σ 2 t → 0, the likelihood becomes a sharp function, while µ i n → t i and ŷ i → t i . In that scenario the estimate therefore simply copies the FBP without reference to the MR scan. The FBP is thus another special case of equation (16).

Automatic parameter tuning 2.3.1. Parameter tuning
As we have seen, the σ parameters critically alter the shape of the posterior as well as the predictions derived from it, and so it is important that they are chosen appropriately. The observed data themselves may be used as a guide for this process; for instance, a highly corrupted FBP would suggest a large σ * t 2 while a certain degree of CT value variation between voxels containing the same tissue type would suggest a certain setting for σ 2 y . We will now use a Bayesian method to automatically pick the parameters that best explain the observed data given our probabilistic model.
Equations (6) and (4) together define a joint distribution over the observed data {t i , m i } together with the unobserved variables {y i }: Marginalizing equation (17) over the unobserved variables {y i }, using equations (6) and (4) for the components, yields In the last step, we used equation (12) to carry out the integral. Given the observed variables {t i , m i }, appropriate parameter values σ may now be obtained by maximizing equation (18), effectively fitting the model to the available data. We perform the optimization using an iterative expectation-maximization (EM) algorithm (Dempster et al 1977) that only requires closed-form updates. Starting from an initial setting of σ, EM alternately applies an expectation (E) and a maximization (M) step. In the E-step, a lower bound to the objective function log p({t i , m i }|σ) is constructed given the current estimate of σ, derived directly from Jensen's inequality (Borman 2009): where v i n are the weights in equation (15) and we defined: . The lower bound ( * ) is, by design, equal to log p({t i , m i }|σ) at the current parameter estimate, so when it is maximized during the M-step, the objective is guaranteed to increase (Dempster et al 1977). In general, the nonanalytic f (x i ) prevents a closed-form formulation of this maximization, and so we simplify the objective by thresholding at f = 0.5, as illustrated in figure 1 5 . This removes f (x i ) from the equations while splitting T into the corrupted and uncorrupted sets T c and T u , leading to a closed-form M-step and algorithm 1: Algorithm 1. Automatic parameter estimation.
2: while δ > 10 −3 do 3: σ 0 ← σ 4: E-step: Calculate v i n , ∀i ∈ T and ∀n ∈ T u , using equation (15). 5: M-step: Update the parameter estimates: 6: The calculation of v i n in the E-step may be viewed as a probabilistic assignment of the data points to the tissue classes that were implicitly defined during kernel regression. Given this classification, the update equations in the M-step estimate the within-class variance, over different parts of the patient volume: • σ 2 y is calculated over the uncorrupted set T u , which reflects the observed noise in the CT values within a tissue class in the absence of the metal artifacts. • σ * t 2 updates to the additional variance in the metal artifact corrupted volume T c , reflecting the level of artifact corruption.
• σ 2 m updates to the MR image patch variance over the entire volume.
During the iterations, the update of the parameters in the M-step improves the soft classification in the E-step as the model increasingly fits the data, which in turn improves the parameter estimates. This continues until the objective is maximized and the parameters stop changing.

Application to MAR: three algorithms
We propose to use the posterior predictive distribution p(y i |t i , m i , σ), specified in equation (14) and with parameters σ automatically tuned using algorithm 1, as the basis for three different MAR algorithms. Specifically, we define an image inpainting method that simply uses the mean of the predictive distribution; a sinogram inpainting method that uses the image inpainting result as a template to estimate the metal projections; and an MBIR algorithm that uses the full distribution p(y i |t i , m i , σ) as a reconstruction prior together with a Poisson likelihood model of the x-ray intensity measurements.

Image inpainting
The image inpainting method that we propose, which we will call kernel regression MAR (kerMAR) in the remainder, directly calculates a CT estimate as the mean of p(y i |t i , m i , σ), which is given by equation (16), with one exception: the metal implants, as segmented for the definition of f (x i ) (see section 2.1), are left untouched. For a comparative evaluation of our model, we also define a similar algorithm that we call pCT, which instead uses the mean of p(y i |m i , σ), given by equation (10). This latter algorithm is entirely MR-dependent, effectively ignoring the information in the available FBP reconstruction.

Sinogram inpainting
The sinogram inpainting method we propose builds upon the nMAR algorithm (Meyer et al 2009(Meyer et al , 2012, which uses simulations on a template image to replace the metal projections. Rather than directly replacing the metal projections, nMAR smoothly integrates the simulated metal projections in the sinogram as follows: the ratio of the original sinogram and the simulation is calculated. In the 'sinogram' of ratios, the metal projections, labelled by a similar projection simulation on a binary mask derived from the metal implant segmentation, are then linearly interpolated between the nearest-lying values. This provides a set of weights that reflect the deviations between the simulation and original. These are used to weigh the simulated metal projections to smooth the transition over the metal implants between the original projections and the simulated ones. The resulting, inpainted sinogram is finally used for a reconstruction with FBP, and the image is post-processed: in particular, since the template does not in general reproduce the metal implants accurately, their contribution to the metal projections is not accurately simulated, leading to errors near and in the metal implants. The implant segmentation is therefore used to reintroduce the original metal CT values, upon which a frequency split (Meyer et al 2012) is performed to preserve high frequency information (details) near the implants. Our MR-based method, nMAR-k, simply uses our inpainted kerMAR image as the template within the established nMAR framework. For comparison, we also apply the nMAR algorithm with a conventionally generated FBP-derived template, calculated by thresholding using K-means clustering (Bishop 2006) on an initial linear interpolation MAR (liMAR) (Kalender et al 1987) image, followed by bulk CT value assignment (Meyer et al 2009, Lell et al 2012, Meyer et al 2012.

MBIR
MBIR maximizes the posterior distribution of a CT reconstruction given the acquired x-ray intensity data and a reconstruction prior, for which we propose to use our predictive distribution: where p(y i |t i , m i , σ) is given by equation (14). Here {n j } is the set of x-ray intensity measurements for difference paths through the patient volume. Following (Nuyts et al 1998), we use a Poisson likelihood: where Γ j is the emitted x-ray count toward detector j , and L is the system matrix whose entry l j ,i defines the intersection between the x-ray path j and voxel i. To maximize the reconstruction posterior in equation (19) we use the iterative MLTR algorithm (Nuyts et al 1998, Van Slambrouck andNuyts 2012) We will refer to the resulting reconstruction algorithm as MLTR-k.

Materials
We evaluated the three proposed algorithms on an anonymized retrospective data set of nine head-andneck radiotherapy patients containing dental implants and/or fillings. The image sets had a resolution of 1.2 × 1.2 × 2.0 mm (CT) and 0.5 × 0.5 × 5.5 mm (MR). The MR images were resampled to the CT resolution after rigid, multi-modal co-registration by mutual information (Wells et al 1996, Maes et al 1997 using the MatLab (version R2019a) image processing toolbox (Mathworks 2019). The CT scanner was a Philips Brilliance Big Bore with kVp120, the MR scanner a Philips Panorama 1.0T HFO. The patients were MR scanned with a T1weighting 2D spin-echo sequence at TE =10 ms and TR =520.2-572.2 ms (varying between scans). The patients were positioned in the same fixation for both MR and CT scans. For MBIR and sinogram inpainting, we exported the sinograms from the CT scanner.

Practical implementation
A generic procedure for implementing the three MAR algorithms is described in algorithm 2: Algorithm 2. Summary of implementation.

4: Obtain the uncorrupted set T u by thresholding
f (x i ) at 0.5 (see equation (9)). 5: Estimate σ using algorithm 1, then calculate the weights {w i n } n∈Tu using equation (8) and {µ i n , v i n } n∈Tu using equations (13) and (15) For our experiments, we implemented the various steps as follows: Step 1: We acquired the FBPs as reconstructed by the vendor-provided scanner software.
Step 3: We found the exact value of κ in the expression for f (x i ) to be non-critical, and used κ = (10 mm) 2 for all our head-and-neck patients.
Step 5: The vast majority of the weights {w i n } n∈Tu will in practice attain very small values and therefore not contribute to the model in a meaningful way. In order to speed up computations, we therefore used a fast patch matching algorithm (Ta et al 2014) to identify 200 regression points for each voxel i with particularly small patch differences m n − m i and therefore large weights w i n , effectively clamping the weights of all other regression points to zero. We used 5 × 5 × 5 patches, i.e. 6 × 6 × 10 mm.
Step 6: For sinogram inpainting, we used 3D spiral forward projection to detect the metal projections and calculate the prior projections. For the interpolation of the metal projections, we used 2D barycentric linear interpolation over the cylindrical detector array on a triangular grid. For image reconstruction, since the MBIR implementation in particularrequired numerous slow but highly parallellizable forward and back projection operations, speed was a priority both for potential clinical implementation and to facilitate the experiments. We therefore used the GPU-accelerated primitives in the ASTRA (van Aarle et al 2015, 2016) package, as well as the bundled FBP implementation (with a Shepp-Logan filter). We performed the reconstructions in 2D, after rebinning and interpolating the 3D spiral sinograms from the scanner to sets of 2D sinograms with a linear detector geometry.
We initialized the iterative reconstruction process in MBIR with uniform images with an attenuation coefficient of 10 −6 mm −1 . We stopped iterating once the voxel-averaged change between iterations fell below 10 −6 mm −1 .
Our entire computational framework used Python. The MAR algorithms, parameter estimation algorithm and sinogram rebinning methods in particular used NumPy and SciPy, while the reconstruction algorithms used ASTRA in its Python-wrapped form.

Quantitative evaluation
For each MAR algorithm, we evaluated the quantitative accuracy of the artifact-reduced CT images as follows: first, we acquired manual delineations of the oral cavity and the teeth for each patient, drawn in the FBP CT images guided by the MR scans. We split these delineations into a corrupted and uncorrupted part using our definitions of T c and T u , and calculated reference mean CT values for each structure and each patient using the CT values in the uncorrupted parts. Around these mean values, we calculated the CT value standard deviations (STDs) over the corrupted parts; this provided an image quality metric for each structure, with each MAR, and for each patient. We performed the calculations in Hounsfield Units (HU) (Buzug 2008). We finally contrasted the STD observations for the different MAR algorithms using a repeat measurements Student's t-test, and report the patient-averaged STD results for all MAR algorithms and structures, as well as the p-values of the tests. The use of the STD metric as an image quality metric relies on the assumption that different partitions of each ROI do not display large differences in mean HU values in the absence of artifacts as a result of large anatomical variations; such differences would lead to a systematic error in the reference mean, directly affecting the STD results. To estimate the magnitude of this error, we used the uncorrupted parts of our nine-patient cohort to emulate the artifact-free tissue distributions in the oral cavity and teeth. Using that the corrupted and uncorrupted sets in our experiments were, on average, not far from equal-sized, we then created 10 6 random bipartions of the extracted set of CT values, for each calculating the difference in mean between the partitions. These differences between means turned out to be near-Gaussian distributed; for the oral cavity the average was ∼0 HU and the standard deviation 1.1 HU, while for the teeth the average was ∼0 HU and standard deviation 6.3 HU. Even after multiplying the standard deviations by a factor 10 for a more conservative estimate, the error on the reference means in our experiments should thus be in the tens of HU, which is an order of magnitude smaller than the variations between MAR algorithms that we present in the following section. Consequently, the STD metric appears to be a valid indicator of artifact suppression.

Results
We now report our results in sequence for the proposed image inpainting, sinogram inpainting and MBIR algorithms. Figure 4 shows visual results of the MR-based image inpainting algorithm kerMAR for representative head-andneck patients, at a narrow, soft tissue enhanced window level (a) and a wider level (b). The T1w MR images are shown in (c). The green arrows in (a) indicate regions where kerMAR provided notable artifact reduction by suppressing both high and low intensity streaks, while at the same time preserving complex structures where the CT values are difficult to predict from the MR scan, such as the teeth.

Image inpainting
The region where kerMAR was least successful was the windpipe, indicated by the rings. Here, the sometimes severe misalignments of the CT and MR images, due to inter-scan patient motion, led to a highly inaccurate MRbased prior and thus anatomical errors. However, in similar but less severe cases, such as with smaller misalignments, mechanisms in the kerMAR algorithm allowed it to avoid such anatomical errors. In particular, as we saw in section 2.2.1, the kerMAR image is an intermediate between the extreme special cases of pCT (σ * t 2 → ∞) and FBP (σ * t 2 → 0), and thus corresponds to a non-trivial blending of the FBP and MR-based prediction. Figure 5 shows the kerMAR image along with these two special cases for a patient where the MR and CT scans were misaligned in the teeth. In this less severe case of misalignment, while we see poor pCT performance and thus a poor prior-based prediction, the blending with the FBP led to a much improved kerMAR. Similar results occurred in the spinal cord (red square), where similar inaccuracies led to an apparent introduction of MR features in the pCT, which were successfully suppressed in the kerMAR. Figure 5. Axial slices of kerMAR and its special cases, i.e. the MR-based pCT (kerMAR with infinite artifact noise variance) and the FBP (kerMAR with 0 noise variance). Results are shown at a wide (top) and narrow, tissue enhancing (bottom) window level. The kerMAR simultaneously displays artifact reduction, improved anatomical fidelity in difficult bone and air regions (rings) and handling of co-registration errors due to inter-scan motion (arrows). The squares focus on a slice of the spinal cord, where the kerMAR noticeably improves upon the pCT by referencing the FBP.
To summarize the results of the visual inspection, we saw (1) clear visual artifact reduction with both ker-MAR and pCT and (2) a clear performance difference in bone and air regions between the two MAR algorithms. For the quantitative analysis of the oral cavity and teeth, these observations are reflected in the STDs shown in figure 6, which measure the CT value standard deviation in the corrupted parts of the structures around the mean intensity in the uncorrupted parts. In the oral cavity, kerMAR and pCT were equally effective, with significant STD reductions relative to the FBP of ∼150 HU. In the teeth, however, the pCT did not improve on the FBP, consistent with our observations of poor pCT performance in such thick, bony regions. Here, only kerMAR showed a benefit, providing a significant improvement over the pCT of ∼100 HU.

Influence of automatic parameter tuning
We finally considered the influence of the automatic patient-specific parameter tuning (algorithm 1) by intentionally swapping the estimated parameters between patients. Figure 7 shows the results for two representative patient-pairs, where we swapped the parameters for patient 1 and 3 with those of patient 2 and 4, respectively. Patients 1 and 3 displayed relatively smaller settings of σ 2 y and σ * t 2 , and swapping the parameters with those of patient 2 and 4 therefore led to increased σ 2 y and σ * t 2 . This substantially altered the predictive model; as illustrated in figure 3, a σ y 2 increase widens the prior peaks while a σ * t 2 increase widens the likelihood, leading to a prediction that was both less precise and less accurate in areas where the MR-based prediction was compromised. These changes in the predictive model caused errors especially in the teeth (rings and arrow), as well as blur for patient 1, as compared to the automatically tuned parameters.
Patients 2 and 4 correspondingly displayed larger settings of σ 2 y and σ * t 2 . Upon swapping, the ensuing decrease in σ 2 y and σ * t 2 , and thus narrowing of the posterior, led to image noise and less effective artifact reduction; this is clearly visible for patient 2. Along with the larger σ * t 2 and σ y 2 , patients 2 and 4 further displayed smaller values of  . kerMAR calculated for four head-and-neck patients using tuned parameters σ (top) and with these parameters intentionally swapped between patients with different tunings; in particular, 1 with 2 and 3 with 4 (bottom). The rings and circles highlight the most substantial differences owed to the parameter swaps.
σ 2 m . The swap therefore also led to an increase in σ 2 m and thus a prior model with a less sensitive MR patch-based distinction between its regression points. Compared to using the automatically tuned parameters, this led to errors for patient 2 in the thick molars indicated by the circles, where the relevant MR patch differences are subtle. Figure 8 shows visual results of the proposed MR-based sinogram inpainting method nMAR-k, alongside those produced with a standard nMAR implementation based on a conventional, FBP-based template image. As seen at the narrow, soft tissue enhancing window level in figure 8(a), nMAR and nMAR-k performed similarly in terms of artifact reduction; neither algorithm reconstructed an artifact-free oral cavity, while both reduced artifacts elsewhere to similar degrees (see regions indicated by arrows). At the wider window level in figure 8(b), we see a clearer difference between nMAR and nMAR-k: The sometimes lower quality of the FBP-based template in standard nMAR led to the visible artifacts (arrows), which are absent using the proposed MR-based method. A clear benefit of using the higher quality MR-based template thus appears to be to avoid certain artifacts introduced by the CT-based one. The influence of such improvements on the quantitative STD metric was comparatively minor, as figure 9 shows significant improvement of both methods over the FBP, but little quantitative difference between them.

Model-based iterative reconstruction
In order to evaluate the benefit of including the predictive distribution p({y i }|{t i , m i }, σ) as an image reconstruction prior in the MLTR algorithm, we ran the algorithm both with (the proposed MLTR-k algorithm) and without this prior (referred to in the remainder as simply 'MLTR'). We ran each algorithm until the voxelaveraged difference accross two subsequent iterations of the reconstructed image decreased below a predefined threshold (10 −6 ). Figure 10 displays the obtained reconstructions at the specified convergence threshold for both methods in five representative patients. Results are shown at a soft tissue enhancing window level (a) and at a wider level (b). At the narrow window level, the main detectable benefit of MLTR-k compared to MLTR was less blurry reconstructions, which indicate that the algorithm was closer to full convergence; in terms of artifact reduction, MLTR-k more successfully eliminated low intensity streaks but did not wholly eliminate high intensity ones. At the wider window level, MLTR-k more clearly improved the MAR over MLTR (arrows in (b)), although these benefits came, in a few cases, at the expense of some newly introduced artifacts in the teeth (see black rings), that however were not a general trend. Figure 11(a) further shows the logarithm of the voxel-averaged difference used as convergence criterion for both algorithms and our nine patients across iterations. As we can see, the rate of convergence is universally larger for MLTR-k with the MR-based prior, as compared to the purely likelihood-based MLTR. In particular, the convergence speed was essentially doubled with MLTR-k, decreasing the number of iterations from ∼400-600 to ∼200-300.
Visually, MLTR-k with its MR-based prior thus provided superior MAR to the prior-free MLTR, as well as an image closer to full convergence at the specified convergence threshold. These results are supported by the quantitative results in figure 11(b) that show significant standard deviation improvements compared to MLTR, in both the oral cavity and teeth, of respectively ∼200 HU and ∼150 HU.

Comparison to gold standard
The nMAR sinogram inpainting algorithm (with the CT-based template) may be used as a gold standard for benchmarking and comparing the three proposed MR-based algorithms between themselves. To more easily perform this comparison, we summarize the results of our quantitative evaluations in figure 12 along with p-values of the comparison to nMAR; the results for the MR-based results are highlighted. We see no case of significantly worse performance compared to the gold standard of the three MR-based algorithms, and further see a quantitative improvement in the oral cavity using the image space approach (kerMAR). Additionally, the quantitative results should be taken together with the visual comparison, and when we compared the sinogram inpainting method (nMAR-k) to nMAR in section 4.2, we found the negligible quantitative results to be accompanied by visual improvements. The comparison is more difficult with the MBIR approach (MLTR-k), where we again see insignificant STD results but where, as we saw in section 4.3, the visual quality may have been compromised by additional artifacts. In summary, the image and sinogram inpainting algorithms at least show benefits over the gold standard nMAR algorithm, while the MBIR results are more inconclusive.

Conclusion and discussion
We have presented a novel Bayesian approach to MR-based MAR, and in particular derived a predictive distribution of an ideal, uncorrupted CT from a corrupted FBP CT and a co-registered, conventional-sequence MR scan. We used the obtained predictive distribution to define three automatic MAR approaches: 1. The image inpainting algorithm kerMAR seamlessly blends MR-derived predictions with information from the corrupted FBP CT, the former leading to substantial artifact reduction and the latter helping especially with bone/air disambiguation and correcting co-registration errors. 2. The sinogram inpainting algorithm nMAR-k uses the kerMAR result as the template in the well-known nMAR algorithm. In our experiments, this led to improvements over using a conventional, FBP-based template by introducing fewer artifacts and improving the artifact reduction. 3. MLTR-k uses the proposed predictive distribution as an image reconstruction prior in the MBIR algorithm MLTR. This led to improvements in terms of both speed (of ∼50%) and the quality of the reconstructed image, especially in terms of artifact reduction.
We conclude that the proposed approach provides a versatile way to use the anatomical information in the MR scan to boost the performance of MAR. An important aspect of the proposed approach is that its parameters are automatically tuned in a patient-specific manner, which makes it well-suited for potential clinical implementation. The automatic tuning accounts for inter-patient variations, in particular the extent of the artifact corruption, which is an important robustness benefit that allows for application to diverse cases and especially reduces potential over-correction in less Figure 11. (a) Convergence plot of MLTR and MLTR-k for the nine patients. Solid and dashed curves show the log of the absolute, voxel averaged change between iterations k − 1 and k for MLTR and MLTR-k respectively. (b) Quantitative analysis in the oral cavity and teeth for the FBP; conventional, prior-free MLTR; and MLTR-k, which uses the proposed predictive distribution as a reconstruction prior. Figure 12. Summary of the quantative evaluation. Shown side-by-side are the results for image inpainting, sinogram inpainting and MBIR in respectively the oral cavity (left) and teeth (right). The results for our MR-based algorithms are shown in yellow with green text indicating the type of algorithm. The p-values are from a repeat measurements t-test comparing the MR-based results to the nMAR gold standard. corrupted cases. The tuning also accounts for variations in the imaging settings; we saw this in the consistent performance of our algorithms despite variations in the MR sequence parameters.

Study limitations
While our MR-based method is applicable whenever an MR scan is co-acquired with a CT, it should be noted that such cases may be of limited frequency outside of head-and-neck radiotherapy, as MR at the time of this study is only used infrequently in the case of metal artifacts at other sites, such as hip implants. Due to the continually increasing importance of MR in radiotherapy, this may however change in the future (Lagendijk et al 2014).

Future work
The inclusion of the FBP for the CT value prediction in our image inpainting algorithm kerMAR helped to mitigate co-registration and alignment issues between the MR and CT images, but for clinical adoption the associated errors in especially the windpipe should probably further reduced (see figures 4(a) and (b)). A straightforward way to achieve this is to improve the co-registration, e.g. by using a deformable registration methods, which are routinely used in modern radiotherapy clinics. Further accuracy gains may be achieved by improving the noise model, which in its current form simply assumes that the noise introduced by the artifacts decreases sigmoidally with the distance to the metal implants. This is a rough assumption as especially streak artifacts may persist throughout the entire CT image (see, for instance, figure 4(a)). Our current model may also not translate well to other types of implants, such as metallic dual hip prostheses, where the severity of the artifacts also depends heavily on whether or not they are located between the prostheses. Future experiments may therefore investigate different formulations of the function f (x i ) that better capture the spatial dependency of the artifact noise.
In order to successfully translate the proposed techniques into clinical routine, the required computation time will need to be further reduced. The most time-consuming part of our model is the calculation of patch distances over regression points, which is currently implemented on a subset of 200 points ∀i ∈ T , found using a patch matching algorithm (Ta et al 2014). In our current Python implementation on a single CPU core (Intel Core i7-4712HQ @ 2.30 GHz), this process takes between 10-30 min. The algorithm is however parallelizable, and on a similar-sized dataset, Ta et al report results on the order of ∼1 min on a multi-CPU cluster (Ta et al 2014). In the future, we therefore intend to speed up our algorithm in a similar manner.
Our MBIR experiments used a relatively simple Poisson likelihood. While this model helps address the artifacts stemming from photon starvation of the metal projections (Nuyts et al 1998, Buzug 2008, it does not account for e.g. the important effect of beam hardening. This may have been the source of the artifacts that were sometimes introduced with MLTR-k (see e.g. the black rings in figure 10) (Nuyts et al 1998, De Man et al 2000, Buzug 2008, Van Slambrouck and Nuyts 2012. Future work may therefore consider using a more accurate likelihood model for the MBIR technique.