Joint reconstruction of PET-MRI by exploiting structural similarity

Recent advances in technology have enabled the combination of positron emission tomography (PET) with magnetic resonance imaging (MRI). These PET-MRI scanners simultaneously acquire functional PET and anatomical or functional MRI data. As function and anatomy are not independent of one another the images to be reconstructed are likely to have shared structures. We aim to exploit this inherent structural similarity by reconstructing from both modalities in a joint reconstruction framework. The structural similarity between two modalities can be modelled in two different ways: edges are more likely to be at similar positions and/or to have similar orientations. We analyse the diffusion process generated by minimizing priors that encapsulate these different models. It turns out that the class of parallel level set priors always corresponds to anisotropic diffusion which is sometimes forward and sometimes backward diffusion. We perform numerical experiments where we jointly reconstruct from blurred Radon data with Poisson noise (PET) and under-sampled Fourier data with Gaussian noise (MRI). Our results show that both modalities benefit from each other in areas of shared edge information. The joint reconstructions have less artefacts and sharper edges compared to separate reconstructions and the ℓ2-error can be reduced in all of the considered cases of under-sampling.


Introduction
In recent years the landscape of medical imaging has changed. For many decades single modality scanners to image either anatomy or function have been available and widely used in clinical practice. Anatomy can be imaged for instance by magnetic resonance imaging (MRI) and computer assisted tomography (CT). Often functionality is measured by radioactive labelled markers using positron emission tomography (PET) or single photon emission computed tomography. Based on these single modality scanners, multi-modality scanners were developed to combine the strength of both regimes and image both structure and function. While PET-CT scanners acquire the data for the two modalities sequentially some recently developed PET-MRI scanners are able to perform both scans simultaneously [1][2][3].
This development not only opens up new opportunities for clinicians but also poses a new kind of inverse problem. As 'function follows structure for the most part' [4] the images to be reconstructed are likely to share many edges. Some example images are shown in figure 1 where the reconstructions from PET, MRI and CT of a single patient are shown (using standard settings with the scanner software). The images were affinely registered to a common frame as they were acquired on a sequential PET-CT and on a separate MRI scanner. Despite the fact that all three images show completely different biological information, all three images show similar structures due to the same underlying anatomy. Therefore, we propose to jointly reconstruct both modalities with the a priori knowledge that the images are more likely to show similar structures. Some authors refer to this task as joint inversion [5,6].
We will focus on joint reconstruction of simultaneously acquired PET-MRI data. All methods discussed here are, however, not limited to this application. They take similar structures into account and can therefore be applied to any other multi-modality scanner or even another field of application where similarity in structure is to be expected.

Contributions
This work is, to the best of our knowledge, the first contribution to joint reconstruction in multi-modality medical imaging. We justify the joint objective function used in [5][6][7][8][9][10] via probabilistic modelling, which also allows arbitrary noise models to be used in a joint reconstruction framework. Moreover, we analyse a general class of joint priors [6][7][8][9][10][11][12][13] and show how they couple the reconstructions in a gradient based optimization scheme. Furthermore, our numerical results show that joint PET-MRI reconstruction can be beneficial to both of the modalities. Most of the results obtained via joint reconstruction are superior, both in visual quality as well as in ℓ 2 -error, compared to some widely accepted methods for separate reconstructions.

Related Work
Joint reconstruction was, as far as we know, first proposed by Haber and Oldenbourg in 1997 [5] where the application of combined PET and MRI reconstruction was already mentioned but the focus was on geophysical applications. Subsequently, different priors coupling two modalities have been proposed [6][7][8][9]14]. Some of these are related to the methods we use in this paper. We will discuss these relations in section 3.
There is a close connection between joint reconstruction and inverse problems in colour image processing as the colour channels can be seen as different modalities which share information about edges [10,12,13,[15][16][17][18][19][20]. All methods used in this work have been proposed first for colour image processing such as denoising and demosaicking [10,12,17].
Also closely related is the work where two different MRI data sets of the same subject are enhanced [21] and reconstructed [22].
Similarly, the reconstruction with a priori information from another modality has also been studied. This is sometimes referred to as model fusion [6]. In the example of PET and MRI one might want to reconstruct PET using the additional structural information of MRI. On the one hand, one may include this information via a structural prior which gives information about edges [11,[23][24][25][26][27][28]. On the other hand, theoretical information priors such as joint entropy and mutual information have been used to inform about likely tissue distributions [29][30][31][32]. None of these priors have been studied for joint reconstruction.

Problem setting and notation
This section is dedicated to introduce the problem of joint reconstruction of PET-MRI more formally and, in addition, it serves to state our notations.

Positron emission tomography
In PET a radioactive labelled and positron emitting tracer is injected into the patient. As the positrons annihilate with electrons of the tissue two photons in (almost) opposite directions are sent out which are recorded by detectors around the object. One seeks to reconstruct the tracer distribution from recorded events from pairs of detectors [33,34]. In a general continuous framework this corresponds to the the x-ray transform which coincides with the Radon transform in 2d [35,36]. This is of course a simplified model as effects such as scatter are ignored.
Let us denote the activity of the tracer by We want to reconstruct u on a discretized domain which we denote for the sake of simplicity again by  ∈ u M . The discrete PET forward operator   → A: M K describes how likely it is that a pair of photons is detected by one of the detector pairs, i.e., A k m , is the probability for an event at voxel m to be detected by the k-th detector pair. More accurate modelling would include the estimation of scatter, randoms and attenuation [34]. The acquired data in PET are commonly assumed to be an instance of a Poisson distributed vector with expectation Au [33,34,37].

Magnetic resonance imaging
MRI exploits the spin of hydrogen nuclei in the human body (or any other material). A magnetic field makes these randomly orientated spins align so that the position of the spin can be encoded by tiny variations in their resonant frequency. The collected signal corresponds to the modulated spin density integrated over the whole domain. The inverse problem in MRI is to reconstruct an image from these Fourier measurements [38,39].
MRI is a very versatile imaging modality. Depending on the imaging sequence used for acquisition, the MRI image can reflect the proton density of the tissue or some other tissue related parameter [38,39]. Independent of the imaging sequence we will denote the MRI image by  Ω → v: . We denote the discrete MRI forward operator by which consists of the Fourier transform followed by a measurement or sampling operator being usually a projection on the measured frequencies.
The noise in MRI is commonly modelled as additive Gaussian with standard deviation depending on, among others, the temperature, the receiver bandwidth and the field strength so that we acquire data  σ σ ∼ > g B v I ( , ), 0 [40][41][42]. It became popular to speed up the MRI data acquisition by omitting 'unnecessary' measurements and 'replacing' them using a priori knowledge. This concept known as compressed sensing [43][44][45][46] has been already applied to MRI with great success [22,47,48]. Successfully exploited a priori knowledge in MRI includes, for instance, sparsity in the gradient or wavelet domain [47], sparsity in a self-learned dictionary [48] or similarity from different MRI contrasts [22]. In our contribution, the additional a priori knowledge is the similarity of MRI to simultaneously acquired PET.
For simplicity we assume the phase in MRI to be zero and therefore restrict v to be realvalued.

Probabilistic modelling for joint reconstruction
As we have now introduced both modalities PET and MRI we can discuss how they can be linked. We formulate our beliefs using probabilistic modelling and graphical models [49,50]. The PET and MRI images u v , are random variables. Our intuitive belief about these unknown images u v , are described in the graphical model shown in figure 2. On the one hand, both u and v depend on a common object and are therefore not independent of each other. The data are modelled as random variables with expectation linearly depending on these images. Hence, also the data of PET f and MRI g are not independent of another. On the other hand, if we have knowledge about u, v does not provide extra information about f and vice versa. This means, that f and g are conditionally independent given u v , . Formally, this leads to the separation of the multi-modality likelihood The posterior probability of the PET-MRI image u v ( , ) for observed PET-MRI data f g ( , ) can then be simplified as using Bayes' rule. When we further assume white, complex Gaussian noise in MRI, Poisson noise in PET and a prior of the form denotes the ℓ 2 -norm.
Remark 2.1. The joint objective functional (3) but with Gaussian noise in both modalities coincides with the functional used in [5][6][7][8][9][10] up to a scaling of the data fidelity terms.

Methods for joint reconstruction
This section describes the models for our prior  which couples the reconstruction of PET and MRI.

Parallel level sets
3.1.1. Measuring parallelism. A general concept of coupling structure in a multi-channel image was proposed [10] with special cases including [6,7,12]. Structure in an image can be identified with its level sets In differentiable images the gradient is perpendicular to the level sets, so similar structures can be measured by similarity of the gradients' orientation. Therefore, parallelism of the gradients at each spatial location is a measure of structural similarity of two images.

Measuring parallelism of vectors. It is well known that for any vectors
with θ denoting the angle between them in the plane they span and = ∑ x y x y , : i i i the Euclidean scalar product. Consequently, one can determine how parallel x and y are by the 'distance measure' It is obvious that d is symmetric and ⩾ d x y ( , ) 0 with equality if and only if x is parallel to y. These properties guarantee that d is suitable to measure parallelism. Moreover, it can be generalized to , with arbitrary strictly increasing functions φ ψ ∞ → ∞ , : [0, ) [0, ). Obviously, φ ψ d , is still symmetric, non-negative and only parallel vectors have minimal 'distance'.
Hence, we end up with a global measure of how 'similar' two images u and v are , In order to make the notation more easily accessible we will suppress the explicit spatial dependence from now on. In homogeneous regions of v the gradient v vanishes and this functional does not provide any information for u. Therefore, one instead can take the modified version , , 2 proposed in [10] with the 'smoothed norm' for some β > 0 and the global measure reads , , Remark 3.1. The choice how to smooth the norm is rather arbitrary. Other common choices are the Huber or the log-cosh approximation.
Remark 3.2. In general, one might want to use different smoothing parameters β and γ for β x and γ y but to keep the notation simple we used the same for both modalities. In [10] we applied (11) with φ ψ = id , to denoising and demosaicking of colour images. We will refer to this prior as linear parallel level sets. Another choice which was used in [10] Due to the quadratic nature of ψ we call it quadratic parallel level sets. This functional was first introduced in [12,13] where it was motivated by high energy physics. It was observed in [10] that linear parallel level sets leads to sharper images compared to the quadratic version. We will give a theoretical justification for this observation in section 4.
Example 3.4. Another choice used previously in geophysics is (9) with φ = id and ψ = s s ( ) 2 [6]. This functional coincides with the squared magnitude of the cross product of the gradients proposed by [7][8][9]. Similar functions have been applied to registration [51,52] and face recognition [53]. We will only briefly discuss this choice further here because it possesses unfavourable noise suppression properties.
Remark 3.5. The functional proposed in [11] can be rewritten as It is not exactly a special case of the parallel level sets functional but bears lots of similarities. This functional favours alignment of the gradient of u with the normalized and a priori defined vector-field w. It is beyond the scope of this paper to investigate such asymmetric priors for joint reconstruction.

Joint total variation
There are many extensions of total variation regularization to multi-modality images. Of importance here is the coupling of the modalities so that one modality can provide information to the other. In the case of two modalities This functional was first considered as an extension of total variation for colour images [17]. Another generalization of total variation to colour images was proposed in [54]. While the former locally couples the support of the gradients, the latter only provides a global effect. Joint total variation is closely related to block sparsity in the magnitude of the gradient [55][56][57][58]. It is known that usual total variation favours sparsity in the gradient domain which is enforced by sparsity in the magnitude of the gradient. Joint total variation favours sparsity in the magnitude of the 'joint gradient'   u v ( , ). Therefore, gradients are more likely to occur at the same position. Another explanation why joint total variation can be useful for joint reconstruction is given in the next section.
We will use here a smoothed version of joint total variation so that we can apply tools from smooth optimization, i.e., for some β > 0. In the notation of the 'smoothed norm' introduced above we can write smooth joint total variation as where the vector-valued image  Ω → w: 2 is defined as = w x u x v x ( ) ( ( ), ( )) and the gradient is the Jacobian.

Analysis of the generated diffusion process
We have introduced the concepts of parallel level sets and joint total variation to couple the reconstruction of multi-modality images. Priors based on gradient information often lead to a derivative of diffusion type [10], i.e., if a prior is of the form  The generated diffusion process depends crucially on the diffusivity  K u ( ). Its analysis can give more insight how a prior influences the final reconstruction when an algorithm based on derivative information is used.
We will see in the following that in both cases the Gâteaux derivative leads to a nonlinear, inhomogeneous diffusion, i.e., the diffusivity depends on the image and the location. Moreover, in case of parallel level sets the diffusion will be anisotropic, i.e., it depends on the direction [59, 60].

Parallel level sets
In order to analyse the diffusion generated by parallel level sets we first need to compute the Gâteaux derivative. Most of the derivations have already been shown in [10].
Let us denote the space of continuously differentiable functions with zero Neumann boundary condition by Ω C ( ) * 1 , i.e., the derivative with respect to the normal of the boundary being zero.
with diffusivities x y x y y x x y ( , ) : ( , ) 0, The flow generated by (17) is difficult to analyse because of the cross-diffusivities τ. Therefore, we write an equivalent formulation which does not rely on cross-diffusivities but on matrix-valued diffusivities.
where the diffusivities are given by x y x y x y x y ( , ) : ( , ) , , 0.
Proof. The assertion for the first equation in (21) follows immediately from the fact that Interchanging the roles of u and v completes the proof. □ PLS . Under similar conditions on φ and ψ it also holds for PLS with β | · | being replaced by | · |. Hence, this proposition can be applied to PLS as well.
For the analysis of the diffusivity we need to remind the reader about the orthogonal complement and the span.
T □ To analyse the diffusion we need to introduce local coordinates for the image domain.
Proof. This result follows immediately from proposition 4.3 combined with lemma 4.6. □ Remark 4.9. Theorem 4.8 tells us that the diffusion for the image u generated by the parallel level sets functional has principal directions given by the other image v and vice versa. Moreover, the directions but not the amount of the diffusion are independent of the functions φ and ψ. ( ) the diffusivity in the direction of the image gradient can be rewritten as x y x y x y x y x y , , , , It is of interest to analyse the four cases β β x y x y x y (i) vanishing gradient information, i.e., 0, (ii) vanishing side information, i.e., 0, (iii) large gradients, i.e., 0, and (iv) large and aligned gradients, i.e., 0, , , where we always assume that β > x y x y , , | , |, 0. It is easy to compute that the principal diffusivities are then (iv) 0 .
The flow generated is very simple as there is only diffusion only along the edges. It stops when side information is flat and is therefore not able to remove noise in this case.

Joint total variation
In this section we discuss the diffusion which is generated by joint total variation. It follows immediately from the well known Gâteaux derivative of total variation (see [61]) that the Gâteaux derivative of joint total variation can be identified with where the diffusivity for both modalities is The diffusivity for both modalities are the same. It reduces to zero when either u or v is large, hence information about an edge can be transferred from one modality to the other. The diffusion of joint total variation is isotropic as the diffusivities are scalar-valued. Similarly to usual total variation [62] one can consider joint total variation with anisotropic diffusion but this is out of the scope of this paper.

Numerical experiments
5.1. Technical details 5.1.1. Implementation. In our experiments we aim to compute minimizers of (3) with and without an explicit prior. If no explicit prior is used we terminate the iterations before convergence to regularize the reconstruction. The priors we used include total variation, joint total variation (see section 3.2) and linear and quadratic parallel level sets (see section 3.1). As we want to compare the different priors for joint reconstruction we chose to minimize the functional with the same quasi-Newton method (L-BFGS-B [63], software available at [64] with mex wrapper for use in MATLAB [65]) in every case. The MATLAB implementation is available as supplementary data from stacks.iop.org/ip/31/015001/mmedia can be obtained from the authors' homepage [66].
Quasi-Newton methods rely on gradient information only and approximate the Hessian. The gradient of total variation, joint total variation (41), quadratic and linear parallel level (21) sets all are of diffusion type. Hence, we need to discretize gradients and divergences. Gradients are discretized with forward differences and the divergence with backward differences so that the discretized gradient corresponds to the gradient of the discretized functional. We consider the setting of joint PET and MRI reconstruction. The PET operator is modelled to be the discrete x-ray transform taken from [67]. To model the lack of resolution the operator consists also of a convolution in the sinogram domain in the radial direction with a Gaussian of full width half maximum of five pixels (for sinograms of size 128 × 300). The ground truth models for both modalities together with the blurry sinogram for PET are shown in figure 3. Six different cases of MRI undersampling were tested: the first case of sampling is a fully sampled MRI (full). For the second and third sampling the data were acquired along 20 and 15 radial lines in k-space (radial20, radial15). The fourth and fifth case are spirals with either uniform density (spiralUni) or a higher density in the high frequencies (spiralHigh). In the last test case we performed Cartesian sampling but only acquired every second line (lines2). The MRI data are presented next to the results. Gaussian noise with expected energy of 4% is added to the MRI. The PET data is an instance of a multi-variate Poisson distribution with an expected number of 1e+6 counts in total.
where the norms are only taken over the region of interest, see figure 3. This region of interest was chosen to give no weight to the background and less to large uniform regions as well as the outer boundary.

Results
The joint reconstruction approaches described in section 3 are compared against separated reconstruction which is performed either by total variation or having no explicit prior. The latter means early stopping for PET or minimal energy (zero filled) reconstruction for MRI. Figure 4 shows the results for the case full with the reconstructed images shown on the left and the error images (compared to the ground truth) on the right where blue indicates over and red underestimation. The MRI reconstructions are visually perfect for all priors as it can be seen from both the images themself and the error images. The results for PET differ a lot between the different priors. When the iterations have been terminated the reconstruction looks the worst. Most of the introduced artefacts are removed when using total variation but the resolution in PET remains more or less the same. Coupling PET and MRI with joint total variation results in a superior PET image compared to total variation. Shared edges are better reconstructed using quadratic parallel level sets. In the case of linear parallel level sets almost no errors at shared edges can be observed. In areas of shared information the images look almost indistinguishable from the ground truth. The edges which are not shared between the two modalities introduce a fake edge artefact. From the error images in PET we see that the errors are decreasing from top to bottom. The errors for linear parallel level sets are almost only in regions of non-shared edge information; both coincide with the visual impression on the left. The results for radial20 are shown in figure 5. With only 20 radial spokes a zero-filled reconstruction does not give a reasonable reconstruction for MRI any more but including total variation into the reconstruction resolves most of it. While joint total variation and quadratic parallel level sets are not able to improve on this linear parallel level sets is able to reconstruct with a lot less errors and only the finest lines can not be resolved. The PET reconstructions are similar to the fully sampled case but linear parallel level sets does show a few errors in regions of shared edge information. It can also be seen that the two blobs in PET not being present in MRI are blurrier than total variation and similar to early stopping. This is probably due to the parameter β being constant over and optimized for the whole phantom.

Visual assessment.
Reducing the number of spokes to 15 (see figure 6) the difference between separate and joint reconstruction becomes even larger with linear parallel level sets still being able to reconstruct reasonable images with only a few errors.
More results are shown in figures 7 and 8 where the MRI data are sampled along spiral trajectories.
The reconstructions from Cartesian undersampling lines2 are shown in figure 9. While the separated MRI reconstruction shows a two-fold ghosting of the object, joint reconstruction, especially linear parallel level sets, can help to complete the missing

Quantitative assessment.
Quantitative results using the relative ℓ 2 -error over the whole image can be found in figure 10. The errors for for PET are shown on the left and for MRI on the right. For PET it is easy to see that for all cases of sampling separate reconstruction is always worse than joint reconstruction with joint total variation being weaker than parallel level sets. Among the two parallel level sets priors the linear one performs much better. Furthermore, it can be seen that the results are quite robust on the kind and amount of under-sampling. For full sampling in MRI all methods give very good results. When MRI is reconstructed from under-sampled data using the zero-filled Fourier transform the results are always the worst. Total variation performs as well as joint total variation and quadratic parallel level sets

Conclusion
This work is the first contribution to joint reconstruction in multi-modality medical imaging. We have shown that the previously used framework for joint reconstruction coincides with our belief about multi-modality imaging systems like PET-MRI. Three different priors have been proposed for joint reconstruction of PET-MRI and the corresponding diffusion has been analyzed. This analysis gives more insight on what images are favoured by these priors. We presented results where blurry Radon data with Poisson noise (PET) and underdetermined Fourier data with Gaussian noise (MRI) have been combined in a joint reconstruction process. The numerical results show that our joint reconstruction framework can successfully couple information from these two imaging systems so that both modalities benefit. Similar to multiple priors for one modality [68][69][70] one might be able to include better the given a priori information by using multiple priors. Moreover, the optimization methods have to be adapted to the non-convexity of the parallel level set prior. The effect of noise and non-local parameter tuning for joint reconstruction will also be subject of further research.  which is again violated for β > s 8 . □