PET image reconstruction using multi-parametric anato-functional priors

In this study, we investigate the application of multi-parametric anato-functional (MR-PET) priors for the maximum a posteriori (MAP) reconstruction of brain PET data in order to address the limitations of the conventional anatomical priors in the presence of PET-MR mismatches. In addition to partial volume correction benefits, the suitability of these priors for reconstruction of low-count PET data is also introduced and demonstrated, comparing to standard maximum-likelihood (ML) reconstruction of high-count data. The conventional local Tikhonov and total variation (TV) priors and current state-of-the-art anatomical priors including the Kaipio, non-local Tikhonov prior with Bowsher and Gaussian similarity kernels are investigated and presented in a unified framework. The Gaussian kernels are calculated using both voxel- and patch-based feature vectors. To cope with PET and MR mismatches, the Bowsher and Gaussian priors are extended to multi-parametric priors. In addition, we propose a modified joint Burg entropy prior that by definition exploits all parametric information in the MAP reconstruction of PET data. The performance of the priors was extensively evaluated using 3D simulations and two clinical brain datasets of [18F]florbetaben and [18F]FDG radiotracers. For simulations, several anato-functional mismatches were intentionally introduced between the PET and MR images, and furthermore, for the FDG clinical dataset, two PET-unique active tumours were embedded in the PET data. Our simulation results showed that the joint Burg entropy prior far outperformed the conventional anatomical priors in terms of preserving PET unique lesions, while still reconstructing functional boundaries with corresponding MR boundaries. In addition, the multi-parametric extension of the Gaussian and Bowsher priors led to enhanced preservation of edge and PET unique features and also an improved bias-variance performance. In agreement with the simulation results, the clinical results also showed that the Gaussian prior with voxel-based feature vectors, the Bowsher and the joint Burg entropy priors were the best performing priors. However, for the FDG dataset with simulated tumours, the TV and proposed priors were capable of preserving the PET-unique tumours. Finally, an important outcome was the demonstration that the MAP reconstruction of a low-count FDG PET dataset using the proposed joint entropy prior can lead to comparable image quality to a conventional ML reconstruction with up to 5 times more counts. In conclusion, multi-parametric anato-functional priors provide a solution to address the pitfalls of the conventional priors and are therefore likely to increase the diagnostic confidence in MR-guided PET image reconstructions.


Introduction
The recently introduced simultaneous clinical PET-MR (positron emission tomographymagnetic resonance) imaging systems are able to provide molecular imaging data and complementary multi-parametric (MP) MRI information. The MRI data can be exploited to guide PET image reconstruction and therefore help reduce the noise and resolution blurring that usually degrade the quality of PET images. In current clinical practice, the PET data are mainly reconstructed using the maximum likelihood expectation maximization (MLEM) algorithm and point spread function (PSF) resolution modelling (Reader et al 2002). However, for routine PET scans, the true ML solution is very noisy and inclusion of PSF during reconstruction can lead to Gibbs artefacts near edges. Hence, Bayesian maximum a posteriori (MAP) image reconstruction has been explored to reduce noise and stabilize the ML solution using a priori knowledge of the unknown image, such as how smooth it is or what it should look like. In a smooth image, it is highly probable that neighbouring voxels have similar intensity values compared to distant voxels. A prior that encourages this property, such as a quadratic Markov random field (MRF) prior, attempts to suppress large local differences between voxels on the basis that they are probably due to noise. However, some of the local differences are associated with legitimate image boundaries which should be preserved during image reconstruction.
Over the last two decades, various MRF priors have been designed to improve upon the quadratic prior using non-quadratic edge preserving potential functions that assign a lower penalty to large local differences on the assumption that they are probably associated with valid boundaries. The priors that utilize non-quadratic potential functions, such as total variation (TV) (Huber 1981, Lange 1990, Rudin et al 1992, can preserve edges; however they are usually limited in distinguishing valid edges from noisy fluctuations as they rely mainly on short-range voxel interactions. In addition, due to the low-pass filtering nature of the PET scanners and the resulting partial volume averaging effect (PVE), some boundaries are lost and cannot be restored purely from the PET data. To enhance the performance of such priors, two main approaches have been explored in the literature that extend the local priors to non-local ones for the robust identification of edges using either the long-range interaction of image voxels or their intensity similarities (Yu andFessler 2002, Chen et al 2008). These approaches, which have their origins from bilateral filtering (Tomasi and Manduchi 1998) and non-local means (Buades et al 2005), have been successfully employed in MAP PET image reconstruction (Chen et al 2008, Wang andQi 2012) as well as post-reconstruction filtering (Dutta et al 2013, Chan et al 2014. However, as these methods rely only the PET information, they might not be able to handle PVE and to fully recover the lost boundary information. Another promising approach is to employ anatomical boundaries available in hybrid PET-MR or PET-CT to assign a lower weight on the smoothing of the PET image boundaries (Gindi et al 1993). Bowsher et al (2004) proposed to calculate the interaction (local differences) of each image voxel with its most similar neighbours predefined from an anatomical image. Essentially, the Bowsher method weights the local differences using zero-one weighting factors, thereby disabling the smoothing across boundaries, but encouraging it within anatomical regions. The main advantages of this method are that it does not require the pre-segmentation of anatomical images, and it can be easily incorporated into different MRF priors; however, as it relies on the 'binary' selection of each voxel's neighbours, it is particularly vulnerable to mismatches between the anatomical image and the true activity distribution. To improve the performance of the Bowsher prior in the case of the anato-functional inconsistencies, Kazantsev et al (2011) investigated the combination of the Bowsher's weights and those calculated from the PET estimate. Nguyen and Lee (2013) calculated the non-local weights from both PET and anatomical images to address the mismatches between PET and MR or CT information.
Based on the Lysaker-Osher-Tai (LOT) model (Lysaker et al 2004), another approach is to improve MRF priors using the normalized gradient vector fields (normal vectors) obtained from anatomical side information. Kazantsev et al (2014) included the normal vectors obtained from a complementary CT image into the LOT model to improve the TV prior based on the alignment of the PET and normal vectors. However, this proposed prior depends on the orientations of the anatomical and functional edges (e.g. in the case of hot and cold lesions). More recently, Ehrhardt et al (2016) derived a prior based on the structural similarity of the PET and MR images measured by the alignment of the PET and MR gradients. In fact, this prior generalizes the prior proposed by Kaipio et al (1999) for improving the quadratic prior. As will be discussed in section 2.1.4, the Kaipio prior in practice encourages edge preservation by weighting the magnitude of PET image gradients (i.e. local differences) with the angle between the PET gradients and MR normal vectors. One of the advantages of these methods is that the prior is reduced to a conventional MRF prior, if PET image boundaries do not have corresponding anatomical boundaries in MR images. However, similar to the Bowsher prior, they might induce false boundaries in the reconstructed PET images if MR boundaries do not have corresponding functional boundaries in the true PET image. Moreover, the quality of the MR image is an important factor in deriving accurate normal vectors, as clinical MR images typically suffer from noise and/or partial volume effects, especially when they are downsampled to match the voxel size and field of view of the PET images.
All of the above-mentioned anatomical priors put less weight or penalty on the smoothing of PET boundaries based on boundary side information. A different class of priors instead aims to maximize an information-based similarity metric, such as mutual information or joint entropy (JE), between PET and anatomical images (Nuyts 2007), based on the assumption that they are mutually informative and the images can be inferred from each other. Nuyts found that the JE prior results in lower bias in the reconstructed PET images compared to those reconstructed using mutual information (Nuyts 2007); similar results were also observed by Somayajula et al (2011). The JE prior does not require the pre-segmentation of the anatomical image and can be robust to inconsistencies between PET and anatomical images, as it relies on their joint probability distribution; however, it also ignores the spatial correlation between neighbouring voxels, which is an important feature for PET tracers that are confined in anatomical regions.
The availability of MP images in simultaneous PET-MR systems can provide a unique opportunity for improving PET image quality. These parametric maps are mutually complementary and informative, for instance, an early-stage lesion might not have any morphological manifestations in T1-or T2-weighted MR images but might appear metabolically active in PET images and at the same time show restricted/elevated diffusion patterns in diffusionweighted MR images. In this regard, the aim of this study was fourfold: (i) unification of the state-of-the-art anatomical priors in a common reconstruction framework, which is of importance in unveiling the similarity between different priors and therefore in developing robust and reliable reconstruction setups. The studied priors include the local Tikhonov and TV priors, non-local anatomically-guided Tikhonov prior, the Kaipio and Bowsher priors. In addition, we proposed a modified joint Burg entropy that outperforms the state-of-the-art Bowsher prior in the case of PET-MR mismatches; (ii) extension of the anatomical priors to MP anato-functional ones. Given the simultaneity and complementary nature of PET-MR data, we proposed the novel idea of MP priors to cope with PET-MR mismatches; (iii) evaluation of the MAP reconstructions using the same optimization algorithm and the same simulations and clinical datasets, given that a comparison of these priors is still missing in the literature; and (iv) application of anatomically guided PET reconstruction for not only partial volume correction (PVC) but also for reducing the PET scan time or injected dose. To the best of our knowledge, for the first time, the application of anato-functional priors is demonstrated for reducing scan time or injected dose while achieving the same image quality obtained using conventional reconstruction of high-count data. In this study, we focused on the application and evaluation of different priors for brain studies using simulated brain phantoms and clinical brain datasets using two different radiotracers. However, the developed methodologies should also be applicable to whole body studies.

MAP PET image reconstruction
Let y ∈ Z M + be the set of independently measured emission data collected during the PET scan of an object with the activity distribution represented by u ∈ R N . In Bayesian PET image reconstruction, the MAP reconstruction consists of maximizing the posterior probability of u given y, that is: where the probability distribution p (y i |u) of the ith measured data value given an image u is often best modelled using a Poisson distribution with an expected value of where g ij is the geometric probability detection of annihilation events emitted from the jth voxel in the ith PET detector,n i and a i are detector normalization factors and attenuation factors and E [r i ] is the expected number of scatter and random events in the ith detector. In the MAP framework, the PET image is also treated as a random vector with a probability distribution. As mentioned earlier, in PET imaging a smooth or noise-free image is always sought, in which local differences (interactions) between neighbouring voxels are small. The probability distribution of interacting voxels can be well modelled using a MRF and a Gibbs distribution as P (u) ∝ exp (−βR (u)), where β is a regularization parameter and R (u) is the Gibbs energy function, commonly known as a regularizer or prior. In this study, we considered the following prior, which generalizes most of the included MRF priors: (3) ψ and φ are potential functions, where ψ operates on the intensity differences between the j th voxel and its neighbouring voxels, and φ operates on the neighbourhood ( N j ) sum of those local differences. ξ bj and ω bj are coefficients that weight the intensity differences between voxels j and b based on their proximity (in this study the inverse of their Euclidean distance) and based on their similarity, respectively (see figure 1). j (i) and b (i) are the Cartesian coordinates of the jth and bth voxel. The Tikhonov prior is defined by setting ψ(s) = s 2 and φ(t) = t, while the smoothed isotropic TV prior is defined for ψ(s) = s 2 and φ(t) = √ t + δ 2 , where δ > 0 is a smoothing factor that makes the prior continuously differentiable and reduces the stair-casing and patchy artefacts usually observed in non-smooth TV regularization (the derivative of this smoothed TV prior is given in appendix A).
In the regularization of a 3D image volume, the prior R with a neighbourhood of 6 nearest neighbours (i.e. a first-order neighbourhood), and ω jb = 1, is referred to as a local regularization, while for a higher-order neighbourhood (e.g. of the size 342 for a 7 × 7 × 7 patch) and spatially variant ω jb s the prior is referred to as a non-local regularization method (Wang and Qi 2012). The ω jb coefficients encourage the smoothness along boundaries but discourage it across them. Generally, there are three approaches in the calculation of these coefficients, based on: (i) the PET image reconstructed from a previous iteration (Chen et al 2008), (ii) the MRI or CT side anatomical information as in the Bowsher method, which can be aptly referred to as side-similarity and (iii) both the PET and side anatomical information (Kazantsev et al 2011, Nguyen andLee 2013). In this study, we extend the latter approach to calculate the similarity kernels based on all available MP MRI data as well as the PET data.
Specifically, we evaluate the conventional local quadratic (Tikhonov) and TV priors, the non-local quadratic prior with Bowsher and Gaussian kernels (Chen et al 2008), the Kaipio prior and an modified joint Burg entropy prior. For the optimization in (1), we employed Green's one step late (OSL) MAP-EM algorithm (Green 1990), which extends the MLEM algorithm by adding the first derivative of the prior, evaluated at the previous iteration, to the sensitivity image. In the following we elaborate the above priors, their first-derivative and the extension of the priors amenable to the MP case.
2.1.1. Non-local quadratic prior with MP Gaussian similarity kernels. For this prior, the similarity coefficients between a given voxel of the PET image, u j , and its neighbouring voxels, u b , in the neighbourhood N j are calculated from a reference image x (e.g. a complementary MRI, a prior PET image or the PET image under reconstruction) using the following Gaussian functions: (4) where in ω V jb , the similarities are calculated based on their individual voxel intensity values (namely, Gaussian-V), while in ω P jb , they are calculated based on the intensity values of a patch of voxels, f l (x), centred on voxels j and b (namely, Gaussian-P), see figures 1(B) and (C). The σ and z are the shape (standard deviation) and normalization parameters of the Gaussian functions, respectively. Figures 1(D) and (E) shows the similarity kernels calculated using the Gaussian-V and -P methods from the T1-MR image. As can be seen, the Gaussian-P method is comparably stricter in identifying the similar voxels to the highlighted central voxel, therefore it is more robust to noise, especially when the similarity coefficients are calculated from the PET image itself. Note that the proximity coefficients in (3) can also be calculated using a Gaussian kernel as in bilateral Gaussian filtering (Tomasi and Manduchi 1998); however, this kernel will add an additional σ parameter that requires tuning by the user. In this study, we therefore calculated these weights based on the inverse of the Euclidean distance between voxels.
The Gaussian kernels were extended to MP ones based on the geometric mean of the individual kernels calculated from different parametric images. Let v (i) ∈ R N L i=1 be a set of L MP MR images, the MP similarity kernels were then defined as: where ω P jb (u) is calculated from the PET image using Gaussian-P kernels, which makes the resulting kernels robust to noise in the PET image, ω x jb v (i) is calculated from the ith parametric image using either Gaussian-P or Gaussian-V methods and z is a normalization factor. The derivative of the resulting quadratic prior is given by: 2.1.2. Non-local quadratic prior with MP Bowsher similarity kernels. In the Bowsher method, the most similar neighbouring voxels of a given voxel in the PET image (u j ) in the neighbourhood N j are identified from the same neighbourhood in a complementary MR image, as shown in figure 1(F). In this approach, the top B most similar neighbours are calculated based on their absolute differences in the MR image, i.e. |v b − v j | · · · |v B − v j |, thereby the resulting similarity coefficients will be 0 or 1, where 0 is assigned to the voxels not within the top B most similar values. For MP extension of the Bowsher method, we followed the approach proposed by Kazantsev et al (2011) in which the similarity kernels are obtained by the product of the Bowsher's binary coefficients, calculated from the anatomical image, and the Gaussian-P kernels calculated from the current estimate of the PET image (see figure 1(G)). The derivative of the resulting prior was calculated as in equation (6), which would correspond to an asymmetric Bowsher prior, as proposed in Vunckx and Nuyts (2010).

MP local joint Burg entropy.
In this section, we present a MP JE prior with local spatial interaction of voxels based on the Burg entropy (Byrne 1993). Let the vectors u, v (1) , . . . , v (L) be a realization of a set of random variables U, V (1) , . . . , V (L) . The JE is an informationtheoretic criterion that measures the information gained from this set of variables with the joint probability distribution function (PDF) of p U, V (1) , . . . , V (L) . The joint Burg entropy is defined as the integration of the logarithm of the joint PDF of the variables: where S is the number of bins at which the PDF is integrated with δ bin widths. Note the difference between this prior and the widely used Shannon JE is that the latter weights the logarithm of the joint PDF with the negative of the joint PDF in order to provide a weighted average of information content (Pluim et al 2003). The Burg entropy in fact measures the dispersion of the joint PDF of the variables. If the variables are mutually informative and can be inferred from each other, their joint PDF is more compact and therefore their entropy is low, whereas if they are independent, their joint PDF is broad and thus the entropy is high. Therefore, the aim is to minimize the joint Burg entropy in order to increase the correlation of the variables. The joint PDF can be approximated using the non-parametric Parzen density estimation with a multivariate Gaussian window function with a diagonal covariance matrix Σ = diag σ 2 u , σ 2 v (1) , . . . , σ 2 v (L) , as follows: where N is the number of samples used to calculate the PDF, which similar to Somayajula et al (2011) was set equal to the number of voxels in the images and G is Gaussian function as in (4). Since the logarithm tends to flatten the joint PDF, the integration in (7) can be approximated by the mean of the log-distribution, as follows: As pointed out in Somayajula et al (2011), the Shannon JE can also be approximated in the same way such that it can be defined as the above equation. Note that to be consistent with the notion of maximum entropy and the derivative of the quadratic priors used in this study, similar to Shannon entropy, we introduced a negative sign in (9). Thus, the aim would be to maximize the negative of joint Burg entropy. The OSL MAP-EM reconstruction requires the calculation of the first derivative of the employed prior. The derivative of JE evaluated at j th bin (or voxel) depends on all voxel intensities simultaneously, which leads to discarding regional information. As suggested in Somayajula et al (2011), the derivative can be approximated by evaluating the summation for the voxels that are in the neighbourhood of the jth voxel, that is (see appendix B for more details): where ω jb act as similarity weighting coefficients given by: In figure 1(H), the coefficients calculated from both the PET and MR images have been shown. We replaced ξ u in (10) with the proximity coefficients ξ jb in (3) in order to further improve the performance of this prior in spatial neighbourhoods. As a result of this modification, the range of the regularization parameter, β, for this prior will be approximately the same as for the other Tikhonov-based priors. (2016), the Kaipio prior is a quadratic prior improved by inclusion of the squared inner product of the PET image gradient and MR normal vectors. Let g j ∈ R D and n j ∈ R D be the gradient vector of the PET image u and the normalized gradient vector of an anatomical image v at the jth voxel, respectively, where D is the size of the neighbourhood of the jth voxel. The bth elements of these vectors are given by:

Kaipio prior. Based on the formulations in Ehrhardt et al
The Kaipio prior can be defined as (see appendix C for more details): This prior can also be described based on the alignment of the vectors g j and n j and the angle θ j formed between them, as follows (see appendix B for more details): Basically, a non-zero n j indicates the presence of an edge in the anatomical image, therefore if the vectors g j and n j are aligned, that is, θ j = 0 (parallel) or θ j = 180° (anti-parallel), the regularization of the corresponding PET edge is switched off leading to its preservation. Note in the cases that n j = 0 (e.g. where there is no anatomical edge in the jth voxel of the MR image), the prior is reduced to the quadratic prior. The main advantage of the Kaipio prior over the Bowsher and Gaussians priors is that it is free of any shape parameter (e.g. B and σ), however, it has two disadvantages: firstly, as the prior relies on MR normal vectors, it is very susceptible to noise and PVE artefacts in the MR images (which are usually down-sampled to the size of PET images) due to the normalization in (12), therefore even low-amplitude noise can indicate the presence of an edge, which makes the prior unable to discern valid edges. Therefore, the de-noising and de-blurring of MR images is of high importance for the performance of this type of prior (see figure 9 in the Results section). Secondly, its extension to a MP prior is not straightforward as one needs to calculate the resultant normal vectors from MP MR images and more importantly to estimate the normal vectors from the noisy PET image itself, which requires designing an additional regularized estimation approach (Lysaker et al 2004). For the above reasons, we evaluated this prior for only T1-weighted MR guided PET image reconstruction. The derivative of the Kaipio prior is the same as in (6) with the weights given by (see appendix C for details): In table 1, the derivatives of the aforementioned priors and their similarity kernels are summarized. Note the similarity kernels of anatomical priors are shown for conventional MR-guided priors.

Optimization and parameter selection
The MAP problem in (1) was optimized using the Green's one-step-late (OSL) MAP-EM algorithm, as follows: where r i is expected value of random and scatter coincidences, obtained from a delayed coincidence window (Noll 1999) and model-based scatter simulation (Watson 2000). This iterative algorithm differs from the ordinary Poisson (OP) ML-EM algorithm principally in that the first derivative of the prior R, evaluated at the previous (latest) image update, is added to the sensitivity image (which is obtained from the back-projection of the product of the attenuation and normalization sinograms). However, it is should be noted that the OSL MAP-EM algorithm has been shown to converge to the MAP solution only for certain types of priors, particularly, those for which the potential functions have a bounded derivative (Lange 1990). Alternative approaches to OSL include using De Pierro's convexity lemma (De Pierro 1995) to define a separable surrogate for the prior R, which can guarantee the convergence to the maximizer of the a-posterior density.
The priors included in the present work have a number of user-defined parameters that determine their performance. Table 2 summarizes these key parameters, along with those with pre-defined values commonly used in our simulation and clinical data reconstructions. For the local priors including Tikhonov, TV, a first-order neighborhood consisting of 6 immediate neighbors was chosen, while for the anatomical priors a higher-order neighborhood (a search window of size 7 × 7 × 7 ) consisting of 342 neighbors was chosen. A local window size of 3 × 3 × 3 was chosen for the calculation of Gaussian kernels with patch-based feature vectors.
The ML-EM and MAP-EM image reconstructions were declared generally converged when the relative difference between successive image estimates fell below a tolerance of τ = 1 × 10 −4 , otherwise, if the total number of iterations reached a maximum number of n = 150. The relative difference was defined as the normalized Euclidian distance between the current, u n+1 , and previous, u n , image estimates, as follows:

Prior
Similarity kernels: ω jb Description Local Tikhonov 1 Results in uniform smoothing Local TV Encourages piece-wise smoothing based on the inverse of the magnitude of the local differences Improves upon the Tikhonov prior by weighting the local differences based on their voxel-wise Gaussian similarity in the reference image x Calculates the Gaussian similarity of voxel intensities of reference image x based on their local neighborhood (patch) b∈Nj n jb ξ jb Preserves the edges wherein the vector of local differences and MR normal vectors (n jb ) are (anti-) parallel Selects the B most similar neighbours of a given voxel in the MR image, v, based on their absolute differences Joint Burg entropy Calculates the similarity of voxel intensities as the normalized product of their Gaussian similarities in both the PET, u, and the MR image v a Gaussian similarity kernels with voxel (V) and patch (P) based feature vectors.
u and v represent PET and MR images. σ u and σ v are the standard deviation of the PET and MRI Gaussian kernels. G : the standard Gaussian kernel defined in equation (8).
To objectively evaluate the performance of the studied priors and their MP versions with respect to a ground truth, extensive 3D realistic simulation studies were performed. The BrainWeb phantom (Cocosco et al 1997) was utilized to simulate the bio-distribution of [ 18 F]fluorodeoxyglucose (FDG) radiotracer in the brain together with T1-and T2-weighted MR images and attenuation maps, all with a matrix size of 344 × 344 × 127 and voxel of size 2 × 2 × 2 mm 3 . A representative sagittal slice of this phantom is shown in figure 2. To further simulate the existence of mismatched boundaries or abnormalities between PET and MR images, a few regions of the T1 and T2-MR images were uniquely and commonly removed in such a way that the anatomical inconsistencies were simulated among all three parametric images (see arrows 1 and 3 in figure 2). Moreover, four unique and shared lesions were considered in the PET activity and T2-weighted MR images (each contains a small and a large lesion, see arrows 2 and 4 in figure 2). The only shared lesion is shown in the true activity map in figure 2. For the true activity map, the ratio of mean activity concentration in grey matter (GM) to white matter (WM) and that of lesions to WM were simulated to 3:1 and 7:1, respectively. The signal intensity of the T1-and T2-weighted MR images were simulated according to the BrainWeb's simulated brain database. To simulate PVE and noise, the MR images were smoothed using a Gaussian filter, 3 mm full width at half maximum (FWHM) and Gaussian noise with standard deviation equal to 1% of the largest voxel intensity was added to the images.The simulated attenuation map consists of air, soft tissue and bone tissue classes, with linear attenuation coefficients of 0, 0.098, 0.13 cm −1 , respectively. For this phantom, two sets of reconstructions were performed including the conventional T1-MR guided PET image reconstruction and the MP MR-PET-guided PET image reconstruction. In the latter, we considered the priors that could be extended to include MP data such as Bowsher, Gaussian kernel methods and joint Burg entropy. List of the parameters of the regularization methods included in this study. The pre-defined values of some of the parameters commonly used in this study are also presented.

Priors Parameters
Local Tikhonov a Gaussian similarity kernels with voxel (V) and patch (P) based feature vectors.
|N n |, |F n | : number of included neighbors in a n × n × n neighborhood ( N ) or feature vector (F ).
In our 3D simulations, we modelled a forward projector based on the native geometry of the Siemens Biograph mMR scanner (Siemens Healthcare, Erlangen), which included attenuation, normalization factors, randoms and scatter coincidences, as well as PSF modelling. The normalization factors were estimated from a 5 h 68 Ge phantom scan using a component-based approach (Belzunce and Reader 2016). The scatter sinograms were simulated by Gaussian smoothing in the radial and angular directions (with Gaussian kernels of 20 cm and 8 radians in FWHM) of the direct planes (segment zero) of net-true emission sinograms (obtained from forward projection of the true activity which was then attenuated and normalized by the attenuation and normalization factors). The resulting simulated 2D scatter sinograms were then extended to 3D sinograms using the so-called inverse single slice rebinning and scaled to simulate a scatter fraction of 50%. A randoms fraction of 30% was also simulated using Poisson noise derived from a uniform mean value. The PET system's PSF was modelled using an image-space shift-invariant Gaussian convolution kernel of 4 mm in FWHM. This approximation is appropriate for brain imaging in a large-bore clinical PET scanner, where the degradation of the scanner's resolution is mainly an issue towards the periphery of the FOV (Rahmim et al 2013). In our numerical simulations, a low-count PET scan with 10 million prompt coincidences was simulated to demonstrate the application of anato-functional priors in especially low-dose or short-time PET scans, where the need for regularization is greater. The geometric component of the PET forward projector was implemented based on a single-ray Siddon's algorithm in C++ with MATLAB (The MathWorks, Inc) interface. The reconstructions were performed on a 12-core Intel Xeon 3.5 GHz workstation with 128 GB RAM with multithreading capability. Nonetheless, owing to the high computational burden of 3D MAP reconstructions, particularly for bias-variance evaluations (see next section), the mMR PET simulations were performed with axial compression (span) of 35, leading to 40% speed-up in each EM update compared to span 11, which is the default configuration of the mMR scanner. However, it should be emphasized that the simulated sinograms were generated by a PET projector built for span 35 and reconstructed with the same projector. Therefore, there was no resolution and image quality loss due to axial compression of sinogram planes. Note that if the simulated sinograms had been created using a span 1 projector and then compressed to span 35, there would have been resolution loss, however, in our simulation the sinograms were created using a span 35 projector and reconstructed without any axial compression. In supplementary figure 1 (stacks.iop.org/PMB/62/5975/mmedia), simulations with sinograms created using span 11 and span 35 projectors and sinograms created with span 1 and axially compressed to span 11 and 35 has been compared, showing that resolution loss occurs only for axially compressed sinograms. It should be emphasized that these simulations are an approximation of the real data case, but the reconstruction of the idealised span 35 is not approximate, so no resolution degradation occurs. In our workstation, the calculation time for the derivative of the prior (which is performed using a single core) is around 6.5 s, while the reconstruction time for one MAP-EM update for span 11 sinograms is about 2 min, which is reduced by 40% for span 35 sinograms.

Clinical PET-MR datasets.
The performance of the included priors was also evaluated for the reconstruction of clinical PET-MR brain datasets, acquired from the Siemens mMR scanner at King's College London & Guy's and St Thomas' PET Centre. In this study, two datasets consisting of [ 18 F]florbetaben and [ 18 F]FDG PET scans were included. The study was approved by the institutional review boards and the research ethics committee. Written informed consent was obtained from all study participants. The [ 18 F]florbetaben was used as a myelin marker for MS patients as these tracers have significant affinity for myelin (Bodini et al 2016). The patient referred for myelin imaging was administered 300 MBq [ 18 F]florbetaben and a single-bed dynamic PET scan was immediately started and lasted for about 95 min. A 3D T1-magnetization prepared rapid acquisition gradient recalled echo (MPRAGE) and a dual-point Dixon MR sequence were acquired simultaneously with the PET data and respectively used for diagnostic anatomical imaging and attenuation map generation for PET attenuation correction. The T1-MPRAGE data were acquired using a 5-channel head and neck coil with the following parameters: repetition time (TR): 1700 ms, echo time (TE): 2.63 ms, inversion time (TI): 900 ms, number of averages (NEX): 1, flip angle: 9°, pixel bandwidth of 199 Hz, reconstruction matrix size of 224 × 256 × 176 and voxel size of 1.05 × 1.05 × 1.1 mm 3 and. The Dixon data were acquired using the spoiled gradient recalled (SPGR) sequence with the following parameters: T1: 3.6 ms, TE: 2.46 ms, NEX: 1, flip angle: 10°, pixel bandwidth of 946 Hz, reconstruction matrix size of 192 × 126 × 128 and voxel size of 2.06 × 2.06 × 3.12 mm 3 . In this study, we histogrammed the list-mode data and used the last 5 min of the whole scan with the aim of evaluating the MAP reconstructions in low-count and short-frame PET scans. In the resulting PET emission sinogram, a total count of about 37 million prompts (25 million net trues and 12 million delayeds) were recorded. In the second dataset, the patient referred for an [ 18 F]FDG PET scan was administered ~200 MBq of [ 18 F] FDG and after 75 min' uptake underwent a 30 min single-bed static PET scan. A total number of counts of about 580 million prompts including 450 million net trues and 130 million delayeds were recorded in this scan. Similar to the [ 18 F]florbetaben scan, the T1-MPRAGE and Dixon MR sequences were acquired for this dataset. Figure 3 shows the representative coronal slices of the PET and MR images of the datasets. The florbetaben and FDG images shown in figure 3 were reconstructed using both the vendor-provided reconstruction software (Siemens e7 tools, 3 iterations, 24 subsets, with and without sinogram-based PSF) as well as our own software (with 72 iterations, 1 subset, and 150 iterations and 1 subset of MLEM, with 4/3 mm post-reconstruction Gaussian kernel).
To evaluate the performance of the MP priors in the case of mismatches between the PET and MR images, two artificial hot lesions with different sizes (small and large) and activity levels were simulated in the FDG PET dataset (see figure 13 in the Results section). For this simulation, two lesion masks were drawn on the original lesion-free PET images. The activity of the voxels within the lesion masks were set to that of the corresponding voxels in the original PET images increased by 40% and 80%. The simulated lesions were smoothed by a 2.5 mm Gaussian kernel and then forward projected to obtain a lesion sinogram, which was then attenuated and normalized. A 10 million-count Poisson noise realization was generated from this sinogram. Finally the noisy lesion sinogram was added to the original lesion-free sinogram.
To demonstrate the performance of the anato-functional priors using the proposed joint Burg entropy in low count scans, the full-statistics sinogram of the FDG PET dataset was randomly down-sampled to achieve datasets with 20%, 8%, 3% and 1% of the data, while preserving the Poisson statistics of the data.

Evaluation metrics
To objectively evaluate the performance of the regularization methods in the reconstruction of simulated datasets, a region-of-interest (ROI) based bias-variance analysis was performed using 10 noise realizations. Three ROIs were considered, each one included all voxels belonging to WM, GM and tumours in the true activity map. The bias was calculated in each ROI as follows: Nr Nr r=1 x r j represents the ensemble mean value of each voxel calculated for all N r noise realizations. N ROI is the total number of voxels in the ROI. x true is the PET ground truth image. The variance was calculated using the average of the pixel-level coefficient of variation (COV) for each ROI: For a single-noise realization, the reconstruction methods were also compared based on the normalized mean square difference (NRMSD) between an image estimate x and the ground truth as follows: As in clinical datasets, there is no ground truth image we qualitatively evaluated the performance of T1-MR guided and MP MR guided PET image reconstruction compared to the standard MLEM reconstruction. Figure 4 compares the reconstruction results of the brain phantom using the studied algorithms including: the MLEM algorithm with 4 mm post-reconstruction Gaussian filtering, Tikhonov and TV priors and T1-MR anatomical priors, i.e. the Kaipio, Gaussian-P/V, Bowsher and the joint Burg entropy. This figure shows a representative sagittal slice comparing the performance of methods in the recovery of a metabolically active lesion and a region of cortex with mismatched MR anatomical boundaries (see arrows), respectively. In supplementary mat erials figure 2, the reconstruction results are also presented for other views. As can be seen, the MAP reconstruction methods have substantially reduced noise compared to the MLEM algorithm as the included priors are especially designed to enforce smoothness and, if applicable, to preserve edges. The results show that the Gaussian-V, Bowsher and Burg methods achieve the best performance for the region where the PET and MR image have common boundaries. However, as shown, it is noticeable that the Gaussian-P/V and Bowsher priors tend to completely suppress the PET-unique lesion compared to the Kaipio and JE priors, which also exploit MR anatomical information. As mentioned earlier, for PET regions that do not have corresponding MRI regions, the Kaipio degenerates to the Tikhonov prior, therefore for this simulated lesion, they perform similarly. On the other hand, the Burg prior in essence relies of both MRI and PET information, therefore it is able to preserve PET unique feature. As shown in figure 4, the Gaussian-P prior leads to the blurring of the edges compared to Gaussian-V, which should be attributed to the fact that the similarity kernels are calculated based on patches centred on each voxel, which can rigorously exclude dissimilar voxels (see figures 1(D) and (E). By proper selection of the σ u parameter, the Gaussian-V prior can approach the Bowsher prior with a given B number. In this experiment, the regularization parameter β was selected based on minimizing the NRMSE over the whole brain. Figure 5 shows the NRMSE performance of the algorithms versus the regularization parameter for the whole brain as well as the tumours. The optimal β values of 1, 200, 0.1, 8, 8, 10 and 10 were found for Tikhonov, TV, Kaipio, Gaussian-P, Gaussian-V, Bowsher and JE methods, respectively. Supplementary materials figure 3 shows the NRMSE versus the actual values of β. The σ u parameter of the Gaussian-P/V priors were heuristically set to 0.08. For the Bowsher prior, 70 most similar neighbours out of 342 (B = 70) were considered. The σ u and σ v parameters of the JE prior were heuristically set to 5 and 1 × 10 3 . The combination of these parameters were selected such that the reconstructed images are free of artefacts (such as the individual or group of voxels that get extremely higher values and noise-like artefacts), especially in the case of Kaipio and Gaussian priors. Figure 6 show the results of MP guided PET image reconstructions in the brain phantom dataset for the Gaussian-P/V, Bowsher and Burg JE priors in comparison with T1-MR guided reconstructions. Figure 4 in supplementary materials also shows other views. As shown, there are a missing tumour and a mismatched anatomical region that is partly complemented by the T2-MR image. In the T2-MR image however there are two lesions of which the smaller one matches the PET lesion. The results showed that the inclusion of T2-MR and as well as PET information can lead improved recovery of the lesion in the PET images. As can be seen, the large lesion in T2-MR image has not significantly induced false edges in the MP-guided PET images, however, in all reconstructions there is a faint trace of the inferior edge of the lesion, especially in the JE prior, where noise has been sparely preserved. Figure 4(B) in supplementary materials also shows that the MP-guided reconstruction improves the performance of the methods in the case of anatomical mismatches, especially for the Bowsher method. Supplementary materials figure 5 compares the activity profiles of the reconstruction methods shown in figures 4 and 6.  Table 3 presents the NRMSE results of the reconstruction methods calculated over GM, WM and tumours of the brain phantom. The results shows that in GM, the Bowsher gives rise to the lowest errors especially with its MP extension. In the WM of the simulated FDG phantom, where there is less uptake, both MP Bowsher and Burg JE prior outperform the other methods, whereas in the tumours, the TV and JE priors achieve the lowest errors. As can be seen, the Bowsher and Gaussian priors lead to the highest NRMSE in the tumours, which is consistent Figure 5. The selection of optimal regularization parameter for the brain phantom based on minimization of the NRMSE over the whole brain (left). The NMRSE results are also shown for the tumours (right). with findings in figure 4. The ability of the TV prior in preservation of the active tumours should be ascribed to the fact that for the voxels that the magnitude of their local differences is large, the prior assigns lower weights on those differences (see equation (A.2) in appendix A), as a result, the smoothing is suppressed in their neighbourhood. Due to the lower uptake of FDG in WM, the reconstructed images are noisier in this region and hence the NRMES results of the methods are relatively higher in this region. Figure 7 shows the NRMSE performance of the reconstruction method as a function of iteration in the GM, WM and tumours. The algorithms were further quantitatively compared based on their bias-variance performances using 10 noise realizations of 3D reconstructions. In figure 8, the bias-variance results have been presented for both T1-MR guided and MP-guided priors. The results show that in the grey and WMs the Gaussian-P/V, Bowsher and Burg JE priors present the best bias-variance tradeoff compared to the Kaipio and the conventional Tikhonov and TV priors. In these ROIs, the performance of the Bowsher and Burg JE priors is comparable, however, the JE prior results in less bias at the expense of a slightly increased variance due to the presence of high-valued isolated voxels in the images reconstructed when using this prior. As shown, the Gaussian-V method outperforms the Gaussian-P method in terms of bias (as their variance performance is almost the same) but it lags behind the Bowsher method. In the tumour ROI, the Gaussian-V/P and Bowsher methods show the largest bias, as they tend to suppress the PET unique features, while the TV and JE priors achieve the best bias-variance trade-off. The results of MP guided reconstructions show that the bias-variance performance of the MAP reconstruction is slightly improved in terms of bias in WM and GM regions, whereas in the tumour ROI the MP extension of these priors substantially improve their performance, while the performance of the MP-Burg prior nearly remains the same as T1-Burg. Figure 9 shows the impact of MR image quality on the performance of the Kaipio, Bowsher and Burg priors. In this evaluation, the true T1-MR image was smoothed using 3 and 4 mm FWHM Gaussian filters and corrupted by Gaussian noise with standard deviation equal to 1% of largest voxel intensity and the resulting MR images were used for the above-mentioned anatomical priors. In these reconstructions, all the involving parameters were kept constant and the same as before. As could be predicted, the performance of the Kaipio prior is deteriorated by the degradation of MR image quality. With the high-quality and noise-free true MR image, this prior performs closely to the other priors and results in a comparable edge enhancement in the PET image. However, with the low  Figure 10 shows the reconstruction results of the [ 18 F]florbetaben dataset using the studied reconstruction methods for a representative sagittal slice. In this experiment, the regularization parameter β was heuristically selected. The β values of 1 × 10 6 , 1 × 10 5 , 3 × 10 5 were set for Tikhonov, TV and Kaipio priors, respectively, while the β value of 2 × 10 7 was set for Gaussian-P/V, Bowsher and Burg JE methods. As in simulations, the σ u parameter of the Gaussian-P/V priors were set to 0.08. For the Bowsher prior, the B value was set to 70 as in simulations. The σ u and σ v parameters of the JE prior were set to 1 and 10. The results reveal the MLEM reconstruction suffers from a considerable noise and non-uniformity within the WM. The Tikhonov and TV reconstructions can partly reduce noise so that some sulci become more visible. In comparison with these conventional MAP reconstructions, the anatomical MAP methods result in noise reduction and recovery of the anatomical boundaries of the WM given that at late-frame PET scans the florbetaben washes out of GM and accumulates in WM. For this dataset, as can be seen the Kaipio prior gives rise to better reconstruction compared to Gaussian-P method, however, this method leads to a slight underestimation of activity.

Simulations
In this experiment, all of the reconstructions were terminated after 150 iterations. For higher number of iterations, the results of the Kaipio prior remained the same indicating that the algorithm has reached a stationary point. Consistent with our simulations, the results show that the Gaussian-P prior is being outperformed by the Gaussian-V prior, which has a very similar performance to the Bowsher prior, of course, depending on the selected parameters. For this dataset, the Bowsher and Burg JE priors demonstrate the best performance in terms of mean activity estimation and the details of the cold GM. The MP extensions of these priors did not reveal substantial differences (results not shown here), since in this dataset the PET tracer is uniformly distributed in the WM and there are not notable mismatches between the PET images and the T1-MR images.  Figure 11 shows the impact of the shape parameters of the Gaussian-V, Bowsher and Burg priors, as the three best performing priors included in this study, in the anatomical reconstruction of the florbetaben dataset. For the Gaussian-V and JE priors, these shape parameters are the standard deviation of Gaussian kernels used to measure voxel similarities in the MR Figure 9. The impact of MR image quality on the performance of the Kaipio, Bowsher and joint Burg entropy anatomical priors. The true MR image was blurred using 3 and 4 mm Gaussian smoothing filter and Gaussian noise was added to the resulting images. images, σ v , while for the Bowsher prior it is the number of most similar neighbours, B. With increasing σ v , the calculated similarities between a given voxel and its neighbouring voxels is increased, since the Gaussian function would relatively map a given local difference to a higher value. As a result, the similarity kernels become more and more uniform and as can be seen in this figure, the Gaussian-V and Burg priors approach an isotropic Tikhonov prior that suppress both noise and valid edges. Similarly, as the B-value of the Bowsher prior is increased, more and more neighbours are included for the calculation of local differences, as a result, the Bowsher prior also becomes an isotropic Tikhonov prior. It can be seen that for small shape parameters the priors tend to preserve noise and even introduce noise-like artefacts. Note in this experiment, all images were reconstructed with the same regularization parameter, β = 2 × 10 7 , and shown at the same iteration number n = 150. The joint Burg entropy prior has an additional shape parameter defined for the PET images, σ u , for all JE reconstructions, this parameter was set to 1. Figure 12 compares the reconstruction results of the FDG brain dataset using MLEM, conventional MAP and anatomical MAP reconstructions using the T1-MPRAGE MR image. The results demonstrate that the overall resolution and quality of the PET images have been improved by inclusion of anatomical information, especially using the Gaussian-V, Bowsher and JE priors. As a result, the spill-over and partial PVEs visible in the MLEM and Tikhonov and TV reconstructions have been considerably reduced by the anatomical priors. In this experiment, the regularization parameter β of 1 × 10 4 , 5 × 10 2 , 1 × 10 3 was heuristically set for the Tikhonov, TV and Kaipio priors, respectively, while a value of 2 × 10 5 was set for β for the Gaussian-P/V, Bowsher and Burg JE methods. The σ u parameter for the Gaussian-P/V priors was set to 0.1. For the Bowsher prior, the B value was set to 70. The σ u and σ v parameters of the JE prior were set to 0.1 and 5 respectively. Figure 13 shows the reconstruction results of the FDG dataset with simulated lesions shown in the T1-MR image. The reconstructions with anatomical priors lead to improved definition of the GM and its separation from the cold WM. Note that the MLEM reconstructions were smoothed using a 3 mm Gaussian kernel compared to the standard 4 mm kernel. As a result, there are some residual normalization artefacts in the MLEM images, which are removed in the MAP-EM images as a result of regularization. Figures 13(A) and (B) compare the activity profiles of the reconstructed images along the lines shown on the large and small lesions, respectively. In these profiles, the result of the MLEM reconstruction without postreconstruction smoothing has also been included. The anatomical priors, Kaipio, Gaussian-P/V and Bowsher notably suppress the activity of the lesions compared to the anato-functional joint Burg entropy prior and conventional priors. These results highlight the value of MP anato-functional priors for improved PET image reconstruction while preserving PET unique lesions.

FDG dataset. 3.2.2.1. Partial volume correction (PVC).
The MP extension of the Bowsher prior was further evaluated in comparison with the proposed JE prior. Figure 14 compares the PET images reconstructed by MLEM with the cases of using the Bowsher, MP Bowsher and JE priors for two transverse slices through the simulated lesions. In figures 14(A) and (B), the regional mean activity of the tumours at each iteration of the reconstruction algorithms are shown for the large and small tumours, respectively. The images have been shown with the same display range of grey scale intensities in order to highlight their relative performance in preserving the PET unique tumours. As can be seen, the conventional Bowsher prior, relying on the T1-MR image only, has notably reduced the estimated activity of the tumours, especially the smaller one which has been completely suppressed. On the other hand, the MP extension of the Bowsher prior, relying on both the T1-MR image and the PET image reconstructed at each iteration, gives rise to the improved performance of the prior. In comparison, the proposed joint Burg entropy, which by definition relies on both the PET and MR images, is capable of preserving the tumours and at the same time, similar to the Bowsher prior, improving PET image quality and details. In figures 14(A) and (B), the graphs show that the mean activities of the tumours estimated by the Bowsher-MP prior are still far from those estimated by the MLEM algorithm for iterations greater than 70 iterations, where the regional activity has almost converged. However, the results show that compared to post-smoothed MLEM algorithm the JE prior resulted in 5.3% and 9.3% errors in estimation of mean activity of the small and lesion large tumours, respectively, whereas the Bowsher-MP resulted in −25.1% and −24.4% errors in those mean activities, respectively.

Improved image quality of low-count PET scans.
Apart from PVC of PET images, the MAP image reconstruction using anato-functional priors can be exploited to achieve comparable image quality to that obtained by the conventional MLEM reconstructions but with less data. To demonstrate this new application, the performance of the joint Burg entropy, as the best performing anato-functional prior introduced in this paper, was first evaluated against the MLEM algorithm for reconstruction of the full-statistics FDG dataset for a range of regularization parameters. Figure 15 shows the results for representative transverse and sagittal slices. The MLEM images are shown without post-reconstruction smoothing to reveal the presence of noise and normalization artefacts. Next, the MAP reconstruction with β = 7 × 10 4 was selected for comparison with MLEM reconstruction at different count levels. The reason for choosing this β among those considered was in order to obtain images comparable to the MLEM reconstruction of high-count data. As described before, the full-statistics sinogram data were randomly down-sampled to achieve datasets with 20%, 8%, 3% and 1% of the data. The datasets were then reconstructed using the MLEM and MAP-EM algorithms. Figure 16 shows the reconstruction results for different percentages of the data. Figure 6 in the supplementary materials compares the images in representative transverse slices. The MLEM algorithm post-smoothed with Gaussian kernels with different FWHMs are also shown. Since the MAP reconstructions with the considered β-value were still suffering from noise, they were post-smoothed using a 2.5 mm Gaussian kernel. The β-values used for the 20%, 8%, 3% and 1% datasets were respectively chosen to be 3.5 × 10 5 , 8.5 × 10 5 , 3.5 × 10 6 and 2 × 10 7 , which were estimated by scaling  the considered β-value used for the full-statistics data (i.e. β = 7 × 10 4 ) with the ratio of the total counts in the full-statistics to that of the down-sampled sinograms. The reconstructed images were scaled by the same ratio used as a calibration factor to account for differences in total count level. As a result, all images in figure 16 were displayed using the same grey scale intensity range. For a given percentage of used data, for instance 100%, as the Gaussian kernel's FWHM is increased, the noise level of the MLEM images is reduced, however, at the expense of resolution degradation. For the 100% dataset, post-smoothing with a 3 mm Gaussian kernel appears to give the best compromise between noise and resolution, while for other datasets, broader kernels are required to reduce noise. On the other hand, the results of the MAP reconstructions show that the images reconstructed by very low amounts of data are fairly comparable to the MLEM reconstructions with higher count levels. For instance, the MAP-EM with 8% of the data is comparable to, or for some details even better than, the MLEM + 3 mm smoothing when using only 20% of the data (see also supplementary materials figure 6 for this comparison). For very low-count levels, the results show that the MAP reconstruction is capable of the recovery of the details that are lost even with higher count levels, as shown for instance in the results using just 1% of the data.

Performance of the anatomical priors
Our realistic 3D simulation results, presented in figure 4 and supplemental materials figure 1, showed that for the conventional MR-guided PET image reconstruction, the Gaussian-P/V and Bowsher methods lead to a substantial suppression of PET unique lesions and boundaries. In comparison, the proposed joint Burg entropy prior not only can well preserve the PET boundaries with matched MR anatomical boundaries but also the PET unique lesions. This promising aspect of the joint priors is brought by the fact that they rely on the joint distribution of MR and PET images, which leads to the exploitation of both PET and MR information in the identification of both common and unique features. In addition, the results presented in figure 6 and supplemental materials figure 3 showed that the proposed MP extension of the Gaussian and Bowsher priors can significantly improve the performance of these priors in the preservation of PET lesions and boundaries. The NRMSE results presented in figure 7 show that the MP anato-functional extension is more beneficial for the Gaussian and Bowsher priors than that of the JE prior. For instance, for tumour ROIs, this extension leads to 14%, 15%, 20% and 10% reduction of errors for the Gaussian-P, Gaussian-V, Bowsher and Burg priors, respectively.
The images reconstructed by the Burg prior show some isolated voxels with high intensity values, especially in its MP extended version. This issue has also been previously reported for the case of the original Shannon JE prior in Somayajula et al (2011) and is evident from the results in Tang and Rahmim (2009). As pointed out in Somayajula et al (2011), for low count datasets, the prior tends to increase the variance in the distribution of PET image intensities, thus causing isolated high-valued voxels. However, depending on the selection of the regularization parameters and Parzen bandwidths the occurrence of these voxels can be reduced.
The reconstruction of clinical datasets using the anatomical priors also showed the clinical feasibility of these advanced reconstruction methods especially in low-count and short-frame PET scans, where the conventional MLEM reconstruction fails to fully recover the anatofunctional details. In the reconstruction of the florbetaben and FDG PET datasets, it was found that the anatomical MAP reconstructions notably compensate for partial volume effect in the estimated uptake of the white and GMs, respectively.
In terms of computational complexity, the MP version of the Gaussian-P prior is the most time consuming prior since the PET component of the similarity kernels needs to be recalculated after each iterative update for a small patch around each voxel in a large neighbourhood. The MP version of the Gaussian-V, Bowsher and the joint Burg entropy priors come second in terms of computational cost, and require the same computations in the recalculation of the PET part of the voxel-based similarity kernels. Finally, the similarity kernels for conventional Bowsher and Gaussian-P/V priors can be precomputed from the MR images and so does not add substantial computational load to the reconstruction.

Comparison with the prior work
In Vunckx et al (2012), three different anatomical priors, including an MR segmentationbased prior, the Bowsher and the Shannon JE priors, were evaluated in brain PET imaging. The authors reported that in comparison with the Bowsher prior, the Shannon JE often converges to a local maximum and its performance highly depends on the section of the many involved parameters (i.e. the number of bins by which the joint probability distribution is integrated, bandwidths of the Parzen windows, weight of the prior). To avoid convergence to an undesirable local maximum, the author proposed the initialization of the JE reconstruction using an image reconstructed by a few iterations of the Bowsher-MAP method or the gradual increase of the regularization parameter. It was concluded that the Bowsher prior can fairly outperform the Shannon JE prior as it behaves more predictably and its parameters can be much more easily selected, which make this prior practical for clinical routines. Moreover, it does not require the segmentation of MR images, as the segmentation-based priors heavily rely on the accuracy of the segmentation, especially in the case of lesions and abnormalities. In comparison, our results showed that the proposed Burg JE prior has a number of favourable properties. As shown in figure 8, this prior has very similar convergence properties as the Bowsher method in the anatomical regions (i.e. WM and GM). Note all of our reconstructions were initialized using a uniform initial guess. Based on the results in figures 6 and 8, it performs quite similar to or even slightly better than Bowsher prior in WM and GM and more importantly it outperforms the Bowsher and other priors in the reconstruction of PET unique lesions and edges. Compared to the Shannon JE, the proposed Burg JE prior is free of the selection of number of bins and computationally much less expensive, as the prior is degenerated to a weighted non-local Tikhonov prior, where the weights can be precomputed for multi-contrast MR images.
In this work, the performance of the Kaipio prior, as a new class of anatomical priors were also studied. As we elaborated, this prior can also be interpreted as a non-local Tikhonov prior in which the similarity kernels are calculated based on MR normal vectors (see equation (15)). Our results in figure 9 showed that the performance of this prior highly depends on the quality of the MR image as it relies on MR normalized gradient vectors. The normalization of this vectors makes them independent of their magnitude but very sensitive to noise and blurring artefacts. In Ehrhardt et al (2016), it was shown that the Kaipio prior can outperform the Bowsher prior in 2D PET image reconstruction for a 3D 3 × 3 neighbourhood. However, our 3D realistic reconstruction results showed that this prior is far being outperformed by the Bowsher prior for a 3D 7 × 7 × 7 neighbourhood. This could be ascribed to the size of the neighbourhood and the selected number of most similar neighbours. As pointed out in Vunckx et al (2012), the Bowsher prior with larger neighbourhoods results in improved results compared to smaller ones. In Vunckx et al (2012) and Ehrhardt et al (2016) and our study, this prior was evaluated using 4 neighbours in a 3 × 3 neighbourhood, 80 neighbours in a 5 × 5 × 5 neighbourhood and 70 neighbours in a 7 × 7 × 7 neighbourhood respectively, which can explain the improved performance of the Bowsher in our results. Note that in this study, we did not include the parallel level set anatomical prior proposed in Ehrhardt et al (2016), as similar to TV prior, it is not continuously differentiable and its derivation is not as straightforward as the Kaipio or a quadratic prior which can be suitably employed in one-steplate optimization of the MAP-EM algorithm. In Ehrhardt et al (2016), the authors employed a limited-memory BFGS, quasi-Newton type algorithm for the optimization.
In this study, we explicitly emphasized two applications of anato-functional priors for PVC of PET data as well as reducing PET scan time or alternatively reducing the injected dose of radiotracer. For low-dose PET scans, TV regularization has been the primary focus of study in the literature, however as can be seen in Müller et al (2011), at such low-statistics the reconstructed images suffer from stair-casing artefacts in spite of improved signal to noise ratio. In this study, a smoothed TV prior was included due to the differentiability requirement of the OSL optimization algorithm, therefore those artefacts were eliminated at the expense of reducing the edge-preserving properties of the TV prior. Nonetheless, in this study, we have highlighted low-dose PET image reconstruction using anato-functional priors, which can find immediate application in dynamic or follow-up PET scans.

Parameter selection
The performance of the MAP image reconstruction highly depends on the selection of the involving hyper-parameters, which in turn depends on the task of the image reconstruction, that is, lesion detection, tracer quantification, and etc. These parameters in fact control the properties of the prior functions and their impact on the image being reconstructed. In this study, the shape parameters of the studied priors, listed in table 2, were subjectively selected, while for simulation the β parameter was objectively selected with respect to a ground truth. As could be expected, in our experiment, we found that the performance of the MAP reconstruction highly depends on the regularization parameter, β, which varies depending on the type of prior and the object's activity level, scan duration and body weight. With increasing β and thus the strength of the anatomical prior, the PET edges that have matched MR edges are strongly preserved while the PET unique edges are suppressed. Therefore, it is expected that with a lower β, the bias-variance performance of the priors in tumour ROIs can change, however at the expense of increased variance in GM and WM. Nonetheless, with the MP extension of the prior, it is possible to still use high-valued β to reduce noise, yet better preserve PET unique features. Another important parameter that is specific to the non-local prior is the size of the neighbourhood. In Vunckx et al (2012), it is been reported that the performance of the Bowsher prior is improved with increasing the neighbourhood size, which can be attributed to the fact that in larger neighbourhoods the similarity between voxels and the continuity of edges are better captured. However, the suppression of PET unique features is presumably reduced with smaller neighbourhoods. Additionally, most of the anatomical priors included in this study have additional shape parameters that considerably affect the quality of the reconstructed images, as shown in figure 11. These B, σ v parameters are defined for the MR part of the similarity kernels prior to the PET reconstruction. In this study, we also predefined the σ u parameter which is related to the PET part of the kernels. Adaptive selection of this parameter has also been explored in Tang and Rahmim (2009) based on the intensity of the PET image estimates at each iteration. Some of these parameters such as neighbourhood size N , feature vector size, F , standard deviation of MR Gaussian kernels, σ v , can be fixed and standardized between patients, however the others such as the regularization parameter, β, and standard deviation of PET Gaussian kernels, σ u , should be adjusted for each patient based on the count level, particularly β.

Future work
In this work, we studied the Gaussian similarity kernels as a commonly used radial basis function (RBF). However, other RBFs such as the Bessel function, multiquadrics, thin-plane splines and Matern can also be investigated in order to find the best kernel with least shape parameters Fasshauer and Zhang (2007). The proposed MP regularization method can also be evaluated for post-reconstruction denoising of the PET images (Turkheimer et al 2008, Chan et al 2014, MR-guided deconvolution of PET images (Yan et al 2015) or multi-scale resolution recovery methods (Shidahara et al 2009) in order to preserve the PET unique features. Finally, future work includes the assessment of the added value of the MP MAP PET image reconstruction in PVC and improved quantification of tracer uptake in brain imaging of patients suspected of Alzheimer's disease using a large patient population. In Loeb et al (2015), the anatomical Bowsher prior has been successfully applied for low-dose dynamic PET-MR image reconstruction with matched PET and MR lesions in both simulation and clinical datasets. Hence, the improved performance of the proposed MP joint Burg entropy prior remains to be demonstrated in dynamic PET-MR scans where there are mismatched anato-functional features.

Conclusion
In this work, several state-of the-art anatomical priors were studied and extended to MP anatofunctional priors in a common framework. Moreover, a modified MP joint Burg entropy prior was introduced. In addition to PVC, application of anato-functional priors to the reconstruction of low-count PET data for achievement of comparable image quality to that obtained by MLEM reconstruction with high-count data was demonstrated. In both our simulation and clinical results, the conventional anatomical priors resulted in the suppression of PET unique features, which was notably reduced by the MP extension of these priors. The results showed that the Tikhonov priors with Gaussian similarity kernels, calculated using voxel-based feature vectors and with the Bowsher similarity kernels and the proposed prior result in the most accurate recovery of PET details. It was also found that the proposed prior is more robust in preserving PET unique features.
where ω jb are normalized Gaussian weighting coefficients given by: .

Appendix C
The Kaipio prior reformulated in Ehrhardt et al (2016) can be defined as following: x, x and x, y = i x i y i = x 2 y 2 cos(θ). In this prior, the L 2 norm-square of the weighted gradient vector, g j = [u j − u 1 , . . . , u j − u D ] T , of the image u at jth voxel is subtracted by the square of the inner product of the normal vectors n j = [n j1 , . . . , n jD ] T and the vector ξ j g j , in which the elements of g j are weighted by ξ j = diag ξ jb , where ξ jb are the proximity weighting coefficients. The (C.1) can be expanded as: By the expansion of the second term in (C.1) and knowing that n j 2 = 1, one can also show that this prior can be formulated as in (9), that is: where θ j is the angle between the vectors g j and n j . The derivative of the prior defined in (C.2) is given by: