Joint reconstruction of dynamic PET activity and kinetic parametric images using total variation constrained dictionary sparse coding

Dynamic positron emission tomography (PET) is capable of providing both spatial and temporal information about radiotracers in vivo. In this paper, we present a novel joint estimation framework to reconstruct temporal sequences of dynamic PET images together with the coefficients characterizing the system impulse response function, from which the associated parametric images of the system macro parameters for tracer kinetics can be estimated. The proposed algorithm, which combines a statistical measurement model with tracer kinetic models, integrates dictionary sparse coding (DSC) into a total variation minimization-based algorithm for simultaneous reconstruction of the activity distribution and the parametric map from measured emission sinograms. DSC, grounded in compartmental theory, provides biologically meaningful regularization, while total variation regularization provides edge-preserving guidance. We rely on techniques from minimization algorithms (the alternating direction method of multipliers) first to generate estimated activity distributions with sub-optimal kinetic parameter estimates, and then to recover the parametric maps given these activity estimates. These coupled iterative steps are repeated as necessary until convergence. Experiments have been conducted with synthetic data, Monte Carlo generated data, and real patient data, and the results are very promising.

Keywords: dynamic PET reconstruction, joint reconstruction, direct estimation, total variation, dictionary sparse coding

Introduction
It has been the fundamental goal of many efforts in the dynamic positron emission tomography (PET) imaging community to elucidate underlying biological and physiological processes in living tissues by reconstructing the spatiotemporal distribution of an injected radiotracer. With a suitable kinetic model, such as a compartmental model, the additional temporal information can be extracted from the measured data (as in direct methods) or from the reconstructed distribution (as in indirect schemes), and can be condensed into physiological parametric images, which are of greater clinical use for tracking disease treatment and diagnosing tumors than conventional static PET imaging [1,2]. In practice, however, the data collected in each time frame are limited, because the temporal resolution requirement forces a sequence of acquisitions at several time points over a continuous period. Therefore, in dynamic PET reconstruction problems the connection between different time frames can no longer be ignored, and conventional image reconstruction methods, which process the measured data frame by frame, will suffer from the low counts. In this work, tracer kinetics are integrated into the traditional frame-by-frame reconstruction algorithm, and a direct estimation of the kinetic parameters of the radiotracer is thereby performed simultaneously with the optimization of the activity distribution.
The conventional image reconstruction problem has been studied for many years by many researchers. Most active research on reconstruction falls into statistical iterative frameworks [3], which usually combine a specific model for the measurement statistics and the physical system with designed priors [4]. Among these iterative algorithms, two families are often mentioned: the maximum a posteriori (MAP) [5][6][7][8][9] and the maximum likelihood (ML) based reconstruction algorithms, which have been widely examined ever since Shepp and Vardi presented their work in 1982 [10][11][12][13][14][15][16]. Based on the research in [17,18], extra constraints make it possible to reduce image corruption caused by low counts (via methods such as TV regularization [19,20] and hierarchical regularization [21], which are also effective at edge preservation), or to achieve a joint reconstruction task by integrating other information into the image reconstruction problem, as described in [22,23].
Basically, there are two groups of approaches to the reconstruction of tracer kinetics from dynamic PET data: indirect and direct reconstruction. Indirect reconstruction treats image reconstruction and kinetic parameter estimation as two separate problems. It rests on the premise that the activity distributions are known as prior information, and the challenge is to estimate the kinetic parameters by fitting a mathematical model to the time activity curve (TAC) extracted from the activity distributions [24]. The direct method, on the other hand, combines those two steps into one problem, and the parametric images are estimated directly from the raw emission data [25][26][27][28][29]. In 1985, Carson and Lange first presented their research on direct parametric reconstruction based on the expectation maximization (EM) algorithm [30]. Inspired by their work, in 2005 Kamasak et al presented their research on the direct reconstruction of kinetic parameter images [31]. According to Kamasak, direct reconstruction methods often outperform indirect methods, although the solution becomes more complicated. In both direct and indirect methods, a suitable model is needed to fit the tracer kinetics and represent the TACs by mathematical functions. Among these models, the nonlinear compartmental model is widely used to represent the connection between kinetic parameters and the underlying physiological processes. Representative works include spectral analysis [32], DEPICT [33] and sparse Bayesian learning [34]. Based on the theory of the compartmental model, TACs are the sum of the activity concentrations within the individual compartments. By pre-defining a suitable range for the basis functions, the model fitting procedure can be treated as a regression problem with a biologically reasonable sparsity prior, which makes the model more robust to noise.
Distinct from either the indirect or the direct estimation method, in which only one of the activity distribution and the parametric image is directly connected to the raw sinogram data, we propose a joint estimation framework that tackles PET image reconstruction together with the direct tracer kinetic estimation process. The algorithm for the whole framework iterates between an alternating direction minimization process (providing the current estimate of the activity image), an update of the coefficients of the compartmental model through an iterative thresholding algorithm (whose solution gives the current estimate of the coefficients of the system impulse response function), and finally the minimization of the objective function.
In this paper, the joint reconstruction problem couples the traditional frame-by-frame image reconstruction method with the direct data-driven estimation of parametric images. Unlike the traditional objective functions for reconstructing either activity maps or kinetic parameters alone, our data fitting term is formed from both the measured data and a biologically based compartmental model, and regularization such as total variation (TV) is added to improve the robustness of the results [35,36]. As for the dictionary sparse coding (DSC), it consists of an over-complete dictionary composed of kinetic basis functions and an additional sparse constraint for determining the most relevant subset of the basis functions. Moreover, since both statistical and biological constraints are imposed in our work, the results have the potential to give a more accurate and better fit to the tracer's fate in vivo. The hypothesis that we can use emission sinograms to track the activity distribution and tracer kinetics simultaneously is confirmed by the results of our experiments using synthetic, Monte Carlo simulated, and real data. Visual and quantitative results for the activity distribution and a suitable macro parameter are given in the experiments section.

Problem formulation
2.1.1. TV based dynamic PET image reconstruction model. Unlike static PET, dynamic PET measurements are acquired as a sequence of temporal frames. During the processing of one time frame, a coincidence event comes from the annihilation process, which converts the mass into energy and thus emits two photons, collected by annihilation coincidence detection. The counts of near-simultaneously acquired photons collected by opposite detector pairs are organized in the format of a sinogram. Hence for the $m$th time frame, $m = 1, 2, \ldots, M$ (with $M$ being the total number of frames), the sinogram can be denoted by a vector $y_m \in \mathbb{R}^I$ containing all the data collected by the $I$ detector pairs in all, and the corresponding activity distribution for the $m$th time frame can also be denoted by a vector $x_m \in \mathbb{R}^J$, where $J$ is the total number of discrete pixels.
In general, the aim of dynamic PET reconstruction can be reduced to the estimation of a radioactivity map $X \in \mathbb{R}^{J \times M}$ from the raw projection data $Y \in \mathbb{R}^{I \times M}$ measured by the detector probes of PET through the transform
$$Y = GX + e,$$
where $G \in \mathbb{R}^{I \times J}$ is a geometry-based probability matrix, $e$ is the measurement noise, and each entry $g_{ij}$ of $G$ represents the probability of detecting an emission from pixel site $j$ at detector probe $i$.
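As a concrete illustration, the linear measurement model above can be sketched in a few lines of NumPy. The dimensions and the random stand-in for $G$ are assumptions made only for this sketch: in practice each entry of $G$ is computed from the scanner geometry.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: I detector pairs, J pixels, M time frames
# (hypothetical small values; real scanners are far larger).
I, J, M = 30, 16, 5

# Stand-in system matrix G: in practice each entry g_ij comes from the
# scanner geometry; nonnegative random values keep the sketch simple.
G = rng.random((I, J))

X = rng.random((J, M))        # activity map, one column per time frame
Y_clean = G @ X               # noiseless sinograms, shape (I, M)

# Gaussian noise model, as adopted in this work instead of Poisson.
Y = Y_clean + 0.01 * rng.standard_normal((I, M))
print(Y.shape)
```

Each column of `Y` is one frame's sinogram, matching the vector notation $y_m = G x_m + e_m$ used above.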
Total variation regularization is then applied to the reconstruction, and the sub-objective function for $X$ can be formulated as the sum of a TV regularizer and a data fitting term through the transform [37, 38]:
$$\min_X \; \mathrm{TV}(X) + \alpha \|GX - Y\|^2,$$
in which $\|\cdot\|^2$ refers to the squared Euclidean norm and $\alpha$ denotes a relative weighting parameter for the data fidelity term $\|GX - Y\|^2$.
$\mathrm{TV}(\cdot)$ is the total variation, and $\mathrm{TV}(X)$ represents the sum of the discrete gradient magnitudes of the activity map $X$ over every pixel $j$ and every frame $m$. It can be calculated through
$$\mathrm{TV}(X) = \sum_{m=1}^{M} \sum_{j=1}^{J} \|D_j x_m\|,$$
where $D_j x_m \in \mathbb{R}^2$ is defined as the discrete partial derivatives of $x_m$ at the $j$th pixel. In this work, a Gaussian noise model, instead of the Poisson noise model, is used to unify the data fitting term and the TV norm, which further simplifies the solution process for the activity distribution $X$.
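To make the TV term concrete, here is a minimal sketch of the isotropic discrete TV of a single frame. The boundary handling (replicated edges, forward differences) is an assumption for this sketch, since the text does not specify it.

```python
import numpy as np

def tv_isotropic(x):
    """Isotropic TV of a 2D frame: sum over pixels j of ||D_j x||_2,
    where D_j x stacks forward differences along the two axes
    (edges replicated, so boundary rows/columns contribute zero)."""
    dx = np.diff(x, axis=0, append=x[-1:, :])   # vertical differences
    dy = np.diff(x, axis=1, append=x[:, -1:])   # horizontal differences
    return float(np.sum(np.sqrt(dx ** 2 + dy ** 2)))

frame = np.zeros((8, 8))
frame[2:6, 2:6] = 1.0            # bright square on a dark background
print(tv_isotropic(frame))       # positive: gradients live on the edges
```

A constant frame has zero TV, which is why the penalty favors piecewise-smooth, edge-preserving solutions.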

Modeling of tracer kinetics.
One widely used model of tracer kinetic fitting is the compartmental model, which is tightly related to the underlying biochemical process [39]. Based on the theory of compartmental modelling, the time activity curve (TAC) is the sum of radioactivity within each tissue compartment obtained by convolving exponential functions with the blood input function or reference tissue input function.
Normally, a PET tracer compartmental model is defined as either a plasma input model or a reference tissue input model, and it can be represented by a set of first-order differential equations, each of which describes the tracer concentration in one tissue compartment. The solution has an impulse response function consisting solely of exponentials and a delta function [39]:
$$C_T(t) = \Big(\varphi_0 \,\delta(t) + \sum_{l=1}^{q} \varphi_l \, e^{-\theta_l t}\Big) \otimes C_I(t),$$
where $C_T(t)$ represents the tissue concentration time course at time point $t$, while $C_I(t)$ refers to the input (plasma or reference tissue) concentration time course. $q$ is the total number of tissue compartments, $l$ indexes the $l$th tissue compartment, $\otimes$ is the convolution operator, and $\varphi_l$ and $\theta_l$ represent, respectively, the coefficient and the exponent of the general impulse response function for the $l$th compartment. $\delta(t)$ is the delta function, and the convolution of $\delta(t)$ with the input function represents the concentration time course when no tissue compartment is involved.
The measurement data for dynamic PET are obtained as a sequence of temporal frames, and the tissue concentration value $x_{jm}$ for the $j$th pixel at the $m$th time frame is generated by integrating the continuous concentration time course over the individual frame. Therefore, the tissue radioactivity observations, in this case each entry $x_{jm}$ of the matrix $X$, can be further defined as
$$x_{jm} = \frac{1}{t_m^e - t_m^s} \int_{t_m^s}^{t_m^e} C_T^j(t)\, \mathrm{d}t,$$
in which $t_m^s$ and $t_m^e$ denote the start and end times of the $m$th frame and $C_T^j(t)$ is the tissue concentration time course $C_T(t)$ for the $j$th pixel. The continuous function $C_T^j(t)$ must be normalized to the frame length to correspond to the data sampling process.
Based on the solution above, $C_T(t)$ is the sum of the concentrations in the target tissue and hence can be further expressed as an expansion on a basis
$$C_T(t) = \varphi_0 \, C_I(t) + \sum_{c=1}^{N} \varphi_c \, b_c(t), \qquad b_c(t) = e^{-\theta_c t} \otimes C_I(t),$$
in which $N$ denotes the number of pre-defined basis functions and corresponds to the scale set $\{\theta_c\}$, since each $\theta_c$ may be pre-chosen from a physiologically plausible range $\theta_{\min} \leqslant \theta_c \leqslant \theta_{\max}$ and spaced in a suitable manner within that range so as to elicit a suitable coverage of the kinetic spectrum. In addition, the range of $\theta$ is closely connected to the tracer, i.e. the same set of $\theta$ may not fit all situations. $\varphi_c$ is defined as the corresponding coefficient of the $c$th basis function. The pre-defined basis functions, referring to the convolutions of a set of exponential functions with the input function, are discretized to correspond to the time frames and organized in the form of a matrix $\Psi \in \mathbb{R}^{M \times (N+1)}$, which is also known as the dictionary of the proposed framework [33].
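A possible construction of the dictionary is sketched below. The frame timing, the $\theta$ grid, and the toy input curve are all assumptions made for illustration; a real study would use the measured plasma or reference-tissue input function and the actual frame schedule.

```python
import numpy as np

dt = 1.0 / 60.0                       # fine time grid, in minutes
t = np.arange(0.0, 60.0, dt)
C_I = t * np.exp(-0.5 * t)            # hypothetical input concentration curve

N = 50
thetas = np.logspace(-2, 1, N)        # theta_min .. theta_max, log-spaced

edges = np.linspace(0.0, 60.0, 19)    # 18 uniform frames (for simplicity)
M = len(edges) - 1

Psi = np.zeros((M, N + 1))
for c, theta in enumerate(thetas):
    # basis TAC: exp(-theta t) convolved with the input function
    b = np.convolve(np.exp(-theta * t), C_I)[: len(t)] * dt
    for m in range(M):
        sel = (t >= edges[m]) & (t < edges[m + 1])
        Psi[m, c] = b[sel].mean()     # frame-length-normalized sample
# last column: delta(t) convolved with C_I is C_I itself
for m in range(M):
    sel = (t >= edges[m]) & (t < edges[m + 1])
    Psi[m, N] = C_I[sel].mean()

print(Psi.shape)
```

Each column of `Psi` is one frame-averaged basis TAC, so any TAC represented by the compartmental model is (approximately) a nonnegative combination of a few columns.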
The data fitting process can be achieved through the equation
$$X^{\mathrm{T}} \cong \Psi \Phi,$$
in which $\Phi \in \mathbb{R}^{(N+1) \times J}$ is defined as the matrix of the coefficient parameters of the general impulse response function, which is related to the tracer kinetics; each column of $\Phi$ holds the coefficient vector of the corresponding pixel (column) of $X^{\mathrm{T}}$. Usually, a relatively large $N$ is chosen for the matrix to ensure good coverage of the kinetic spectrum, which leads to an over-complete dictionary $\Psi$. With the prior knowledge that the activity distribution is composed of only a few compartments, a sparse constraint on the coefficients $\Phi$ can be added to select the most relevant subset of the basis functions, and the process of fitting the data can therefore be transformed into a regression problem. Based on the compartmental model, given a TAC, only a few elements of the dictionary should suffice to restore the parametric images, which leads to the following sub-objective function
$$\min_{\Phi} \; \|\Psi \Phi - X^{\mathrm{T}}\|^2 + \mu \|\Phi\|_1,$$
in which $\|\cdot\|_1$ refers to the $\ell_1$ norm and $\mu$ is a weighting parameter for the sparseness. The $\ell_1$ norm could be replaced by the $\ell_0$ norm to constrain the sparsity of $\Phi$; here, the $\ell_1$ norm is chosen to make the problem easier to solve than with the $\ell_0$ norm and to achieve basis pursuit denoising. The appropriate macro parameters can then be computed from $\Phi$, for example the volume of distribution of the target tissue,
$$V_d(j) = \sum_{c=1}^{N} \frac{\varphi_c^j}{\theta_c},$$
where $V_d(j)$ represents the $j$th element in $V_d$, which is related to the $j$th pixel in the activity distribution field.

Solution
Our aim here is to reconstruct the dynamic PET activity map $X$ and the kinetic coefficient parameters $\Phi$ simultaneously. Therefore, we propose the objective function of our model by combining the TV-constrained reconstruction sub-objective and the sparse coding sub-objective as follows:
$$F(X, \Phi) = \mathrm{TV}(X) + \alpha \|GX - Y\|^2 + \gamma \big( \|\Psi \Phi - X^{\mathrm{T}}\|^2 + \mu \|\Phi\|_1 \big).$$
The weighting parameter $\gamma$ is incorporated to control the relative weight of the terms generated from the kinetic modelling.
$\mathrm{TV}(X)$ is the total variation of $X$ and provides edge-preserving smoothing. $\|GX - Y\|^2$ and $\|\Psi \Phi - X^{\mathrm{T}}\|^2$ are the data fitting terms for $X$ and $\Phi$. The last term, the $\ell_1$ norm of $\Phi$, is the sparsity regularization term.
The model proposed above is then optimized by alternately minimizing $F(X, \Phi)$ with respect to $X$ and $\Phi$ as follows.

Activity reconstruction sub-problem.
Extracting the components of $F(X, \Phi)$ that concern the variable $X$ gives
$$F_X(X) = \mathrm{TV}(X) + \alpha \|GX - Y\|^2 + \gamma \|\Psi \Phi - X^{\mathrm{T}}\|^2.$$
Thus, the sub-problem here can be formulated as
$$\min_X \; \sum_{m,j} \|D_j x_m\| + \alpha \|GX - Y\|^2 + \gamma \|\Psi \Phi - X^{\mathrm{T}}\|^2.$$
By introducing auxiliary variables $w_{jm}$, this problem is equivalent to the following constrained problem
$$\min_{X, w} \; \sum_{m,j} \|w_{jm}\| + \alpha \|GX - Y\|^2 + \gamma \|\Psi \Phi - X^{\mathrm{T}}\|^2 \quad \text{s.t.} \quad w_{jm} = D_j x_m.$$
Therefore, the corresponding augmented Lagrangian function for this problem is
$$L(X, w) = \sum_{m,j} \Big( \|w_{jm}\| - \nu_{jm}^{\mathrm{T}} (w_{jm} - D_j x_m) + \frac{\beta_{jm}}{2} \|w_{jm} - D_j x_m\|^2 \Big) + \alpha \|GX - Y\|^2 + \gamma \|\Psi \Phi - X^{\mathrm{T}}\|^2,$$
in which $\nu_{jm}$ is the vector of Lagrange multipliers and $\beta_{jm}$ denotes the penalty parameter (for all $j$ and $m$). An iterative alternating minimization scheme is then applied to solve the problem above: the minimizers $X$ and $w_{jm}$ are obtained alternately. First, for a fixed $X$, the minimizer $w_{jm}$ is obtained from the relevant components of $L(X, w)$:
$$\min_{w_{jm}} \; \|w_{jm}\| - \nu_{jm}^{\mathrm{T}} (w_{jm} - D_j x_m) + \frac{\beta_{jm}}{2} \|w_{jm} - D_j x_m\|^2.$$

For $\beta_{jm} > 0$, the $(k+1)$th minimizer is given by the 2D shrinkage formula [40]
$$w_{jm}^{k+1} = \max\Big\{ \Big\| D_j x_m^k + \frac{\nu_{jm}}{\beta_{jm}} \Big\| - \frac{1}{\beta_{jm}},\; 0 \Big\} \, \frac{D_j x_m^k + \nu_{jm}/\beta_{jm}}{\big\| D_j x_m^k + \nu_{jm}/\beta_{jm} \big\|}.$$
Next, for the fixed $w_{jm}^{k+1}$, noting that the terms $\|GX - Y\|^2$ and $\|\Psi \Phi - X^{\mathrm{T}}\|^2$ can be linearized, the update $X^{k+1}$ is estimated by approximately minimizing the resulting linearized alternating direction method of multipliers (ADMM) scheme for $X$. In addition, the Lagrange multipliers also need to be updated along with $w_{jm}$ and $X$ through
$$\nu_{jm}^{k+1} = \nu_{jm}^{k} - \beta_{jm} \big( w_{jm}^{k+1} - D_j x_m^{k+1} \big).$$
After each steepest descent step, $w_{jm}$ is updated and the process is repeated until the sub-problem is approximately solved within a prescribed tolerance.

Coefficient estimation sub-problem.
Extracting the components of $F(X, \Phi)$ that concern $\Phi$ gives the sub-problem of minimizing
$$S(X, \Phi) = \|\Psi \Phi - X^{\mathrm{T}}\|^2 + \mu \|\Phi\|_1$$
with respect to $\Phi$. To treat this sub-problem, the quadratic term $Q(X, \Phi) = \|\Psi \Phi - X^{\mathrm{T}}\|^2$ is linearized, and the iterative thresholding algorithm [41] is then applied to minimize the sub-problem:
$$\Phi^{k+1} = \mathcal{S}_{\mu \sigma} \big( \Phi^{k} - \sigma \Psi^{\mathrm{T}} (\Psi \Phi^{k} - X^{\mathrm{T}}) \big),$$
where $\mathcal{S}_{\mu \sigma}$ is the soft shrinkage operator and $\sigma$ is the step size [41].
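Both proximal maps in these updates have simple closed forms. Here is a minimal sketch of the 2D shrinkage for the $w$-update, the multiplier update, and the soft shrinkage operator for the $\Phi$-update; the toy shapes and penalty value are illustrative assumptions.

```python
import numpy as np

def shrink2d(q, beta):
    """2D shrinkage: solves min_w ||w||_2 + (beta/2)*||w - q||_2^2
    pixel-wise, for q of shape (..., 2); magnitudes shrink by 1/beta."""
    mag = np.linalg.norm(q, axis=-1, keepdims=True)
    scale = np.maximum(mag - 1.0 / beta, 0.0) / np.maximum(mag, 1e-12)
    return q * scale

def soft_threshold(v, tau):
    """Soft shrinkage operator S_tau used in the Phi update."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(2)
beta = 8.0
DX = rng.standard_normal((16, 16, 2))   # toy gradient field D_j x_m
nu = np.zeros_like(DX)                  # Lagrange multipliers

w = shrink2d(DX + nu / beta, beta)      # closed-form w-minimizer
nu = nu - beta * (w - DX)               # multiplier update
print(w.shape)
```

Gradient vectors whose magnitude is below $1/\beta$ are set exactly to zero, which is the mechanism behind the edge-preserving behavior of the TV sub-problem.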

The overall algorithm of the proposed model is summarized in algorithm 1. The algorithm terminates when both variables have converged.
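Putting the pieces together, the alternation of algorithm 1 can be sketched at toy scale. Plain gradient steps stand in for the linearized ADMM $X$-update and the TV term is omitted, so this only illustrates the data-fit/sparsity alternation under assumed sizes and weights, not the full algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)
I, J, M, N = 40, 25, 6, 12              # toy sizes (assumptions)
G = rng.random((I, J))                  # stand-in system matrix
Psi = rng.random((M, N))                # stand-in kinetic dictionary
Y = G @ rng.random((J, M))              # noiseless toy sinograms

alpha, gamma, mu = 1.0, 0.5, 1e-3       # illustrative weights
X = np.zeros((J, M))
Phi = np.zeros((N, J))

def soft(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

# step sizes: reciprocals of the gradient Lipschitz constants
tx = 1.0 / (2 * alpha * np.linalg.norm(G, 2) ** 2 + 2 * gamma)
tp = 1.0 / (2 * np.linalg.norm(Psi, 2) ** 2)

for k in range(400):
    # X-update: descend alpha*||GX - Y||^2 + gamma*||Psi Phi - X^T||^2
    gradX = 2 * alpha * G.T @ (G @ X - Y) + 2 * gamma * (X - (Psi @ Phi).T)
    X = X - tx * gradX
    # Phi-update: one iterative-thresholding step on the sparse coding term
    Phi = soft(Phi - tp * 2 * Psi.T @ (Psi @ Phi - X.T), mu * tp)

print(np.linalg.norm(G @ X - Y) < np.linalg.norm(Y))  # True
```

Because each partial update decreases the joint objective with the other variable held fixed, the data-fit residual falls monotonically from its zero-initialization value.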

Experiments
In this section, the proposed algorithm is evaluated through experiments based on synthetic Zubal phantom [42] data, Monte Carlo simulated data, and real patient cardiac data. The simulation results demonstrate both the robustness and the accuracy of the reconstructed activity and parametric images under different conditions (noise level and counting rate). In addition, the selection of the tuning parameters mentioned in the previous section is presented in the form of controlled-variable experiments.
For a part of the activity images, the reconstruction result of the proposed method is compared to the traditional ML-EM method, and quantitative evaluation is presented as the relative bias and relative variance over all $M$ frames, defined in terms of the true value $x_{jm}$ of the $j$th pixel in the $m$th time frame, the reconstructed value $\hat{x}_{jm}$ for that pixel, and the mean value $\bar{x}_{jm}$ of all the pixels belonging to the same cluster as the pixel. Quantitative results for the bias, the variance, and the relative MSE (mean square error) mentioned below can only be presented for the synthetic and Monte Carlo simulated data, for which the ground truth is known.
To evaluate the estimation accuracy of the parametric images, the result achieved by our joint framework is compared with the data-driven estimation of parametric images based on compartmental theory (DEPICT) method [33]. The input image for DEPICT is generated through the ML-EM method so as to unify the form of the input data (sinogram) for our algorithm and the DEPICT method. The parametric image of $V_d$ (volume of distribution) is presented and quantitatively evaluated by the relative MSE.
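For reference, one common definition of the relative MSE used in such comparisons normalizes the squared error by the squared ground truth. The exact normalization used in the paper is not spelled out in the text above, so treat this sketch as an assumption.

```python
import numpy as np

def relative_mse(estimate, truth):
    """Relative MSE: ||estimate - truth||^2 / ||truth||^2
    (one common convention; an assumption here, not the paper's exact form)."""
    e = np.asarray(estimate, dtype=float)
    t = np.asarray(truth, dtype=float)
    return float(np.sum((e - t) ** 2) / np.sum(t ** 2))

vd_true = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical ground-truth V_d
vd_est = np.array([1.1, 1.9, 3.2, 3.8])    # hypothetical estimate
print(relative_mse(vd_est, vd_true))       # 0.1 / 30 ≈ 0.00333
```

The same function applies unchanged to either an activity frame or a $V_d$ parametric image, since both are just arrays compared against their ground truth.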

Synthetic Zubal phantom data
We first conducted the experiment on synthetic emission Zubal thorax phantom data with known radioactivity concentration, as shown in figure 1. A dynamic PET image sequence (18 frames) was generated using the two-compartment model, and the resolution of each frame was 128 by 128 pixels. The simulated tracer was $^{18}$F-FDG and the kinetic parameters were set as in [43]. The sinogram was then generated through the system matrix, which was calculated from the tomographic geometry, and different levels of random noise were modeled so as to evaluate the robustness of our algorithm. The reconstructed activity images are partially demonstrated in figure 2, in which 'joint + TV' and 'joint' refer to the proposed framework with and without TV regularization, respectively.
It is clear that the images reconstructed by the ML-EM method are blurry, with low contrast; joint has much less noise than ML-EM, and the joint + TV framework achieves the most faithful representation of the ground truth among all the methods. Moreover, the improvement in the accuracy of the reconstructed image is clear from the quantitative data shown in tables 1 and 2. From table 2, we can see that the results for joint + TV maintain promising values as the noise level increases. Figure 3 shows the true and reconstructed values of $V_d$. Compared to the ground truth, our results present a better structure and higher contrast, so the different ROIs can be identified clearly. Table 1 shows the relative MSE comparison between the joint + TV algorithm and DEPICT.

Monte Carlo simulated data
In this section, we verify our algorithm on Monte Carlo simulated brain and cardiac phantoms. The Monte Carlo method can simulate the physical imaging process of PET, thus providing realistic sinogram data; more importantly, with the known concentration distribution, the reconstruction results for Monte Carlo simulated data can be evaluated against a highly reliable criterion.

3.2.1. Brain phantom. Experiments on the Monte Carlo simulated brain phantom are presented first. The template of the brain phantom is shown in figure 4, in which a tumor is inserted as ROI3. In this experiment, an image sequence ($64 \times 64 \times 18$) based on the original brain phantom is first generated through the two-compartment model. The kinetic parameters $k_1, k_2, k_3, k_4$ (min$^{-1}$) and the temporal distribution of the time frames are defined according to [44]. The PET scanner is simulated as a real Hamamatsu SHR-22000 and the tracer is set as $^{18}$F-FDG. The generated measurement data has $64 \times 64$ projections. In addition, the data set is simulated with four different count levels ($5 \times 10^4$, $1 \times 10^5$, $5 \times 10^5$ and $1 \times 10^6$) in order to evaluate the robustness of the proposed algorithm. Figure 5 shows the 9th and 12th frames of the reconstructed image sequences at the four count levels. As the counts become higher, less noise is involved and the reconstructed images better preserve the structure of the ground truth. More specifically, the quantitative results for the relative bias and variance, which exhibit significant improvement over the ML-EM method, are shown in table 3.
As for the parametric images, the reconstructed $V_d$ images are shown in figure 6. The inserted tumor is visually hard to identify even with counts of $1 \times 10^6$. The improvement in $V_d$ for the higher count cases is quite obvious compared to the lower ones, whereas they all present promising results against the other reconstruction algorithms, as shown in table 4.

Cardiac phantom.
Next, we evaluate the proposed framework on a Monte Carlo simulated cardiac phantom. The template of the phantom is shown in figure 7. Since its ROI structure is simple, this time we mainly focus on the tunable parameters. The dynamic image sequence is also generated using the two-compartment model, and the relevant parameters (kinetic parameters and template distributions) are fixed as described in [45]. As for the Monte Carlo simulations, the PET scanner and the tracer are the same as in the previous section. The total number of coincidence events is set to $1 \times 10^6$. The quantitative results are shown in table 5, and it is clear that our algorithm provides more accurate results for both the activity and the parametric images.

Real patient data
The real patient data presented in this section is a dynamic PET scan sequence acquired from a volunteer at a local hospital. The scanner is a Hamamatsu SHR-22000 whole body PET scanner with 32 crystal rings, which can be operated in 2D or 3D mode. The transaxial resolution at the central FOV is about 3.7 mm. $^{18}$F-FDG is injected as the tracer and the dynamic acquisition sequence starts just after injection. The acquisition sequence consists of 19 frames over 60 min. The dictionary and the other tunable parameters stay the same as in the previous section, since both experiments use $^{18}$F-FDG as the injected tracer. The reconstruction results for both the activity and the parametric images are presented together in figure 8. A bright ring considered to be the myocardium (like that in the Monte Carlo simulated cardiac phantom) can be clearly identified in all these images. However, as there is no ground truth to compare against, we can only say that the reconstruction results are visually close to those of the ML-EM method (in the activity map case) and the DEPICT method (in the $V_d$ case). Furthermore, it is quite clear that the contrast between the myocardium and the other tissues is more distinct in the joint + TV case.

Tunable parameters
Four parameters need to be manually tuned to achieve a promising estimation. The parameter $\alpha$, which controls the relative weight of the TV regularization, and the penalty parameter $\beta$ in the optimization scheme are preferably in the range $[2^4, 2^{13}]$, as mentioned in [40]. In this paper, we empirically find that the algorithm performs well when $\alpha$ is set at about $2^8$ and $\beta$ at around $2^7$. To determine the ranges of the other two tunable parameters, $\gamma$ and $\mu$, and to examine their robustness within the determined ranges, the relative bias, variance, and MSE are calculated for different values of these parameters. In our experiments, the ranges for $\gamma$ and $\mu$ are pre-determined as $[0.5, 3.5]$ and $[0.005, 0.17]$ respectively. When one of the parameters is tested, the other is fixed at a suitable value so that relatively promising results can be achieved. In practice, the fixed value for $\gamma$ is 1.5, and $\mu$ is set to 0.01.
As shown in figure 9, the values of relative bias and variance stay in stable and proper regions across the whole testing range for all these parameters, while the relative MSE is more sensitive to these parameters. The lowest point for relative MSE can be reached when γ and μ are selected with the fixed values mentioned before.
Beyond that, the stopping criteria for the algorithm are satisfied when it reaches the maximum number of iterations, which is fixed at 300 in the experiments, or when the relative change of both $X$ and $\Phi$ falls below $10^{-3}$.

Discussion
In this paper, we have proposed a TV constrained dictionary sparse coding framework that allows us to reconstruct the activity concentration and parametric images simultaneously. By combining TV regularization and biologically based regularization (the two-compartment model), the reconstruction process can be viewed as an optimization problem solved through an iterative process, and the generated images can be both statistically and biologically meaningful.
In our experiments, the reconstruction algorithm is verified on synthetic Zubal thorax phantom data, Monte Carlo simulated brain and cardiac data, and real patient data. For comparison, a version of the proposed framework without TV, i.e. a reconstruction algorithm based only on DSC, as well as the ML-EM method and DEPICT, are also presented. First, we test the robustness to noise on the synthetic Zubal phantom data, and the joint + TV method presents a much better image of the activity distribution at all noise levels. As for $V_d$, the additional TV regularization makes the whole system less sensitive to noise. Next, in the Monte Carlo simulations, we present results at different count levels (brain phantom) to again evaluate the robustness of the algorithm and to examine the proper ranges for the tunable parameters ($\gamma$, $\mu$). The results in the experiments section show that as the counts get lower, TV regularization keeps the error of the reconstructed images in a low and stable range compared to the other methods. The Monte Carlo simulated cardiac phantom, with its simple structure and faithful reflection of the physical imaging process of PET, is used as the training data for determining the tunable parameters. As shown in figure 9, the planned $\gamma$ and $\mu$ did not achieve the optimal results for both the activity and the parametric images simultaneously. However, by balancing the error across both categories of results, we believe they reach a relatively suitable value for the whole framework. In the final experiment, in which the data come from a real patient at a local hospital, we demonstrate the results for a cardiac case; the reconstructed images preserve the structure of both the activity and the parametric image well, as compared to the ML-EM method and the DEPICT method.