4D image reconstruction for emission tomography

An overview of the theory of 4D image reconstruction for emission tomography is given along with a review of the current state of the art, covering both positron emission tomography and single photon emission computed tomography (SPECT). By viewing 4D image reconstruction as a matter of either linear or non-linear parameter estimation for a set of spatiotemporal functions chosen to approximately represent the radiotracer distribution, the areas of so-called ‘fully 4D’ image reconstruction and ‘direct kinetic parameter estimation’ are unified within a common framework. Many choices of linear and non-linear parameterization of these functions are considered (including the important case where the parameters have direct biological meaning), along with a review of the algorithms which are able to estimate these often non-linear parameters from emission tomography data. The other crucial components to image reconstruction (the objective function, the system model and the raw data format) are also covered, but in less detail due to the relatively straightforward extension from their corresponding components in conventional 3D image reconstruction. The key unifying concept is that maximum likelihood or maximum a posteriori (MAP) estimation of either linear or non-linear model parameters can be achieved in image space after carrying out a conventional expectation maximization (EM) update of the dynamic image series, using a Kullback-Leibler distance metric (comparing the modeled image values with the EM image values), to optimize the desired parameters. For MAP, an image-space penalty for regularization purposes is required. The benefits of 4D and direct reconstruction reported in the literature are reviewed, and furthermore demonstrated with simple simulation examples. It is clear that the future of reconstructing dynamic or functional emission tomography images, which often exhibit high levels of spatially correlated noise, should ideally exploit these 4D approaches.


Introduction
Positron emission tomography (PET) and single photon emission computed tomography (SPECT) provide a powerful array of capabilities for functional and molecular imaging in the human body, through detection of trace concentrations of radioactively-labeled compounds within any chosen volume of interest. Emission tomography reveals functional processes relevant to a range of fields which are of great importance to human health, such as neurology (Herholz et al 2004, Jones andRabiner 2012), oncology (Muehllehner and Karp 2006) and cardiology (Mc Ardle et al 2012), and there is considerable versatility arising from the growing array of radioactively-labeled compounds (called radiotracers) which are in existence and also under development (Schirrmacher et al 2013 (MRI), and with the increasing establishment of simultaneous PET-MR imaging (Judenhofer et al 2007, Schlemmer et al 2008, Delso et al 2011, the powerful capabilities of PET are now becoming available in combination with the anatomical and functional imaging capacities of MR (such as functional MRI (fMRI)).
Both PET and SPECT usually involve intravenous administration of a radiotracer to the patient or research participant being studied, followed by external detection of the high-energy photons which are emitted from the distribution of the radiotracer within the body. The chemical sensitivity of PET and SPECT is very high (nanomolar or even picomolar concentrations of a radiotracer can be imaged and quantified), but there are however sensitivity limitations for detecting the high-energy photons, with often well below 5 percent of the total emitted photons being detected. Consequently, the number of photons detected by the scanners is typically in the range of 10 4 -10 9 counts (depending on the study type and duration), which is low when compared to the count levels encountered in conventional non-dynamic transmission computed tomography (CT) for anatomical imaging.
It is the relatively limited number of photons detected in PET and SPECT, combined with the desire to reconstruct 3D images (with anything in the range of ~64 3 to ~256 3 image values), which means that noise in the 3D images is often, by far, the dominant concern for emission tomography. This has led not only to hardware developments to improve detection sensitivity, but also to developments in the algorithms used to reconstruct these images. It has now been amply demonstrated that the choice of reconstruction algorithm can have a significant impact on image quality (e.g. Comtat et al 1998), and extensive research over more than two decades into emission tomographic 3D image reconstruction methods has resulted in noise reduction and even resolution improvements (Rahmim et al 2013) via statistical iterative image reconstruction techniques (Leahy and Qi 2000). These iterative methods permit more accurate system and statistical modeling of the count-limited emission data to be incorporated into the reconstruction algorithm. Nonetheless, even with these advances in 3D reconstruction, the problem of limited photon counts persists, and in particular is exacerbated when dynamic emission imaging is carried out, as the limited counts are divided among many different time frames (figure 1). In dynamic emission imaging the time course of the 3D radiotracer distribution after injection is sought, so that biologically useful parameters can be estimated from these time courses (such as blood flow, metabolic rates or binding potentials). Conventionally this involves independently reconstructing 3D time-frame images followed by post-reconstruction temporal modeling or kinetic fitting, in order to estimate kinetic and functional parameters within voxels (known as parametric imaging) or within regions of interest (ROIs). The noise in the 3D image frames can lead to unnecessarily high noise levels (and even bias) in the final fitted parameters, and sometimes the kinetic modeling can even fail. Ideally the noise distribution in the reconstructed 3D images, which is spatially-correlated and can also be object dependent, should be accurately accounted for in such post-reconstruction analyses. However, for the frequently-used iterative image reconstruction algorithms these noise distributions are rarely known precisely, being dependent on the number of iterations employed and other reconstruction parameters.
It is in this context that one of the recent surges of research interest in the field of reconstructing images from PET or SPECT data has occurred: the area of fully 4D image reconstruction. These 4D reconstruction approaches, which will be reviewed in this article, seek to reduce the aforementioned problems of limited counts in dynamic emission tomography by incorporating a model of the temporal behavior of the radiotracer directly into the image reconstruction algorithm. These models are usually chosen to provide physiologically-meaningful constraints within image reconstruction, and as a special case the kinetic model which is normally used on a post-reconstruction basis can itself be placed within the reconstruction algorithm. Often these temporal models use far fewer parameters to describe the time-course of the radioactive distribution compared to the number of time frames. For example, 3 coefficients or kinetic parameters can be used to describe a time-activity curve (TAC) instead of 26 distinct time frame samples. This reduction in the number of parameters to estimate, and the corresponding constraints the temporal models impose on the TACs during the image reconstruction process, deliver notable noise reduction in the time series of 3D images obtained. Nonetheless, it is important to note that the conventional independent-frame reconstruction approach is often followed by post-reconstruction fitting of the TACs which also provides noise reduction, but any benefits of TAC constraints during reconstruction are lost and furthermore accurate post-reconstruction modeling of the noise in the TACs can be difficult as previously mentioned. In contrast, if fully 4D reconstruction uses the kinetic model which is normally applied after reconstruction within the reconstruction, it has the advantage of estimating the end-point kinetic parameters directly from the raw data. This means the known noise-model of the raw emission data (which is often modeled extremely well as a space-time dependent Poisson process) is accurately accounted for when estimating the final parameters, sometimes offering variance and even bias reduction compared to post-reconstruction fitting of independently reconstructed images.
It is also worth mentioning at this point that in particular for SPECT imaging, fully 4D reconstruction offers the additional advantage of accounting for the changing activity distribution during a single rotation of the SPECT camera (Gullberg et al 2010). This is an important feature because whenever the activity distribution changes during the course of camera rotation, inconsistent projection data will be obtained (e.g. the projections taken at 0 degrees are not consistent with the projections taken a few moments later at 90 degrees, due to the dynamically changing activity distribution in the field of view (FOV)). Inconsistent projection data can lead to biases in the reconstructed images when the changing activity is not taken into account during the reconstruction. Emission tomographs which collect all projection data simultaneously (as is most typically the case with PET scanners) do not of course have this problem of inconsistent projections, but can nonetheless still benefit from the noise reduction available with 4D reconstruction.

Brief overview of statistical image reconstruction
To lay a foundation for the current review of 4D reconstruction in emission tomography, it is beneficial to briefly review the key concepts of statistical image reconstruction, which apply whether it is 2D, 3D, 4D or even 5D/6D reconstruction (5D and 6D applies when respiratory and/or cardiac gating is used in addition to dynamic 3D imaging, (e.g. Verhaeghe et al 2007)). We are concerned here with statistical image reconstruction, rather than analytic reconstruction methods such as filtered backprojection. Analytic methods, generally speaking, have less accurate models of the imaging system than statistical image reconstruction, but the noise in the reconstructed images can be relatively easily calculated (e.g. Watson 2009). There are five key components to be considered for statistical image reconstruction: i) the parameters to be estimated, ii) the form of the acquired data, iii) the forward model, iv) the objective and v) the algorithm. These five components are implied in the 3D reconstruction review by Lewitt and Matej (2003).
At its core, all image reconstruction is concerned with spatially-localized parameter estimation, so it is important to start with a clear definition of the parameters which are being sought. Once the desired parameters have been chosen, the format of the acquired raw PET or SPECT data needs to be carefully considered, as in general the required parameters do not Figure 1. Noise in dynamic 3D imaging (4D tomography). The raw emission data may be in list-mode format, or more conventionally stored as sinograms or projections for each time frame. A time series of 3D images is reconstructed, and time-activity curves (TACs) can be extracted for a voxel or region of interest (ROI), so that a kinetic model can be fitted to find functional parameters. Nine illustrative time frame images (out of 17, covering 60 min of a [ 18 F]flumazenil PET scan) are shown along with the 17 point TACs. Central sagittal slices of every other frame (i.e. frames 1, 3, 5, …, 17) of the 3D frame reconstructions are shown, demonstrating the elevated noise levels at the short time sampling intervals (the earlier, shorter frames, are noisier). The voxel-level TAC, due to noise, demonstrates an erratic time course of radiotracer concentration which is physiologically implausible. Depending on the study type, there may be up to 30 or more separate time frames, with frame durations ranging from a few seconds up to many minutes (in this figure the frame duration is represented by the width of the grey vertical lines). Kinetic parameters (such as BP) can be fitted to these TACs directly relate in a one-to-one fashion with the measured data. A typical choice for the parameters is the mean radioactive concentration in each voxel in the FOV of the scanner, whereas instead the measurements are a set of sinograms, projections or list-mode data. We therefore need to describe the mapping from the parameters of interest (e.g. the mean radioactive concentration in each voxel) to the space in which the acquired data were acquired. This mapping, which should accurately capture the physics of radioactive emissions as well as the scanner's response, is often broadly referred to as the forward model, or the system matrix, as will be seen further below. A simple example is the discrete 3D x-ray transform (Defrise et al 1995), which takes line integrals through an object represented by voxels to deliver sets of 2D parallel projections (so the discrete x-ray transform maps voxel values to projection data values).
Having clearly defined the parameters which are sought, and how these map to the acquired data format, we then need to decide on the objective function (also known as the cost function). The objective function defines the way in which we do, or do not, require the estimated parameters to agree with the measured data. For example, one might require the forward projection of the voxel values to agree with the measured sinograms by requiring the sum of the square difference between each forward projected value and each measured value to be minimized (a least squares objective). Finally an algorithm is needed which is capable of generating estimates of the desired parameters which are in accordance with the chosen objective function. All of these aspects, summarized in figure 2, will be explored in detail in turn, with a specific focus on the topic of 4D image reconstruction and the current state of the art in the field, for both PET and SPECT imaging. As will be seen however, the crucially different components needed in 4D reconstruction primarily relate to the parameters to be estimated, and the algorithms used to achieve this. Therefore the primary focus will be on those two core components in reconstruction, with less emphasis on the objective, data and system model, as these latter aspects remain relatively unchanged for 4D reconstruction and hence the current reviews of 3D statistical image reconstruction (such as those by Lewitt and Matej (2003), Qi and Leahy (2006) and Reader and Zaidi (2007)), already provide excellent coverage. Note further that there are many 4D data processing methods, which in a strict sense are not really 4D reconstruction methods at all (e.g. preprocessing of sinogram data through use of principal components analysis (Razifar et al 2006), factor analysis and related techniques (Wernick et al 1999), and filtering methods (e.g. Turkheimer et al 2003, McLennan andBrady 2009)) and so are not the focus of this review of 4D reconstruction. Such alternative 4D data processing methods are already reviewed by others in previous reviews of 4D reconstruction (see Tsoumpas et al 2008a, Rahmim et al 2009. Finally, many motion correction strategies also fall under the category of 4D or higher dimensional image reconstruction, but the vastly growing volume of work on motion-corrected and/or gated reconstruction (e.g. Li et al 2006) is beyond the scope of this review. Nonetheless, genuinely 4D reconstruction methods with simultaneous motion estimation (i.e. not ad hoc inter-reconstruction or post-reconstruction motion estimation methods) are now emerging (e.g. Jacobson andFessler 2006, Grotus et al 2009).
One final point to note for this section is the acquired data format. Emission tomography reconstruction algorithms have often required the data to be in sinogram or projection format, whereby each event (a pair of coincidence photons in PET, or a single photon in SPECT) is histogrammed into a projection bin, with each projection data bin approximately associated with a line of response (LOR) through the scanner FOV. However, with increasing information now being measured for each event (e.g. photon detection positions, arrival times and energies), the use of projection data format becomes increasingly sub-optimal. As an example, for dynamic imaging, an entire projection data set is needed for each time frame (a time sampling interval), so that not only is considerable storage required for multiple time frames, but furthermore there is loss of information as events within a certain time sampling interval are grouped together. The same applies spatially: events can be grouped into the same projection bin through use of axial spans (Fahey 2002), leading again to information loss. For this reason list-mode data can be both more efficient and more accurate (see e.g. Parra andBarrett 1998, Reader et al 1998), whereby each event with all its attributes (position, time, energy, and for example the time of flight (TOF) estimate for PET, or for example the gantry position for rotating SPECT systems) is simply recorded sequentially in a file. Both dynamic sinogram data and list-mode data can be used for 4D image reconstruction.

3D case
At the heart of 3D reconstruction is a parameterized model of the spatial distribution of the radioactive concentration in the scanner FOV. A typical example in PET imaging would be the distribution of the radiotracer [ 18 F]FDG in the human brain. One can model, or represent, the 3D radioactive distribution as a function f of position r = [x y z] T in the FOV by summing a limited number of j=1…J spatial functions b j (r): where each function is parameterized by its own unique parameter vector θ j , consisting of p=1…P parameters. In general, each spatial function θ ( ) b r; j j may itself also consist of one or The five core components in image reconstruction, shown in five boxes. The parameters are used to generate a representation of the emitting radioactive distribution in the FOV, which is then forward modeled to deliver a model of what the mean measured data would be for such a distribution. The objective function assesses the agreement of the noisy measured data with this model of the mean data (possibly also assessing agreement with prior knowledge), and an algorithm then updates the parameters in a way which will lead to an improvement in the agreement required by the objective function. Components 1 and 5 (in italics) require significant changes compared to 3D reconstruction in order to achieve fully 4D (and 'direct') image reconstruction. more functions of linear or non-linear parameters. The overall parameter vector θ on the left hand side of equation (1) is therefore a JP-dimensional vector, composed of P-dimensional subvectors θ j , one for each spatial function b j . Image reconstruction then involves estimating P parameters for each one of the J spatial functions in (1). The common practice in 3D emission tomography is to predefine the precise size, shape and position of the spatial functions used in equation (1) before image reconstruction, and we will refer to this fixed collection as a set of j=1…J spatial basis functions ϕ j (r). This leaves image reconstruction with the task of only finding a scaling parameter (i.e. the coefficient) θ j for each of the predefined functions ϕ j (r). In this case, with only one parameter per spatial function, P=1, equation (1) simplifies to An example of equation (2) would be to predefine a set of various volumes or regions of interest, and to use these as the spatial basis functions. However, by far the most widely-used case is to simplify still further, by using predefined functions that are shifted copies of each other: where the dependency of ϕ on j is no longer present, as the uniqueness of each spatial basis function ϕ is now only its particular fixed position in 3D space, specified by r j . In equations (1)-(3) the spatial functions can possibly overlap with one another in space, but the typical choice of ϕ, as illustrated in figure 3, is the pixel (for 2D reconstruction) or the voxel (for 3D reconstruction). For these cases ϕ is a 2D or 3D rectangle function: with positions r j chosen to avoid overlap of any of the J functions. With this choice of spatial basis function the overall representation of the radioactive distribution is in effect captured by the vector of parameters θ, merely a list of numbers which are arranged for display purposes into a 3D array of values to form a grey scale image (see again figure 3). Whilst the pixel or voxel are common, many other spatial basis functions have been proposed, as of course there are infinitely many 3D object representations available. Some examples include spherically symmetric basis functions ('blobs') (Lewitt 1992), or as mentioned, choosing regions of interest (ROIs). If one knows a priori the precise regions and locations in the radioactive distribution which are of interest, then these can be used as spatial basis functions, for direct estimation of the radioactive concentration contained in those regions (Carson 1986). Note that even for that case, just as for 3D voxel or 'blob' functions, just one coefficient parameter θ j is required for each spatial basis function, as the model of equation (2) still applies. Equations (2)-(4) cover the majority of models in the 3D reconstruction literature, and these are all examples of linear parameter estimation, as the parameters being estimated are merely coefficients which scale the amplitudes of predefined basis functions. However, more generally, one should not insist on such a simplification, and instead use equation (1) which allows parameters to be estimated which not only specify the coefficient (amplitude) of the spatial basis functions, but which also specify the shape of the spatial functions. This generalization will help a great deal with a unified understanding of 4D image reconstruction in emission tomography, as in 4D tomography choosing more general temporal functions has great utility.
To illustrate the point for 3D spatial functions, equation (1) could be used to describe the 3D radioactive spatial distribution as a collection of 3D Gaussian functions, each function having P=5 parameters contained in the subvector θ j = [θ j1 θ j2 θ j3 θ j4 θ j5 ] T : Use of equation (5) means we are no longer seeking to estimate just the coefficient θ j1 of the spatial function, but furthermore its position [θ j2 θ j3 θ j4 ] T in the 3D FOV along with its width (proportional to the variance) θ j5 . In static image reconstruction in 2D and 3D, the generality of equation (1), with the example case of equation (5), has remained largely ignored. However, for 4D reconstruction where such a generalization has great utility in the temporal domain, this is far from the case. A final point to note for this section is that any given representation of the distribution of the radiotracer will have approximations compared to the unknown true distribution. Consideration of the issues involved with representations of functions of continuous variables is dealt with rigorously and at length in Barrett and Myers (2004), and an excellent overview of 3D spatial basis functions for emission tomography reconstruction is contained in Lewitt and Matej (2003).

4D case
The aforementioned generalized representations for 3D reconstruction help to clarify the current state of the art in 4D image reconstruction for emission tomography. First, we will extend the general 3D object representation equation (1) to now include the time dimension: Figure 3. Left: 2D object representation using a J × 1 vector of parameters θ in conjunction with a choice of basis function ϕ(r). In the example static case a 2D rectangle function (pixel) is used as the spatial basis, and each pixel has a coefficient θ j to represent the level of radiotracer concentration within that pixel-sized region of the FOV. Right: for the dynamic 2D case a 3D rectangle function is used as an example spatiotemporal basis (pixel with extension into time as a third dimension). In this case a JP × 1 vector θ is needed, which can be organized as either i) being composed of subvectors θ j , with each subvector containing P parameters (in this example P matches the number of time frames T, and θ j can be interpreted directly as a TAC for pixel j) or ii) being composed of subvectors [θ] p , with each subvector being a parametric image (in the example each such subvector is a time-frame image composed of J pixels). The illustration extends naturally to the case of voxels and dynamic 3D imaging. NB: a simplification which is widespread in the literature will be adopted after section 3 in this article: voxels will be used as the spatial basis functions, so that J=V, where V is the number of voxels.
In equation (6) the time course, or the dynamics, of the 3D radioactive concentration in the scanner FOV is now represented through the use of a 4D function f(r,t), which is a superposition of j=1…J spatiotemporal functions, each of which has P parameters stored within a subvector θ j of θ. These functions can in general overlap in space and time. Unlike for the 3D case in the previous section, it is far more common to use more than one parameter (P > 1) for a given function b j . It is convenient, according to the context, to order and access the elements of θ by considering either subvectors θ j or subvectors (see for example figure 3, where these two cases correspond to TACs and image frames respectively). A simplification of equation (6), assumed by the entire current literature on 4D emission tomography reconstruction in PET and SPECT, is to first factorize each spatiotemporal function into a fixed choice of spatial basis function ϕ j (r) in product with a temporal function Ψ(t;θ j ).
With the factorization we have: The spatial basis function ϕ j (r) is now scaled by the temporal function Ψ (instead of the parameter θ j in equation (2)), and hence the parameterization of the spatiotemporal function is now incorporated exclusively into the definition of Ψ(t;θ j ), with ϕ j (r) merely indicating the spatial volume occupied by Ψ. If the spatial basis functions do not overlap, then the temporal function Ψ is in fact the TAC for that given spatial volume. If the spatial basis functions do overlap, then the TAC at a given sub-volume in the FOV can be found by summing over all the overlapping contributions, as will be shown below. With the factorizing simplification of equation (7), all 4D reconstruction methods in the literature proceed to focus exclusively on a specific parameterization of the temporal function Ψ. The parameterization of Ψ(t;θ j ) with P parameters can be written more explicitly as: ; , , ..., The goal will be to estimate these P parameters for each and every temporal function indexed by j, to find the whole JP-dimensional parameter vector θ. With this vector, one can then form the 4D radioactive distribution via equation (7) . For representation in computer memory the 4D continuous model of equation (7) will be discretized and the following vector notation can be used where the vector f is a time-series (t=1…T frames) of 3D images (v=1…V voxels) contained in a VT-dimensional vector f. Each element of f is a sample of the 4D continuous radioactivity concentration modeled by (7), averaged over the time interval represented by time frame index t, and averaged over the sub-volume represented by voxel index v. The spatial functions ϕ may of course not be voxels, and so the JV-dimensional vector ϕ contains all the v=1…V voxelvolume samples of each of the j=1…J spatial basis functions ϕ j (r). The JT-dimensional vector Ψ contains the t=1…T time-frame samples of all the j=1…J temporal functions θ Ψ t ( ; ) j j . With this discretization scheme both f and Ψ can be linear or non-linear functions of θ, but when f is evaluated and sampled for all v=1…V and all t=1…T for a particular parameter vector θ, the time series of images contained in f can then be operated on by a conventional linear forward model (system matrix), irrespective of whether a linear or non-linear parameterization of f was used. For this reason we have deliberately defined equation (9) in terms of vectors rather than matrices, in order to preserve the use of matrices for system modeling in section 5 below. Sampling f prior to applying the system model may be sub-optimal for some choices of ϕ (such as 'blobs'), but this pre-discretization approach is highly practical for implementation of 4D image reconstruction algorithms.
One final point to note is the special simplified case used in the majority of the 4D literature, where non-overlapping pixels or voxels are used for ϕ, such that each voxel-volume sample v corresponds directly to a single basis function ϕ j (r). This means that the indices v and j are equivalent, J and V are equivalent, and ϕ=1 within the voxel volume. Therefore the temporal function Ψ can be directly interpreted as the TAC in a given voxel (where a voxel can be labeled by v or j): With this direct association of a unique TAC modeled by θ j with each unique voxel j, we can obtain a set of P parametric images, with each parametric image given by the subvector [θ] p ={θ jp , j=1…J}. Thus estimation of the JP-dimensional (or equivalently VP-dimensional) vector θ can also be referred to as parametric image reconstruction. For the remainder of this review we will focus our attention on this simplified and widely-used case of employing nonoverlapping voxels.

Linear parameterization
A common choice of parameterization of the temporal function Ψ is the linear case, using the parameters θ to define scale factors. The parametric images are therefore coefficient images, [θ] p , one for each of the p=1…P temporal basis functions. The conventional case of separate time-frame 3D reconstructions corresponds to using temporally-shifted rectangle functions Π(t) as the basis functions: In equation (11) there are the same number of parameters (P) as there are total time frames (T), and the p=1…P rectangle basis functions each cover unique non-overlapping time sampling intervals (see table 1), with start and end times such that the entire time duration of the scan is covered. The start and end times of each rectangle function are chosen according to the expected temporal dynamics of the emission data, and it is common to have up to 20 or 30 such frames to cover 60 min of data.
Less trivial temporal basis functions can be chosen to describe the temporal function, but still using a linear parameterization: This linear case, equation (12) combined with equation (7), actually covers a significant portion of the current 4D image reconstruction literature. The key difference between a large number of the 4D reconstruction methods is in the precise choice of the temporal basis functions B(t) (e.g. from splines (Nichols et al 2002) through to basis functions related to radiotracer kinetics (Wang et al 2008)) along with the number P of these functions to use. Finding the coefficients provides direct estimates of the functional parameters θ 1 = c and θ 2 = K i .

Spectral-analysis basis
Considering table 1 there is an important distinction of two categories of linear temporal basis functions: those that do, and those that do not have a direct link with biological/functional parameters of interest to medical imaging. The cases that do not have such an interpretation can only really be viewed as a means of limiting noise in the temporal data by imposing correlations between the various time points in the data, and after reconstruction there will still often be the need to perform a kinetic analysis to estimate actual biological parameters of interest, such as the distribution volume (DV), binding potential (BP) or the tracer influx rate (K i ). Reviews of these biological and functional parameters of interest are given in Bentourkia (2007) and in Innis et al (2007), and are not the focus of this present review. Note that postreconstruction estimation of kinetic parameters can result in bias issues if the reconstruction algorithm does not permit negative values (Reilhac et al 2008, Verhaeghe andReader 2010).
If the basis functions are just a means of reducing noise, then one could also consider the alternative approach of limiting noise through appropriate definition of the objective function (e.g. using temporal roughness penalties) for the reconstruction task, as will be covered in section 6 below. It is an interesting observation that for 3D reconstruction the use of different objective functions (which include a spatial roughness penalty, (e.g. Hebert and Leahy 1989)) has proven to be a more popular approach to noise reduction than changing the spatial basis functions (e.g. use of blobs, sieves (Snyder and Miller 1985) and even ROIs). In contrast, the current 4D literature reveals the opposite trend: relatively little emphasis on temporal penalties, with instead a large emphasis on changing the temporal basis functions. Conceivably, the best approach would be to combine spatiotemporal penalties in the objective function with an appropriate choice of parameterized spatiotemporal functions, but this remains an interesting area of research which has not really been addressed in the literature thus far.
Returning to the case of basis functions which do have a direct functional / biological interpretation, there are three main choices of temporal basis functions with a linear parameterization, as shown in table 1. These correspond to the cases of Patlak, Logan and spectral analysis, as will be described in detail below. When such functions are used one is implicitly carrying out direct kinetic parameter estimation. This means the parameters found by the reconstruction algorithm directly from the raw emission data will be the end-point biological measures of interest, without the need for any further processing (unlike the case for merely noise-reducing basis functions). Here the word 'direct' means the avoidance of an additional fitting stage after reconstruction (post-reconstruction processing often introduces approximations in modeling the noise in the images), it does not mean the estimation is non-iterative.
An important component used by these functionally meaningful basis functions is an input function, or alternatively a reference function. The input function, which will be denoted by C PL (t) in this review, refers to the concentration of free and intact radiotracer in the arterial plasma, as a function of time. It is of fundamental importance as one normally needs to know both the input C PL (t) to a system (the system being biological tissue in our case) as well as the output from the system (which is the TAC Ψ(t) in our case), if any information about the system itself is to be inferred. The presence of the radiotracer in the arterial plasma provides the input supply for tissue to be able to take up the tracer. The tissue will then either allow the tracer to be readily returned to the blood, bind the tracer (reversible binding) or somehow (e.g. metabolically) trap the tracer in the tissue (irreversible binding). However, measuring the input function is an involved and invasive process, and for this reason reference tissue methods (Lammertsma andHume 1996, Gunn et al 1997) have been devised. In this latter case, the uptake of radiotracer in a tissue of interest is assessed relative to a reference tissue, so as to infer functional / biological parameters of interest in the tissue, provided that the behaviour of the reference tissue is indeed a good reference (e.g. provided that it is known a priori that the reference tissue contains little or no capacity for any form of binding of the tracer, whilst at the same time having the same degree of non-specific binding as the target tissue of interest).

Patlak basis functions.
For the case of modeling irreversible uptake of the radiotracer into a tissue, Patlak devised a linear model (Patlak et al 1983, Gjedde 1995 relating any time activity curve C(t) to the tracer influx constant K i : where the format of equation (13) is designed to allow a conventional linear regression with slope K i and offset c. In effect the TAC can be modeled by our parameterized temporal function Ψ as the sum of just two basis functions (Tsoumpas et al 2008b): where t ss is the time at which the steady state is reached, C PL (t) is the input function, and the two coefficients, θ j1 and θ j2 , of the two basis functions, can be relabeled as c and K i respectively. The radiotracer net influx rate constant K i has particular significance: for example when the PET radiotracer [ 18 F]FDG is used, it is proportional to the glucose metabolic rate. In  (Logan 2000) for reversible uptake of tracers (e.g. radioligands binding to neuroreceptors). However, in its original form the Logan method does not constitute a linear model for a TAC. Nonetheless, linear forms based on the Logan approach have been devised and used as basis functions to permit direct kinetic parameter estimation (Wang and Qi 2013) where the derivative of the input function C PL (t) after the steady state time t ss is negative, and so the right most term in parentheses is a positive basis function, such that θ j1 and θ j2 can be directly interpreted as the distribution volume (DV) and the negative of the intercept c of a Logan plot respectively. The DV has a useful physiological interpretation as being the level to which tissue concentrates the radiotracer relative to the blood at equilibrium (so a DV of 2 means the tracer is twice as concentrated in the tissue as in the blood). This is pertinent for imaging of receptors in the brain, and the model finds use for example in PET, in modeling the binding of [ 11 C]raclopride to available D 2 /D 3 receptors, so that images of DV allow one to infer the levels of endogenous dopamine (which competes with raclopride for binding to D 2 /D 3 receptors). The one example in the literature of direct use of Logan-derived basis functions for direct reconstruction is Rahmim et al (2012), but in that work an integrated form of equation (15) is used.

Spectral analysis basis functions.
A widespread model for the uptake of a radiotracer into tissue for PET and SPECT is that of compartmental modeling (see figure 4). By considering the transfer of the radiotracer between the blood and the tissue, and transfer between bound and unbound states within the tissue, one can devise a set of differential equations describing the time course of radiotracer concentration in the tissue. The solution to these equations turns out to be a convolution of the input function with a sum of exponentials (with the number of exponential functions related to the number of compartments in the model (Gunn et al 2001)). The so-called spectral analysis model (Cunningham and Jones 1993) exploits this solution by using a set of exponential functions which cover all plausible tissue behaviours. The method makes no assumptions of reversible or irreversible uptake, and models the TAC as a large collection of basis functions, each of which is an exponential with a different decay constant convolved with the input function: where ⊗ denotes convolution. From this it is easy to compute K 1 as simply the sum over all the coefficients of the basis functions, with DV found by summing the coefficients with each one divided by its respective decay constant β. If appropriate K i can also be found (see for example Meikle et al (1998)).

Non-linear parameterization
Returning to equation (8), we can of course also consider a non-linear parameterization of the temporal function Ψ, which is very useful for kinetic modeling, as many functional / biological parameters of interest are in fact non-linear. The most commonly encountered kinetic models for the dynamically changing radiotracer distribution at a given location in the scanner FOV are shown in figure 4. All these models are captured by one general expression:

Voxel in reference region, TAC
Voxel in region with specific uptake, TAC Voxel in region with specific uptake, TAC Ψ (t) Just as equation (12) expressed a general form of spatiotemporal basis which covers a significant portion of the 4D reconstruction literature (linear parameterization), so also equation (17) expresses a general form which covers the vast majority of the remaining cases (non-linear parameterization) in the 4D reconstruction literature.
Equation (17) indicates that the overall TAC at a position in the scanner FOV is composed of a spatially-invariant TAC C 1 (t) (which in all known models is either the time-course of radiotracer concentration in the whole blood C WB (t) or else a reference function C RF (t)) added on to a convolution of a second TAC C 2 (t) convolved with a tissue response function h(t). This second TAC, in all known models, is either the input function C PL (t) or the reference function C RF (t). The tissue response function in essentially all currently devised models is either one exponential or a sum of exponential functions. Table 2 summarizes the literature for the important case of equation (17) (with table 3 providing definitions of terms), showing the models used for each spatially localized TAC in the FOV. Figure 5 shows an example TAC from this model.

Joint estimation methods (linear models
). An approach which strictly comes under the category of using basis functions to denoise the reconstruction, but which is inspired by the physiologically known processes of radiotracer uptake, is to jointly estimate both the coefficients for a set of temporal basis functions as well as a discrete function which resembles the input function.
The model is inspired by spectral analysis (see previous section 4.1.3.), by predefining a fixed number of p=1…P exponential basis functions, each with a fixed decay constant β p , where for each exponential basis function we need to find a coefficient image [θ] p . However, in addition, we need to find a discretely sampled 'generating function' (which can be viewed as a surrogate for the unknown input function), which is defined by its values at a fixed number N of time points. Hence we need to estimate P parameters for each voxel, giving a total of JP parameters to find, as well as an additional N parameters (the generating function, just as the input function, is assumed to be the same for all voxels). Therefore, the TAC for a voxel is modeled by: where C GF (t) is the generating function (parameterized by N values contained in the parameter vector θ*), as described in Verhaeghe and Reader (2013), based on the idea of Reader et al (2007). The approach is to alternate between estimation of the coefficient images and estimation of the generating function, by holding the generating function constant and then the coefficients constant during each estimation respectively. This work has led on to improved methods for direct kinetic parameter estimation reliant on joint estimation of a reference TAC as well as kinetic parameters (Wang and Qi 2009b). Related methods, even more ambitious than the above, seek to estimate the true arterial input function which has physiological meaning (Yetik and Qi 2006).

Factorization approaches (linear models).
Other joint-estimation methods, that estimate both coefficient images as well as a set of temporal basis functions directly from the data (i.e. fully data derived, not imposing the constraint of a convolution of a generating function with decaying exponentials), can be considered as matrix factorization methods. Review of non-linear models for the temporal functions Ψ used in 4D image reconstruction for emission tomography. Model name How to obtain kinetic parameters from the θ parameters Example references of application to 4D reconstruction 1 TCM, no blood volume Such factorization has been considered as part of the 4D image reconstruction in Reader et al (2006a) and Sitek et al (2001). Note as mentioned before that there are many factorization approaches in the literature, but the focus here is on methods that have been incorporated into true 4D image reconstruction algorithms.
To describe these methods it is convenient to use the voxel-spatial basis function model given previously by equation (10) in combination with a discretization of equation (12) to obtain: Rate constant for return of tracer from tissue to blood (min -1 ) k 3 Rate constant for transfer of tracer from free to bound compartment (min -1 ) k 4 Rate constant for transfer of tracer bound to free compartment (min -1 ) F v Fractional volume of blood in the tissue K' 1 Rate constant for transfer of tracer from plasma to reference tissue (mL.min -1 . cm -3 ) R 1 Ratio of the delivery in the tissue region of interest compared to that in the reference region (ratio of influx, K 1 /K' 1 sometimes referred to as R I ) BP ND Binding potential -the ratio of the concentration of radioligand in the tissue to that of the reference region at equilibrium C WB (t) Time-variant concentration of tracer in the arterial blood (including plasma and blood cells) (Bq.mL -1 ) Time-variant concentration of tracer in plasma, the "input function" Time-variant concentration of tracer in reference tissue region (Bq.mL -1 ) Figure 5. Example TAC generated from the non-linear model for the case of 3 tissue compartments, showing the components that are added to generate the overall TAC. The components are where B={B pt } is a PT-dimensional vector holding the temporal samples t = 1…T for each of the p = 1…P temporal basis functions (usually P < T). Hence the discrete f(θ) is decomposed into coefficient images [θ] p for each of the p = 1…P temporal basis functions, [B] p. These two components are treated as unknowns so that both θ and B are estimated, usually in an alternating fashion by holding one constant whilst estimating the other, and vice versa.
The differences between the various implementations lie in i) the constraints on the vectors θ and B, ii) the cost function to quantify the approximation of the true object by equation (19) and iii) the algorithm used to find the two components. In Reader et al (2006a) the Kullback-Leibler distance objective is optimized using an EM type algorithm, so that all components of θ and B are inherently non-negative, which in itself also provides noise reduction. In Sitek et al (2001) a quadratic penalty for negative elements in θ and B is added to a least-squares objective that was optimized using a conjugate gradient method. It is important to note that the estimated basis functions are not unique and are not physiologically meaningful. Therefore the decomposition should only be seen as a means to reduce noise in the final spatiotemporal image.

Shape constraints.
A somewhat different approach to using a pre-defined or estimated temporal parameterization is to consider conventional rectangle basis functions but to additionally introduce a shape constraint so that the final TAC is either just decaying, just increasing or else increasing up until a given time point (or interval) where the peak value is reached, after which the TAC is only allowed to decay. This approach was primarily developed for SPECT and has been termed the dSPECT (Bauschke et al 1999, Celler et al 2000, Farncombe et al 2001, Blinder et al 2004 but the approach can be applied to PET imaging as well. Different implementations also differ in cost function and optimization strategy.

Joint estimation and mixed models.
An interesting mixed example of combining linear with non-linear parameterization in SPECT has been presented by Reutter (Reutter et al 2005). This approach combines a B-spline model for some regions (i.e. using the model of equation (12) with splines) with a one-tissue compartment model parameterization (a model like equation (17)) for the others, and furthermore seeks to estimate the input function and the region boundaries from the projections.

System model
Having taken care to define the parameters of interest which represent the 4D spacetime radiotracer distribution in the FOV, we now move on to consider how this parameterized object representation is imaged by a PET or SPECT scanner. The process of modeling the physics of radioactive emissions from the radiotracer distribution, along with how these emissions are detected and processed by the tomograph, constitutes the process of system modeling. This is also known as the forward model in the field of inverse problems, where, in general, one ultimately seeks to obtain some approximate inverse of this model, in order to estimate parameters representative of the object which gave rise to the measured data. Whilst system modeling can be based on the object representation function f of the continuous variables r and t, we will focus on system modeling for the VT-dimensional vector f(θ), which is the space-time discretization of the representation f(r,t;θ) of the 4D radiotracer distribution.
In statistical image reconstruction the system model is required to model the mean or the expectation of the measured emission data for a given object vector f, where, for 4D, f is a time-series of 3D images. For the majority of emission tomographs the space of the measurements, whether noisy or not, can be regarded as an IT-dimensional vector space. The I dimensions typically correspond to the number of distinct sinogram bins (including the timeof-flight bins if appropriate), and T corresponds to the number of time samples or frames. So for dynamic imaging one needs to map the VT-dimensional f (a time series of images) to a space of IT dimensions (a time series of sinograms, for example). As will be seen below, this will be achieved using a very large IT × VT matrix, called the system matrix.
Whilst we have an infinite number of choices for the modeling and parameterization of f (via θ and the spatiotemporal functions chosen) we are in contrast necessarily constrained by the scanner hardware in terms of modeling the mean of the measured data. In effect, there are for most scanners only a finite number of possible event types that can be detected. For example, there are only a finite number of discrete sinogram bins for a given scanner, with each bin being sensitive to a particular weighted volume of the FOV, defined by a sensitivity function. Both list-mode data and sinogram / projection data can be regarded in this same way, as there will nearly always be a finite precision (effectively a sampling interval) for any measured event. We will denote the dynamic measured data for an emission tomograph by a single vector m, of IT dimensions, which can be regarded as a time-series of I-dimensional subvectors, each subvector being a dataset from a static acquisition.
Any given component m it of the total measured data vector m has a time index t, corresponding to a time sampling interval of start time t s , and end time t e , with duration Δt. This could be anything from 1 ms (e.g. the high resolution research tomograph (HRRT) PET scanner (de Jong et al 2007) records list-mode data to within 1 ms time sampling) to 60 min (e.g. a non-dynamic acquisition, collecting a static set of sinograms). However, in the following we will focus on the commonly used case of matched time sampling intervals for both f and m.
Assuming that the emission tomography scanner can be represented by a system matrix A = {a it , vt } IT × VT operating on this vector f, we will model the mean of m by a vector q: where ρ={ρ it } IT is a model or estimate of any background component not accounted for by A (such as randoms, or scatter). Each component of q is given by With the practical pre-discretization of f used here (i.e. using f(θ) rather than f(r,t;θ)) the system matrix A contains elements a it,vt specifying the intersection of each voxel volume v at time t with each potentially time-dependent sensitivity function ξ it (r) associated with each data element i. This models the probability that a radioactive emission from within a given voxel v, during time sampling interval t, gives rise to a measured count in an element it of the IT-dimensional data space for the same time sampling interval t. A common example of ξ it (r) would be a tube or line of response through the FOV, so that a it,vt gives the intersection of a tube or line of response i, at a time t with a voxel v at time t. This permits a practical and fast on-the-fly calculation of the elements of matrix A, avoiding the phenomenal storage requirements, using methods such as the Siddon algorithm to calculate intersection lengths of lines with voxels (Siddon 1985, Zhao andReader 2003). But more advanced models of the scanner geometry (Markiewicz et al 2005), resolution (Panin et al 2006, Kotasidis et al 2011 and even photon scatter (Markiewicz et al 2007) can in principle be included in A. Furthermore, a commonly adopted simplification when implementing (21) for 4D imaging is to use exactly the same system matrix for every time frame, so that a it,vt is obtained simply by a i,v , but note the important need to account for radioactive decay (see equations (22) and (23) below).
However system models should in general account for scanner responses varying with time, which can occur for many reasons, including i) scanner motion (e.g. camera rotation in some SPECT systems), and ii) the total activity in the FOV changing with time, as scanners do not respond in a strictly linear fashion to such changes, due to limitations in scanner instrumentation/electronics. Even if only rarely done to date, incorporating a time-varying A into 4D reconstruction is not conceptually difficult to do. Notice that the system modeling equation (20) presented here for 4D reconstruction is more general than conventional modeling in 3D image reconstruction, as f can be either a linear function of θ (such as a discretization of (12)) or a non-linear function of θ (such as a discretization of (17)) whilst nonetheless preserving a linear forward projection model.
More realistic system models often use the simple line intersection model as just part of the system matrix. One proceeds, stage by stage, to model what happens with the radioactive emissions from the approximate representation f(θ). For PET, the system matrix A is normally factorized into a resolution modeling component (e.g. a convolution matrix H to model positron range ), a geometric detection probability component (e.g. the discrete x-ray transform which performs line integrals, X), and normalization, attenuation and radioactive decay components (diagonal matrices N, L and D respectively), A=LNXHD, giving: The diagonal matrix D, modeling radioactive decay, can also be modified to account for frame length variations in the temporal sampling of f. Finally any background components such as the model of the mean of the time-dependent scatter and random events (for PET) are added on: Of course, the above system model for PET can easily be extended to account for time of flight, primarily by modifying the matrix X to hold time-of-flight kernels rather than lines, to project into different time-of-flight sinogram bins. For SPECT the attenuation cannot be handled by a diagonal matrix, and the discrete x-ray transform is also too approximate: it is necessary to account for depth-dependent sensitivity functions of the collimator holes, and depth-dependent attenuation factors in the system model (Zeng et al 1991).
As mentioned at the beginning of this section, the system model concerns the mean, or the expectation, of the data obtained from the overall emission tomography measurement process. Conceptually, the mean data can be regarded as what would be obtained by repeating precisely the same noisy emission scan a very large number of times and then taking the average of these independent data set realizations. The count limited effects of a single scan, resulting in noisy measured data, are not modeled in the system model, but are dealt with statistically in the objective function, covered in the next section for the 4D reconstruction context.

Objective functions for 4D reconstruction
The objective function for image reconstruction describes the goal for the parameter estimation process: how should the estimated parameters agree, or not, with the actually measured emission data m? The objective functions for 4D image reconstruction remain relatively unchanged from the 3D cases, and are simply extended in the obvious way to include the time dimension. For both 4D PET and SPECT imaging, the probability of obtaining m it counts in data bin i at time index t if on average there are μ it counts for that data bin, is well modeled as a Poisson distribution: In image reconstruction, we are modeling the mean counts μ it in bin i at time index t by q it (θ) as described in the previous section, so we can now find the probability of obtaining measured m it counts in data bin i at time t for a given parameter vector θ: Now, to find the parameters of interest θ, we can define the likelihood of θ given the fact that m it occurred by: and so we need to find a θ which maximizes this likelihood. However, we have of course many more measurements than just one m it for one data bin, so we can define the likelihood of a given parameter vector θ given the entire measured data vector m by: where it is assumed that the integer number of counts in each element of the measured data vector m are all independent of one another, hence the product of the individual probabilities. The objective of the reconstruction is therefore to maximize λ with respect to the parameter vector θ, where any given θ defines a 4D radiotracer distribution f and therefore also a mean data vector q by the system model of the previous equation (20), q(θ)= Af(θ)+ρ. It is much simpler for optimization purposes to consider the natural logarithm of equation (27), the Poisson loglikelihood, and to ignore the constant term which is independent of the parameters: which will attain its maximum at the same value of θ as for (27) since the logarithm is a monotonically increasing function. It is very useful to note at this point that the Poisson loglikelihood (28) is related to a distance measure known as the Kullback-Leibler (KL) distance (Barrett and Myers 2004): Therefore maximizing the Poisson log-likelihood (28) is equivalent to minimizing the KL distance (29): As will be seen later, the KL distance is an extremely important measure of agreement between two vectors, as in fact both Poisson maximum likelihood and penalized likelihood estimation of linear or non-linear choices of parameters θ from tomographic emission data can legitimately be achieved in image space provided this distance measure is used. The key, as will be shown, is the use of an EM tomographic image update f (EM) (in place of the noisy measured data m in (28)) for fitting the modeled f(θ), using the KL distance as the objective. We will come back to this in the next section.
A commonly reported problem with seeking an estimate of θ which maximizes (28) (i.e. the maximum likelihood estimate of θ), is that with a noisy measured vector m, the estimate of θ is also noisy. It should first be noted that this frequently reported problem presupposes a choice of space and time functions and a parameterization θ of the radiotracer distribution such that noisy solutions are even permitted. This does not have to be the case, and in fact one can select spatiotemporal functions which implicitly limit the representations of the radiotracer distribution f(r,t) to a subspace which permits only smooth reconstructions (analogous to the method of sieves for the purely spatial case, (Snyder and Miller 1985)). If for example one chooses regions or volumes of interest as the spatial basis functions, with a positivity constraint on θ, then the reconstructed image can only consist of these uniform regions, no matter how noisy the data are, such that the famous 'night sky' images cannot even be obtained. Likewise, if temporally extensive temporal basis functions are chosen with a positivity constraint on the coefficients of these functions, it is impossible to obtain visually noisy TACs.
Nonetheless, a popular approach to noise reduction has been to keep a choice of spatiotemporal functions which does permit very noisy reconstructions (representations of f(r,t)), and instead to place restrictions on the parameter vector θ by regularizing the objective function. Therefore instead of seeking just to maximize the likelihood (the ML estimate of θ) one can use a maximum a posteriori (MAP) objective. In this case one includes the presumed probability of any given θ: where Z is a normalizing constant which can be ignored since we will subsequently only be concerned with maximization with respect to θ, and β is the regularization parameter, controlling the level of importance of the prior (β=0 implies no prior probability, all θ being equally probable). U is any function which gives a large value for parameter vectors θ which are deemed improbable, and low values for vectors θ which one presumes to be more probable, based on prior knowledge of what should be expected from a PET or SPECT scan. A common choice is to define U as an energy function, as will be described below. Using this probability in product with the likelihood gives what is known as the posterior probability of the parameters θ, given m: The log of the posterior probability is therefore given by Equation (33) is known variously as the Poisson log-posterior or the penalized Poisson loglikelihood, and from the previous discussion can also be seen as the negative of a penalized KL distance (to within a constant). There are countless possibilities for the choice of U, primarily viewed as an energy function, and the reader is referred to the 3D image reconstruction literature for consideration of example choices for 3D spatial regularization (see for example the review of Qi and Leahy (2006)). Temporal penalties have also been proposed, such as penalizing differences between the reconstructed TAC and its best compartmental model fit curve (Kadrmas and Gullberg 2001), as well as Gibbs priors (Lee et al 2005, Gravier et al 2006. In fact for direct parametric image reconstruction some interesting possibilities arise Qi 2012, Wang and: one can regularize according to an energy function defined by the activity images f(θ) and/or regularize according to an energy function defined by the parametric images [θ] p = {θ jp, j=1…J} for p=1…P, as will now be explained.
First, the basic form of the energy term used for 4D parametric reconstruction, when using voxels and regularizing by the activity, is given by Kamasak et al (2005), Wang and Qi (2012), Rakvongthai et al (2013): where the total energy is given by the weighted (w jl ) sum over the neighbourhood N j of every voxel j=1…J, summing the square difference between the activity at voxel j, and the activities at voxels l within the neighbourhood of voxel j. More generally one can use a function T to transform the parameter vector θ j for a given voxel j into one of the kinetic parameters (for example, mapping θ j to obtain k 4 , as shown in table 2 for the 2 TCM model). This allows regularization of the kinetic parametric images as well With equations (34) and (35) it is important to realize that the relative weighting of the energy term will matter: one might need to assign more weight to parametric images which are known to suffer from more noise, such as the microparameters k 2 and k 4 . Also, as mentioned, both parametric and activity based regularization are simultaneously possible. With regard to the choice of weighting factors w jl , all the work on 3D regularization can be applied, such as in particular the use of MR images as anatomical priors which can guide the choice of these weighting factors, as proposed by Bowsher for 3D reconstruction (Bowsher et al 1996). Tang  Despite the attraction of MAP estimation, the ML objective still remains popular primarily because many implementations never seek an actual ML estimate, but rather an incompletely converged reconstruction which often has a more desirable signal to noise ratio than the ML estimate.

Algorithms for 4D image reconstruction
An iterative image reconstruction algorithm seeks to improve a given parameter vector estimate θ (k) such that when the estimate is used to generate a 4D image f (k) and subsequently mapped by A to give a model of the mean data q (k) , q (k) agrees in a better way with the measured data m. For Poisson data this better agreement is specified by obtaining a new estimate θ (k+1) which reduces the KL distance between m an q (k) , as was seen in the previous section. Now, since the parameters being estimated are often non-linear for 4D reconstruction, the algorithms require a more comprehensive coverage for the 4D case, as most 3D reconstruction algorithms for emission tomography are limited to a completely linear relationship between θ and q.

ML and MAP estimation for linear parameters
We nonetheless start with the case of linear parameter estimation, as this corresponds to using any of the temporal basis functions previously given in table 1, representing a significant portion of the 4D literature. For these cases, conventional 3D iterative image reconstruction algorithms can be readily adapted, just by modifying the linear forward model to account for the choice of spatiotemporal basis functions. One of the most popular 3D iterative image reconstruction algorithms in PET and SPECT is the expectation maximization (EM) algorithm (Dempster et al 1977, Shepp andVardi 1982) used to find ML estimates of θ. Its popularity can be understood due to the extreme simplicity of its implementation and its robust performance, often delivering visually appealing images if not run to the point of convergence (see for example Angelis et al (2011a)). For 3D reconstruction, with the parameter vector θ corresponding directly to 3D image voxel values, f(θ) = θ, the ML-EM algorithm is given by: where vector division and multiplication is taken to be component-wise. The model of the mean accounts for a model or estimate of the mean scatter and random events, contained in ρ: Practical implementation of (36) is relatively straightforward. A current 3D image estimate θ (k) is forward projected by A to give q (k) , the ratio of the measured data m to q (k) is backprojected by A T to give an image which is used, after first dividing by the sensitivity image A T 1 (a backprojection of unit data), to multiplicatively update θ (k) . This sequence of steps provides a new estimate θ (k+1) .
For the 4D case with linear parameters to estimate (such as all the cases shown in table 1), equation (36) can be adapted in an obvious way: Hence the main difference for 4D reconstruction with linear parameters compared to 3D reconstruction is the use of a matrix B = {b vt,vp } VT × VP containing the time-sampled temporal basis functions for every voxel volume. The size of matrix B is considerable, but it is nearly always simplified by assuming the temporal basis functions to be identical for every voxel, such that b vt,vp = b t,p . The measured data vector in equation (38) now contains elements for every time point ( = m m { } it ), and hence the forward model is required to generate timedependent estimates of q. This 4D MLEM approach has been widely used, starting in the late 1990s with (Matthews et al 1997), and continuing with many others more than a decade later (Tsoumpas et al 2008b, Hong andReader 2008). All the cases in the literature shown in table 1 can be used with equation (38). However, the method can be computationally demanding: if the temporal basis functions cover all time frames, then the entire dynamic data is implicated in each and every image update, and furthermore convergence can be extremely slow for nonorthogonal basis functions, such as the two Patlak temporal basis functions.
In contrast, the nested EM method, proposed by Wang and Qi (2010) and based essentially on the original work of Carson and Lange (1985), is more computationally straightforward to implement. It consists of two far less demanding stages which separate the tomographic part of the problem from the image-space based part of the problem. First, the current estimate of the parameters θ (k) is used to form the current 4D image according to: Then a provisional EM update for each and every 3D image frame is obtained: where it is important to note that B does not appear in (41), and the comparison to the regular 3D ML-EM update equation (36) should be clear. Equation (41) accounts for the tomographic system model A and the measured data m to deliver a provisional 4D image update f EM . Finally, a simple image-space update can be carried out to obtain a new estimate of the parameters: Equation (42) can be viewed as an image-space EM algorithm: the noisy data are now given by f EM (where the f EM image accounts for the tomographic data and the system model), and the 'model of the mean' now only needs to be concerned about the image-space model, without concern about the tomographic component. This greatly aids implementation.
Not only do equations (41) and (42) separate the image model from the tomographic model, they are also much more memory efficient and practical to implement than the fully 4D version of MLEM (equation (38)). Only a time series of images, rather than a full timeseries of projection data, needs to be handled in memory for finding the coefficients of the temporal basis functions, and remarkably the results are equivalent to the more demanding equation (38), under the modest requirement that the system matrix A be time-invariant. Proof of this equivalence can be seen by substituting (41) into (42): which requires A to be the same for all time frames in order to equal (38). The image-space update equation (42) can be applied more than once, in order to provide acceleration if desired. Not only can the EM algorithm be easily adapted to a 4D version, but also least-squares methods are just as readily adapted, such as for example the simultaneous algebraic reconstruction technique (SART) (Andersen and Kak 1984). As for EM, one simply uses a 4D system model rather than a 3D one: where λ is an optional relaxation parameter (λ=1 in the original version). SART can offer potential benefits as it does not have the non-negativity constraint of the EM algorithm, a constraint which can be viewed both as beneficial (reducing noise in low count regions) and detrimental (introducing bias in low count regions). Nonetheless, due to the popularity of EM, Of course, in addition to the above examples, other 3D reconstruction algorithms have also been modified for 4D in a similar fashion, including the preconditioned conjugate gradient algorithm (PCG) (Wang et al 2008, Rakvongthai et al 2013, for accelerated convergence. In fact the nested EM algorithm itself has also been further accelerated through use of conjugate gradients (Wang and Qi 2010).
It is interesting to note at this point that, in the same spirit as the nested EM algorithm, MAP estimation has also been previously proposed as an image-space operation, providing parameter updates expressed in terms of the EM update image f EM . A popular and simple example of this, though not theoretically robust, is the one step late (OSL) EM MAP algorithm (Green 1990): where Uʹ is the partial derivative of the energy function evaluated at the current estimate of the parameters (hence OSL, with a well-known choice being the so-called median root prior (Alenius and Ruotsalainen 1997)), and s is the sensitivity image defined by a backprojection of unit data, A T 1. This OSL method has itself been applied to fully 4D reconstruction (Tsoumpas et al 2013).
Many other MAP estimation methods implicitly or explicitly use the EM update image to achieve updates of the parameters via image space. Examples include the very early method of Levitan and Herman (1987), generalized EM (Hebert and Leahy 1989) and the more robust approach of DePierro (1995). All these approaches use image-space updates which require s and f EM but without need of the tomographic data m, and hence one can increasingly appreciate the great utility of the EM update image.

Non-linear parameters
Since non-linear parameter estimation in 3D emission tomography image reconstruction has been largely overlooked, there have been no 3D reconstruction algorithms ready and waiting for simple adaptation to 4D non-linear parameter estimation. Specialized algorithms have needed to be developed. Given the necessity of these developments for 4D reconstruction, there is now in fact scope for transfer of methodology in the opposite direction: 4D non-linear estimation methods (to be described below) can serve to increase the generality of 3D image reconstruction algorithms for emission tomography. In fact this has already started for the example of resolution modeling with conventional linear parameter estimation, where exploitation of a 4D method has allowed advances to be made in a nested 3D method (Angelis et al 2013).
In the following coverage one key methodology proposed for PET will be focused on, as in principle it can be applied to any of the TAC models previously described, whether linear or non-linear, and also applies to temporal, spatial or spatiotemporal parameters. The method is essentially a nested EM approach, but with an image-space update that accommodates nonlinear parameters. The principles of the method date back to 1985, to an algorithm proposed by Carson and Lange named 'EMPIRA' (the EM parametric image reconstruction algorithm, (Carson and Lange 1985)). However, no concrete results were shown in that work, nor in the slightly earlier pioneering work for compartmental kinetic parameter estimation from dynamic time of flight PET data (Snyder 1984). An implementation of the original 1984 work of Snyder was explored more recently (Schottlander et al 2006), but unfortunately without use of any tomographic system model and hence again no concrete results. Returning to EMPIRA, others, e.g. Chiao et al (1994), did recognize the potential of the EMPIRA approach, even using the method to estimate ROI boundaries. The principles of the 1985 EMPIRA method have, 25 years later, been independently proposed afresh, two years apart, by two key research groups in this area: Matthews et al (2010) and Wang and Qi (2012), the latter group referring to their method as OTEM (optimization transfer EM). Curiously a similar situation occurred with the widespread EM algorithm for 2D and 3D reconstruction, which was proposed independently by two groups (Shepp and Vardi 1982) and (Lange and Carson 1984), also two years apart! Whilst the first appearance of the method about to be presented with demonstrated concrete results was in 2010, an identical (but not statistically-derived) nested EM image-space fitting approach was nonetheless proposed even earlier in 2006 (Reader et al 2006b).
The method proceeds in two key stages, with the two stages forming a cycle which is iterated.
Stage 1: An EM-type update of the images from the raw measured data. Starting with a given estimate of the parameters θ (k) , which in the current context are taken as kinetic parameters (but of course, this is not a requirement), the corresponding time series of 3D images f(θ (k) ) (e.g. a TAC for every voxel) is created from those parameters. The PET or SPECT forward model A is then applied to generate the estimate of the mean measured data, Af(θ (k) ), for use in a conventional EM-type update: Equation (46) is as simple to implement as the well-established MLEM algorithm applied individually to each 3D image frame to generate an intermediate dynamic (4D) image update.
In the original EMPIRA proposal of Lange and Carson in 1985, this update step is in effect obtained by combining equations (8) and (10) in that article (where also a time dependent system matrix A is permitted).
Stage 2: Image space fitting of the parameters. Stage two involves updating the parameters, θ (k+1) , by finding parameter values which fit the EM update image series obtained from equation (46). Just as the regular MLEM algorithm seeks a maximum of the Poisson log-likelihood, equivalent to minimizing the KL distance, so also a sensitivity-image weighted KL distance is now used in the image space fitting. We will keep this as a maximization problem, given by: where the maximization considers all VT samples in the discrete 3D image time series. The link with the previously described KL distance in (28) and (29) is clear, but with the substitution of the EM-type dynamic updated image f EM in place of the measured data and the non-linear modeled f in place of the linearly modeled q. Equation (47) corresponds to equation (11) in Carson and Lange's 1985 article, and it does not necessarily have to be maximized, so long as an increase is however achieved. The link with the nested EM algorithm for linear parameter estimation is clear: equation (41) corresponds to (46), and the image-space EM update (42) increases the objective function of (47). Having carried out stage 2, stage 1 is then repeated using the new parameter estimates. Note that due to the use of non-linear parameters, convergence to a local maximum only is assured. Figure 6 illustrates the 2 stage process.
These two stages, which are essentially identical in three independent publications from 1985, 2010 and 2012, constitute a powerful framework for estimation of non-linear parameters in emission tomography, whether 2D, 3D, 4D or of even higher dimensions. Having identified the common framework, it is worth mentioning again some useful distinctions in the independent proposals: i) the Lange and Carson (EMPIRA) 1985 derivation explicitly accommodates a time-dependent system matrix, ii) the Wang and Qi 2012 derivation (OTEM) explicitly includes a MAP (rather than just ML) objective function, and iii) the Matthews et al 2010 proposal suggests an approximate least squares approach to solving (47) as will be described below.
Finally, if regularization is required, then this can be incorporated in the image-space fitting via a modification as follows: which is effectively a non-linear generalization of the linear-modeled generalized EM for MAP estimation (Hebert and Leahy 1989).

Approaches to implementation
We can now review various implementations for stage 2, as this constitutes the more challenging step, with stage 1 involving nothing more complicated than a series of conventional 3D EM image updates.

Linear model.
Reverting back briefly to the use of linear θ parameters: in this case the objective of equation (47) can of course be maximized through iterative use of an EM-type algorithm, with its linear forward model. This approach actually corresponds to the nested EM algorithm. One can proceed with as many or as few image-space EM updates as desired, recalling that equation (47) does not necessarily have to be maximized, before updating again from the raw measured tomographic emission data (equation (46)). So the task specified by equation (47) is achieved by: Figure 6. Illustration of one iteration of the two stage process for fully 4D or direct parametric image reconstruction in emission tomography, based on the methods of Carson and Lange (1985), Matthews et al (2010) and Wang and Qi (2012). In stage two, the maximizing of the negative KL distance should be weighted by the time-dependent sensitivity images, but this is not necessary for TAC fitting at the voxel level with a time-invariant system matrix.
where the general model f(θ) has been replaced by an explicitly linear model, matrix B operating on the parameter vector θ.

Non-linear models.
For non-linear parameters θ, which is the primary concern here for kinetic parameter estimation, a number of options have now been proposed and practically implemented in the literature for the stage 2 optimization. The first approach is to use a weighted least squares approximation (Matthews et al 2010). This is motivated by examining the form of the objective function (47) -it corresponds to minimizing the KL distance measure which also naturally arises from the Poisson log-likelihood, as was shown in equations (27) to (29). Given that a Poisson distribution for a sufficiently large mean value can be well approximated by a Gaussian (with matching mean and variance) then one could, as an alternative to the KL distance, use a Gaussian log-likelihood. This corresponds to a least squares distance, weighted by the modeled mean. However, the modeled mean depends on the very estimate being sought, so instead the estimate from the previous iteration is used for the weights in the weighted least squares (WLS) objective (hence it can be referred to as 'one step late'): where, as usual, vector multiplication (including squaring) and division is understood to be component-wise. The benefit of this approach is that existing non-linear weighted least squares algorithms can be used to solve (50), as an approximate alternative to solving (47). For example the generalized linear least squares (GLLS) method has been successfully applied (Angelis et al 2014, Kotasidis et al 2012. An alternative approximation would be to use f (k) EM instead of f(θ (k) ) for the weights. In contrast, Wang and Qi tackle the stage 2 non-linear optimization of equation (47) directly, using a modified Levenberg Marquardt algorithm (Marquardt 1963, Wang andQi 2012) and other earlier work, picking up the proposal of EMPIRA, also used the method of Marquardt (Chiao et al 1994).
Of course, many other algorithm options exist for the problem of maximizing equation (47), with for example a Newton-Raphson approach as originally suggested for this problem by Carson andLange in 1985 (Carson andLange 1985), and very recent work on 4D reconstruction has proposed using an EM-inspired preconditioned gradient ascent approach (Rahmim et al 2014).
It is interesting to note that prior to their 2012 OTEM work, Wang and Qi had previously proposed another algorithm (Wang and Qi 2009a), which used paraboloidal surrogate functions for the penalized log-likelihood. Since their paraboloidal surrogates method was surpassed by their OTEM method (which is identical in principle to the method previously proposed by Matthews et al and covered above), we do not consider it further here. An excellent review of OTEM compared to the paraboloidal surrogate method is given in Wang and Qi (2012), where it is also compared with the one step late approximation shown in equation (50). A further review is available in their review article (Wang and Qi 2013).
We will now move on to consider a derivation of the above two stage approach to non-linear parameter estimation for PET and SPECT, showing how the non-linear image-space update arises naturally from a so-called complete data framework, which will be covered in the next section. Before doing so, we briefly note here that there are other recent examples of algorithms which also handle the non-linear parameter estimation problem through a complete data description, also using EM. Specifically, the very first and original work of Snyder (Snyder 1984), the work of Yan et al (2012) and more recently that of Su et al (2013). One particular choice of complete data for the EM framework, and how it results in the image-space objective function given by (47), will be presented next.

Derivation of the EM approach for non-linear parameters
The aforementioned algorithms rely heavily on the EM framework. Given the scarcity of derivations of the EM algorithm in the literature for both PET and SPECT, it is beneficial for this review to provide not only a derivation of the EM framework for the 4D case, but also for the case of non-linear parameters. The derivation given here is in the original spirit of the EM algorithm, which has its origins in an intuitive approach for problems in areas such as genetics, and which was subsequently shown to have theoretically appealing properties (Dempster et al 1977, Laird 2010. In essence one first designs a set of data which would make the parameter estimation problem very simple indeed. These data are referred to as complete data, and there are often many ways in which these data can be formulated, according even to the creativity of the researcher. Good examples in the field of 4D image reconstruction for the case of kinetic parameter estimation can be seen in the work of Schottlander et al (2006) (who followed the approach of Snyder (1984)) and Yan et al (2012) (who devised a novel formulation of complete data). The key point is to formulate a complete data set that makes the problem very simple to solve. The EM approach then alternates between two key steps: expectation, and then maximization. The expectation step involves generating a complete dataset (formally the conditional expectation of the complete data), based on a current estimate of the parameters (θ (k) ) and based on the actual measured data (m). The maximization step involves using this generated complete data set (which, remember, has been deliberately designed to make the optimization very simple), to find a new estimate of the parameters which optimizes the desired objective (e.g. which maximizes the likelihood). With the new estimate of the parameters, the expectation step can then be applied again to create new complete data, the objective function maximized again, and so on. Note that this EM framework is a special case of a more general approach known as optimization transfer. In optimization transfer one seeks to optimize a simpler (surrogate) objective function, which when optimized yields a parameter estimate which is closer to maximizing (or minimizing) the actually desired objective function. In this context, the EM method uses the conditional expectation of the complete data Poisson log-likelihood as a surrogate function for the measured data Poisson log-likelihood, as will be seen below.
The reason for needing a method like EM is that a closed form solution for finding the maximum of the Poisson log-likelihood (for the case of estimating many parameters θ from many measurements m) is not possible for any realistic choice of the system matrix A. So, how can we make it easier to find the maximum of the Poisson log-likelihood, when our parameters to estimate are the mean number of emissions from each voxel for each time frame, and all we have is the number of counts in each sinogram bin for each time frame? Well it turns out that the maximum likelihood estimate can be found in one very simple step if we were to be in possession of a Poisson-distributed complete data set, = z z { } ivt , which indicates exactly how many counts (z) detected in sinogram bin i came from voxel v at time index t. If we had a dataset like that, then it is easy to show that maximizing the Poisson log-likelihood is achieved by just summing the appropriate counts for each voxel, and then normalizing for the number of contributions using the sensitivity image.
Imagine for now that we do have access to such a complete data set (we will deal with the fact that don't have this data later on). First it is useful to relate the designed complete data to the actually measured data m it (which can now be called 'incomplete data'). Since z ivt is like having counts in a single sinogram bin i with individual labels indicating which voxel v they came from, we can relate them back to m it just by summing over all the voxels which contributed to sinogram bin i at time frame t: It then follows that the mean of m it , which we model by equation (21) (here we will ignore ρ), is and so the model of the mean of a complete data element z ivt is given by: Now, we will re-write the Poisson posterior probability (previously given in equation (32) for the incomplete data m it ) using the complete data: The log-posterior for the complete data is therefore (ignoring the constant terms which will not affect the subsequent maximization with respect to θ): At this point we note that in the majority of the literature a linear model is assumed for f vt (θ), and equation (55) is therefore relatively easy to maximize for a given element of θ by setting the partial derivative equal to zero and solving. When β = 0 the solution is very simple (which was the whole point of using complete data), but for non-zero β either a quadratic needs to be solved (e.g. Levitan andHerman 1987, Depierro 1995) or else a previous estimate of θ needs to be used in U (e.g. the one step late method (Green 1990)). However, before going further, we recall that we don't actually have the desired complete data z. This is where the EM method comes in. In the expectation step we generate an estimate of the complete data, based on the measured data m and a current estimate of the parameters θ (k) . In essence, by claiming to know already the parameters θ, and having the data m, it is straightforward to find what the corresponding elements of z would have to be. Specifically, we can come up intuitively with the following ratio (see the explanation in figure 7), which requires that the complete-data fractionation of the incomplete data be equal for both the noisy components (left hand side of the below equation) and the modeled mean components (right hand side of the below equation): As figure 7 demonstrates, by knowing the decomposition of the mean of each element q it of the incomplete data, we can apply exactly the same fractionation to each element m it of the noisy incomplete data to obtain an estimate of z ivt : Alternatively, equation (57) can be seen as taking the noisy number of counts m it for a sinogram bin i at time t, and dividing these counts amongst all the possible voxels which could have contributed to m it . These intuitive perspectives on forming an estimated complete dataset are entirely consistent with the intuitive origins of EM. However, formally, (57) is the conditional expectation of the complete data, given θ (k) and m, and it should be noted that (57), being a conditional expectation, is unlikely to deliver an integer value for a given z ivt . Now we have all the components for EM. The conditional expectation of the complete data for the current parameter estimate θ (k) (equation (57), (E-step)) as well as a forward model that can model the mean of the complete data (equation (53)) for any choice of θ. Hence, parameters θ for the forward model are sought so as to maximise the objective function of equation (55) (M-step), which is relatively easy to do with complete data.
By substituting the conditional expectation of the complete data (57) into our complete-data objective function (55) we obtain: It is helpful to note, in terms of the terminology used in the literature, that equation (58) can be considered either as i) the conditional expectation of the complete data log-likelihood (logposterior probability for β > 0), or ii) the complete data log-likelihood (posterior probability) using the conditional expectation of the complete data given θ (k) and m. Noting that the second Figure 7. Graphical explanation of how to obtain expected complete data z (for the case of just 3 voxels), given an example measured number of counts m for a bin i, and under the assumption that the estimate of the parameters, θ (k) , is correct. The complete data decomposes the measured incomplete data into its constituent parts, indicating which voxel v gave rise to which counts in bin i. By examining the fractionation of the model mean counts, q it , one can apply the very same fractionation to the actual measured counts to obtain an expected, usually non-integer, z.
term in the brackets on the right hand side of (58) is independent of θ (and so is a constant that will not affect the value of θ which maximizes (58) Equation (59) where equation (57) has been used to obtain the right hand side of (60). Given the centrality of (60) to much of the 3D image reconstruction literature, and hence its widespread adoption in both PET and SPECT image reconstruction, we will specially label it as the EM update image: where (EM) has been placed in superscript to allow subscript indices to be shown, but the image is the same as the vector f EM k ( ) . Equation (61) provides a time series of 3D EM image updates, and the regular time-independent 3D EM update is found for the special case of discarding the time frame index t. For most cases in the literature the image f vt (θ) is given directly by the contents of θ, but importantly in this formulation f vt (θ) could be that simple or indeed a far more involved function of non-linear parameters θ. Hence we can now rewrite equation (59) as: This indicates that we can obtain an improved parameter estimate θ (k+1) by maximizing (62) which is none other than the previous equation (48), where we have identified the time-dependent sensitivity images as: To within a constant, the function being maximized in equation (63) is the conditional expectation of the dynamic complete data Poisson log likelihood (or log posterior if β > 0). It can also be viewed as a surrogate function in optimization transfer terminology-optimizing (63) is simpler than directly optimizing the incomplete data version of the Poisson log posterior probability. Furthermore, the function being maximized in (63) is recognizable, by comparison with the previous equations (28) and (29), as none other than the negative of the KL distance between the modeled image f vt (θ) and the regular tomographic EM update image f (EM) vt . This is an extremely important result, that has in effect been shown a number of times by many researchers in the literature already, dating back to Carson in 1985. As such, equation (63) is a very general and powerful formulation for 4D / direct image reconstruction in PET and SPECT imaging, unifying much of the EM literature in both 3D and 4D PET, as demonstrated in table 4.
The following important observations about the generalized equation (63) (with (61) or (46)) and links to the reconstruction literature need to be made: 1) Both ML and MAP estimation are covered by equations (63) and (61), for a linear or non-linear parameterization of the time-series of images f(θ), accommodating also a time-variant sensitivity image. Carson's EMPIRA accommodates time-dependent sensitivity images (Carson and Lange 1985), which can be required for SPECT. 2) When β = 0 and when f is the most simple linear case of θ , then f EM is the one step maximizer θ + k ( 1) of (63), meaning that just regular MLEM is carried out for each time frame, as would be expected. When β > 0, the generalized EM MAP formulations developed in the 1980s and 1990s are obtained (see Qi and Leahy (2006)).

3) When β = 0 and a general linear model of f(θ) is used, such as
, then the EM algorithm can iteratively be applied to progressively and monotonically decrease the KL distance of (63). This is the nested EM method for 4D (Wang and Qi 2010) or 3D (Angelis et al 2013). 4) Equation (63) is completely free from any tomographic aspect: just a time series of sensitivity images and a time series of regular 3D EM updates are needed. This has considerable practical significance. The system model, the huge time-series of Poisson distributed acquired data, along with the estimates of scatter and randoms, do not even appear in equation (63), since they are all handled by the regular EM update of (46). This conveniently isolates any non-linear aspects, and any regularization (MAP) requirement, into just an image-space fitting process requiring minimization of a KL distance.

Model-specific algorithms
The aforementioned approach based on an EM framework is a relatively straightforward and adaptable method in the literature for non-linear parameter estimation in emission tomography. Nonetheless, other methods have been proposed (notably Kamasak et al (2005), Yan et al (2012), and more recently Su et al (2013)), but with the caveat that they have been presented in model specific form-for specific choices of the non-linear parameters θ. The method of Yan et al relies on a 1 tissue compartment model, and the method of Kamasak et al is presented for a 2 tissue compartment model and uses a coordinate descent method, although the method should be equally applicable to any non-linear model. Excellent reviews of these methods are already given in (Wang and Qi 2013).

Early methods developed for SPECT
In terms of practical implementations for estimating non-linear parameters directly from SPECT projection data, pioneering work was in fact carried out in the 1990s (Zeng et al 1995,  Summary of methods in the literature, from 3D to 4D, which all use the framework presented by equation (63)  Generalized EM (β > 0) (Hebert and Leahy 1989) Maximization of (63) not necessary, but just an increase by coordinate ascent OSL-EM (Green 1990) Maximize (63) (63) OTEM (Wang and Qi 2012) Modified Levenberg Marquardt proposed for maximization of (63) Huesman et al 1998, Reutter et al 1998. In these early approaches, the one tissue compartmental model is used to describe the activity concentrations. The model of the projection data therefore depends on the kinetic parameters ( F v , K 1 and k 2 ) for every j-th basis function. To reduce the dimensionality only a limited number of spatial basis functions, typically 4, were used (effectively corresponding to ROIs of different tissue regions). For example 13 parameters (4 regions × 3 kinetic parameters, and a background scaling factor) were estimated from the projection data in Zeng et al (1995), Huesman et al (1998), Reutter et al (1998. The weighted sum of square differences between modeled and measured data is then minimized using standard optimization strategies (e.g. the Newton-Raphson method was used in Reutter et al (1998)). However these approaches have not been applied to large scale problems (e.g. voxel-level / parametric methods) and represent rather specially developed cases for SPECT. A model with two exponentials was used by Limber et al (1995), for direct parametric reconstruction into a very limited number of pixels.

Simple simulation examples
In this section a simple simulation example is presented to illustrate the benefits of the 4D reconstruction method reviewed in the previous section. The simplified imaging system (figure 8) consists of just 5 projection data bins (I=5) and there are 26 time frames (T=26). The spatial basis functions chosen are 4 pixels (J=V=4), with 26 time frames of varying duration to cover 3450 s. For conventional reconstruction (with a post-reconstruction estimation of kinetic parameters) this corresponds to a TAC defined by 26 parameters estimated for each pixel (so P=26). For direct reconstruction this will be reduced, as explained below, to finding just 3 parameters (P=3) to define the TAC at each pixel. To simulate idealized data, two different system matrices were considered for relating the 4 pixels to the 5 data bins, a sparse matrix A S and a denser matrix A D : Note these I × J system matrices are used for each time frame of data, so the system is assumed to not change with time. With the more densely populated matrix A D there will be far more significant spatial noise correlations in the reconstructed image, which is potentially a key benefit of direct reconstruction methods as these correlations are usually ignored in post-reconstruction kinetic analysis. Four ground truth TACs were generated, one for each pixel, based on the SRTM, with choices of (R 1 , k 2 , BP) given by: (0.87, 0.27, 3.72), (0.90, 0.35, 2.52), (0.84, 0.17, 1.8) and (0.80, 0.3, 4.02), respectively. The reference region TAC was based on one obtained from a real [ 11 C]raclopride scan, and the first set of SRTM parameters (0.87, 0.27, 3.72) was likewise based on a fit to the striatal region of a real [ 11 C]raclopride scan. Four main simulations were then performed, corresponding to the use of the two different matrices A S and A D , and the use of two noise levels (low noise data, so that the reconstructed pixel TACs resemble ROI TACs from a real full size dynamic 3D reconstruction, and high noise data, so that the reconstructed TACs resemble single pixel TACs from a full size reconstruction, figure 9). For each simulation the ground truth TACs were forward projected by the system matrix, and Poisson noise was introduced. A total of 500 realizations of noisy data were considered for each case. The data were then reconstructed into 4 pixels of 26 time frames using i) a conventional approach (considering up to a total of 160 iterations of MLEM, followed by post-reconstruction kinetic fitting to find the binding potential, BP, for each pixel) and using ii) the direct reconstruction approach (equations (46) and (47) given in the previous section). The kinetic fitting for both the post-reconstruction approach as well as the direct approach used a 1 gigabyte dictionary of TACs, generated for a range of sample R 1 , k 2 and BP values. A given pixel TAC f j ={f jt , t=1…T} was fitted by considering each possible modeled TAC g from the dictionary, simply evaluating the objective function for each g. So the fitted values for pixel j are found according to the maximum value of this objective: However, whilst this dictionary approach is fine for single post-reconstruction kinetic fitting, it was not found suitable for the direct reconstruction method, as very small changes in the pixel TACs result in identical dictionary fits, meaning that the direct algorithm can no longer effectively update the TACs and the kinetic parameter estimates. For this reason a line search in each of the three kinetic parameters was also performed, within the range of the dictionary precision, so as to permit subtle changes in the fit parameters. Explicitly then, the direct reconstruction proceeded as follows: 1. Uniform image initialization (all TACs set to 1 for all pixels) 2. One regular EM-type update of the TAC values (independent frame EM updates) based on the measured data (equation (61)) 3. Fitting of the TACs based on the objective function in equation (66) (equation (63) with β = 0, and time-invariant sensitivity) 4. Using the fitted TACs as the new TAC values for the 4 pixels 5. Looping back to step 2 Figure 10 shows the root mean square error in the BP estimates for all pixels, for the four main simulations. As might be expected, the direct reconstruction method offers increasing levels of error reduction as noise in the data increases (where it is important to note that for a fixed count level, use of a denser system matrix increases noise in the reconstructed images, and so direct reconstruction offers benefits for this case also). An interesting point to note is that for low noise (and low spatial noise correlations) the direct method essentially matches the conventional method, when using the same objective function for the kinetic fitting.

Performance benefits of 4D: brief summary of results from the literature
We now consider the benefits of 4D reconstruction compared to conventional 3D methodology as reported in a variety of implementations from the literature. The literature is diverse in terms of the simulation studies performed to assess these methods, in terms of different levels of statistical quality of the emission data (number of counts in the list-mode or projection data), the particular radiotracer and distribution considered, the voxel/pixel/ROI sizes, the particular parameters estimated and the algorithm used. Despite this diversity and all these differences, most research articles do nonetheless have the common goal of comparing fully 4D or direct kinetic parameter estimation with the conventional method of independent-frame reconstruction followed by post-reconstruction kinetic modeling. In this section we present typically reported levels of reduction in error (root mean square error (RMSE), mean absolute error (MAE)) or the typical levels of reduction in standard deviation (SD) for a given level of  (66)) for the two noise levels considered and for the two system matrices considered. Note that the TACs in the pixels are corrected for frame length and calibrated to give Bq/mL, but the raw counts in the five projection bins are not corrected for frame length. Counts bias compared to the case of conventional methodology. It is of course important to note that these different metrics reflect different performance parameters (RMSE and MAE encompass both noise and bias, whereas SD indicates only noise). Therefore the performance differences between 3D and 4D reconstruction presented in this section are not necessarily directly comparable between publications, but do nonetheless allow relative claims of improvement to be compared. Figure 11 summarizes the reported benefits for the PET literature. The results include linear parameter estimation (Tsoumpas et al 2008b, Wang et al 2008, Matthews et al 2010, Tang et al 2010, Rahmim et al 2012 and non-linear parameter estimation (Kamasak et al 2005, Wang and Qi 2009b, Wang and Qi 2012, Yan et al 2012, Su et al 2013. The work by Matthews et al (2010) considers simulated [ 11 C]diprenorphine brain data and reports on RMSE reduction for the macro-parameters K 1 and DV estimated with direct reconstruction using a spectral analysis model. The performance improvement of direct reconstruction compared to conventional indirect reconstruction is compared after 400 iterations in the figure. (Kamasak et al 2005) reports on direct parametric reconstruction using a full two-tissue compartmental model. Simulated [ 11 C]raclopride brain data is presented where the direct reconstruction provides a reduction in normalized RMSE in the parameters K 1 -k 4 as well as in BP and DV. The errors in BP and DV estimated from the full two-tissue compartmental model were even lower compared to using the linear Logan graphical model simplification applied after reconstruction. (Wang and Qi 2009b) considers the simplified reference model and SD versus bias trade off curves are shown for simulated brain data with the tracer [ 11 C]DASB, so figure 11 reports a typical SD reduction at fixed bias. In (Wang and Qi 2012) the spatial background standard deviation versus ROI mean value is shown for the DV ratio (DVR) using the SRTM for real brain data with the tracer [ 11 C]SCH23390.  In Yan et al (2012) the one-tissue compartmental model is used for direct 4D parametric reconstruction of simulated data with kinetics corresponding to the tracer [ 11 C]P943, and the averaged reduction in the coefficient of variation (COV) for K 1 , k 2 and DV is reported in figure 11. The greater COV reduction was found for the lowest count levels, as would be expected. Note also that their reported noise reductions were in some cases accompanied by a simultaneous bias reduction. Rahmim et al (2012) report COV reduction for simulated brain data using the Logan linear model for [ 11 C]raclopride. Results of direct parametric reconstruction of the K i macro-parameter using the linear Patlak model and simulated brain data for [ 18 F]FDG can be found in Tsoumpas et al (2008b) (averaged reduction in RMSE is shown in figure 11), Tang et al (2010) (reduction in normalized standard deviation is shown) and Wang et al (2008) (reduction in standard deviation is shown in figure 11). The reduction in MAE for post-reconstruction Patlak graphical analysis after 4D reconstruction using spectral analysis basis functions for simulated brain [ 18 F]FDG data can be found in Verhaeghe et al (2009). Finally figure 11 also includes results for heart imaging with [ 13 N]NH 3 (Su et al 2013) and also results from simulated [ 11 C]raclopride data for the HRRT brain PET scanner .
In summary, these broad-ranging evaluations report error reductions in the range of 20% to 80%, in all cases showing at least some benefit arising from 4D or direct reconstruction. The benefits are of course largely dependent on the noise level and noise correlations in the reconstructed data under consideration in each case, as in fact benefits can become negligible in the presence of high count data (as indicated in the simple simulations in the previous section) or for large ROIs. To complement the summary of improvements shown in figure 11, figure 12 shows example images of the visual improvement in image quality obtained with 4D reconstruction compared to conventional post-reconstruction modeling. Finally, figure 13 demonstrates the progressive advantages of improved modeling for parametric K i images: from regular 3D OSEM, to inclusion of resolution modeling, to introducing data-derived temporal basis functions (Reader et al 2006a) and then finally with the use of data-derived spectral-analysis basis functions proposed in Reader et al (2007).

Conclusions and future directions
Fully 4D and direct parametric image reconstruction methods can reduce the impact of noisy PET or SPECT emission data on end-point parameter estimates. The techniques presented in this review emphasize the direct use of the raw PET or SPECT data, for which the primary advantage is that the noise model is well known to be Poisson, contrary to the case of post-reconstruction parameter estimation methods which often approximate the unknown noise distribution of the images being fitted. The direct methods not only use more accurate noise modeling, but also implicitly account for spatial noise correlations and impose the temporal modeling constraints throughout the iterative reconstruction procedure. Furthermore, the methods are not limited to kinetic parameter estimation, but in fact virtually any postreconstruction parameter estimation task can be converted to a direct parameter estimation task, thereby correctly modeling the noise in the raw data. Conceptually, direct linear or nonlinear parameter estimation is not difficult to implement, thanks to the two stage process now proposed by multiple researchers in the field. This nested methodology conveniently separates the linear tomographic problem from the potentially non-linear image-space modeling problem. Provided the KL distance measure is used as the objective function for the imagespace modeling (i.e. for the fitting of the parameterized model to a current 3D EM image update), then the parameter estimation can be appropriately performed with full account for the Poisson-distributed emission data. Of course though, this parameter fitting does need to be carried out for each and every EM update, rather than just once in the case of post-reconstruction fitting.
To handle list-mode data, rather than sinogram or projection data, the nested method presented in this review would need to use a list-mode EM algorithm for the 3D image frames update in stage 1 of the method (for example the early work on list-mode EM by (Parra and Barrett 1998) and (Reader et al 1998)) rather than a projection-data based EM algorithm (the example shown in figure 6 of this review). The future will see not only direct use of raw PET Figure 11. Summary of reported relative benefits of 4D or direct reconstruction compared to conventional independent-frame reconstruction with post-fitting of the images. Results from 12 papers in terms of reduction in standard deviation (for a given bias) or the reduction in root mean square error (or broadly comparable error measure) are shown, including linear and non-linear parameter estimation methods. Figure 12. Example parametric images for FDG PET imaging from the literature using simulated data. Left: parametric images for the Patlak intercept, c, when using conventional 3D OSEM reconstruction with 8 iterations (16 subsets) and then post-fitting and when using data-derived spectral analysis basis functions for 4D OSEM reconstruction with 40 iterations and then post-fitting. Right: direct 4D reconstruction using Patlak basis functions for FDG (Tsoumpas et al 2008b), shown for the same number of OSEM iterations (480 iterations, 12 subsets) for both 3D and 4D. and SPECT data, whether in projection data format or list-mode format, but probably also direct use of raw simultaneously-acquired MR data. This will allow dual-modality tracers and contrast agents to be used in a single scanning session, combining both data streams in algorithms to estimate directly the parameters of interest. The reconstruction will therefore be able to incorporate information on subject anatomy and motion, along with cardiac and respiratory motion, generating 5D or 6D images. There is even scope for estimating the motion from the acquired data, including motion fields amongst the parameters to be estimated during image reconstruction. Also, with the noise reduction achieved by direct methods, it is conceivable that these methods can aid signal separation for more ambitious multi-radiotracer PET and/ or SPECT studies, which when further combined with MR contrast agents and even multiparametric MR imaging (e.g. Ma et al (2013)) could provide interesting potential for near simultaneous imaging of a useful range of physiological and anatomical parameters.
However some possible concerns with direct parameter estimation methods for PET and SPECT have been identified. Firstly, if a model is inappropriate then errors can occur which exceed those of a post-reconstruction estimation approach, as in 4D and direct reconstruction the errors propagate to other regions in the image throughout the iterative reconstruction procedure. Nonetheless, there is promising work indicating this can be accounted for through use of residual modeling terms . A further concern might be the assessment of the fit quality of a given kinetic model: often in post-reconstruction kinetic parameter estimation one has estimates of the fit quality, but direct methods do not implicitly deliver this information. The work of Kamasak et al (2014) has started to address this issue, essentially exploiting a backprojection of the difference between the measured sinograms and the kinetically modeled sinograms. Given these possible concerns, the significance of the end point impact on actual studies will also be necessary to evaluate in order to see if the benefits Figure 13. Progressive visual improvement in post-fitted parametric image quality (for the tracer influx K i ) arising from improved spatial and then improved temporal modeling for real FDG PET data. HRRT list-mode data from a 60 min [ 18 F]FDG (184 MBq) study of a probable Alzheimer's patient was reconstructed and then post-fitted for K i at the voxel level for four methods, using a population input function. From left to right: conventional independent frames 3D OSEM, 3D OSEM with resolution modeling (point spread function (PSF) modeling), 4D OSEM with basis functions estimated from the data, and finally 4D OSEM using a data-derived surrogate input function with spectral analysis basis functions. The core components of the dynamic EM reconstruction algorithm used for the HRRT list-mode data (with scatter and randoms corrections, attenuation and normalization, resolution modeling kernel and subsets) are detailed in previous work (Reader et al 2006a). The last two columns correspond to using the two cases previously presented in the penultimate row of table 1.

3D OSEM 3D OSEM+PSF
4D OSEM+PSF (data-derived temporal basis functions) (Reader et al 2006a) 4D OSEM+PSF (data-derived spectral analysis temporal basis funct (Reader ions) et al 2007) outweigh any possible drawbacks. Recent work has indicated that the benefits under certain conditions (e.g. [ 11 C]raclopride imaging ) are relatively modest compared to other study types. Yet even if the end point benefits of these 4D and direct estimation methods are not always substantial (there is clear variation in the level of benefit reported in the previous figure 11), in general provided the modeling is accurate, they can only serve to help, since they represent the theoretically correct way of performing many tasks which are currently carried out on a post-reconstruction basis. They can even apply, for example, to such apparently straightforward tasks as post-reconstruction image registration .