Artifact reduction in short-scan CBCT by use of optimization-based reconstruction

Zheng Zhang; Xiao Han; Erik Pearson; Charles Pelizzari; Emil Y Sidky; Xiaochuan Pan

doi:10.1088/0031-9155/61/9/3387

1. Introduction

Cone-beam computed tomography (CBCT) has been widely used for guidance, monitoring, and assessment tasks in interventional (Wiesent et al 2000, Lauritsch et al 2006, Orth et al 2008, Wallace et al 2008), surgical (Grass et al 1999, Hott et al 2004, Siewerdsen et al 2005, De Vos et al 2009), and therapeutic applications (Jaffray et al 2002, Oldham et al 2005, Smitsmans et al 2005, Grills et al 2008). Due to workflow and radiation-dose considerations, it is often desirable to scan a patient with the shortest possible rotation scan that provides sufficient data for reconstruction. With the commonly used Feldkamp–Davis–Kress (FDK) algorithm (or its variants) (Feldkamp et al 1984, Johnson et al 1998), this is ${{180}^{{}^\circ}}+$ the-full-fan-angle, which is typically referred to as the short-scan angular range (Parker 1982). Although the FDK algorithm can provide images of high practical utility for a wide class of applications, it has also been recognized, however, to yield reconstructions with noticeable artifacts especially in the off-middle planes for short-scan CBCT with data insufficiency and inconsistency (Dennerlein et al 2008, Maaß et al 2010).

Optimization-based reconstruction (Delaney et al 1998, Elbakri and Fessler 2002, Sidky and Pan 2008, Tang et al 2009, Pan et al 2009, Bian et al 2010, Stsepankou et al 2012, Nien and Fessler 2014, Wang et al 2014) remains the focus of current research on CBCT algorithms. Evidence seems to suggest that appropriately designed optimization-based algorithms may be less susceptible to data inconsistency and insufficiency than analytic algorithms such as the FDK algorithm, thus possibly yielding reconstructions with reduced artifacts (Parker 1982, Dennerlein et al 2008). In this work, we carry out a study on optimization-based reconstructions from short-scan CBCT data with an emphasis on demonstrating whether different designs of optimization programs of potential practical interest impact the reduction of artifacts observed in the FDK reconstruction.

Based upon the CBCT-imaging model for analytic algorithms, it can be shown that the data function in a short-scan CBCT possesses 'redundancy' within the middle plane, i.e. the plane containing the source trajectory. A weighting function such as the Parker weighting function (Parker 1982) is used widely for normalizing the data-function redundancy so that the FDK algorithm can, when applied to the weighted data-function, yield accurate images within the middle plane. It is well-known that different weighting functions may lead to FDK reconstructions with different degrees of artifacts especially away from the middle plane (Grass et al 2001, Tang et al 2005, Mori et al 2006). Therefore, we also investigate possible impact of different weighting functions, especially those that normalize the data-function redundancy, on optimization-based reconstruction in short-scan CBCT.

This paper starts with an introduction in section 1, followed by the description of optimization reconstruction in section 2, and of study design in section 3. Results of real-data studies and final remarks are presented in sections 4 and 5, respectively.

2. Methods

The imaging model used in the optimization-based reconstruction considered is given by

$\begin{eqnarray}&&\mathbf{g}=\mathcal{H}\mathbf{f},\end{eqnarray} \tag{ 1 }$

where vector $\mathbf{g}$ of size M denotes discrete model-data, vector $\mathbf{f}$ of size N the discrete image of N voxels, and $\mathcal{H}$ the system matrix of size $M\times N$ . In this work, an entry in $\mathbf{g}$ or $\mathbf{f}$ represents the data or image value within a detector bin or image voxel; and an element in system matrix $\mathcal{H}$ represents an intersection of an x-ray detected with a voxel in the image array. The imaging model in equation (1), linking a discrete-image vector to a discrete-data vector, is referred to as a discrete-to-discrete (D–D) model. We denote experimentally measured data in practical CBCT imaging as vector ${{\mathbf{g}}_{m}}$ of size M in which an entry represents a particular data sample collected. In contrast, the imaging model upon which analytic algorithms such as the FDK algorithm are developed is referred to a continuous-to-continuous (C–C) model because it maps an object function of continuous variables to a data function of continuous variables.

2.1. Design consideration of optimization programs

In optimization-based reconstruction, an image is obtained by solving the optimization program that specifies the image. In this work, we consider optimization programs of form

$\begin{eqnarray}&&{{\mathbf{f}}^{\ast}}=\underset{\mathbf{f}}{{\text{argmin}}}\,\Phi\left(\mathcal{W}{{\mathbf{g}}_{m}},\mathcal{W}\mathcal{H}\mathbf{f}\,\right)\,\,\,\,\,\,\text{s}\text{.t}\text{.}\,\,\,\,\,\,\Psi\left(\mathbf{f}\,\right)\leqslant \alpha \,\,\,\,\,\,\text{and}\,\,\,\,\,\,{{f}_{j}}\geqslant 0,\end{eqnarray} \tag{ 2 }$

where $\Phi\left(\mathcal{W}{{\mathbf{g}}_{m}},\mathcal{W}\mathcal{H}\mathbf{f}\right)$ , referred to data fidelity in the work, denotes a measure of inconsistency between measured data ${{\mathbf{g}}_{m}}$ and model data $\mathbf{g}$ , weighted often by a matrix $\mathcal{W}$ of size $M\times M$ , $\Psi\left(\mathbf{f}\,\right)$ an image constraint function, α constraint parameter, and f_j the jth entry of vector $\mathbf{f}$ , j = 1, 2,..., N.

2.1.1. Data fidelities.

Several data fidelities and image constraints of potential practical interest are considered in the work. Specifically, the data fidelities are given by

(a)
data- ${{\ell}_{1}}$ fidelity
$\begin{eqnarray}&&{{D}_{{{\ell}_{1}}}}\left(\mathbf{f}\right)=\,\parallel \mathcal{W}{{\mathbf{g}}_{m}}-\mathcal{W}\mathcal{H}\mathbf{f}{{\parallel}_{1}},\end{eqnarray} \tag{ 3 }$
where $\parallel \centerdot {{\parallel}_{1}}$ denotes ${{\ell}_{1}}$ -norm;
(b)
data- ${{\ell}_{2}}$ fidelity
$\begin{eqnarray}&&{{D}_{{{\ell}_{2}}}}\left(\mathbf{f}\,\right)=\,\parallel \mathcal{W}{{\mathbf{g}}_{m}}-\mathcal{W}\mathcal{H}\mathbf{f}\parallel _{2}^{2},\end{eqnarray} \tag{ 4 }$
where $\parallel \centerdot \parallel _{2}^{2}$ denotes the square of ${{\ell}_{2}}$ -norm; and
(c)
data-Kullback–Leibler (KL) fidelity
$\begin{eqnarray}&&{{D}_{\text{KL}}}\left(\mathbf{f}\right)=\underset{i=1}{\overset{M}{\sum}}\,{{\left[-\mathcal{W}\mathcal{H}\mathbf{f}+\mathcal{W}{{\mathbf{g}}_{m}}+\mathcal{W}{{\mathbf{g}}_{m}}\text{ln}\left(\mathcal{W}{{\mathbf{g}}_{m}}\right)-\mathcal{W}{{\mathbf{g}}_{m}}\text{ln}\,\left(\mathcal{W}\mathcal{H}\mathbf{f}\,\right)\right]}_{i}},\end{eqnarray} \tag{ 5 }$
where i is the index of the ith entry of data vector ${{\mathbf{g}}_{m}}$ , and the entries in vector $\left(\mathcal{W}\mathcal{H}\mathbf{f}\right)$ that are smaller than ${{10}^{-20}}$ are replaced with 10⁻²⁰. In contrast to practical FDK reconstruction in which measured data ${{\mathbf{g}}_{m}}$ is pre-conditioned by a diagonal weighting matrix of size $M\times M$ , a weighting matrix in an optimization-based reconstruction is multiplied with both measured data ${{\mathbf{g}}_{m}}$ and model data $\mathbf{g}$ , thus modulating inconsistency between ${{\mathbf{g}}_{m}}$ and $\mathbf{g}$ , instead of ${{\mathbf{g}}_{m}}$ alone.

2.1.2. Image constraints.

The image constraints considered are given by

(a)
image- ${{\ell}_{1}}$ constraint
$\begin{eqnarray}&&\parallel \mathbf{f}{{\parallel}_{{{\ell}_{1}}}}=\underset{j=1}{\overset{N}{\sum}}\,|\,{{f}_{j}}|;\end{eqnarray} \tag{ 6 }$
(b)
image- ${{\ell}_{2}}$ constraint
$\begin{eqnarray}&&\parallel \mathbf{f}{{\parallel}_{{{\ell}_{2}}}}=\underset{j=1}{\overset{N}{\sum}}\,f_{j}^{2};\end{eqnarray} \tag{ 7 }$
and
(c)
image-TV constraint
$\begin{eqnarray}&&\parallel \mathbf{f}{{\parallel}_{TV}}\,=\,\parallel \,|\nabla \mathbf{f}|{{\,}_{\text{MAG}}}\,{{\parallel}_{1}},\end{eqnarray} \tag{ 8 }$
where $|\nabla \mathbf{f}{{|}_{\text{MAG}}}$ denotes the magnitude image of the spatial gradient of $\mathbf{f}$ defined above Eq. (A.7) in the Appendix.

2.2. Optimization program with image-TV constraint

Using the data fidelities and image constraints defined above to replace, respectively, $\Phi$ and $\Psi$ in equation (2), we form and study a number of optimization programs in the work, and, for discussion convenience, dubbed with respective symbols summarized in table 1. Image-constraint parameter α corresponding to image constraints $\parallel \mathbf{f}{{\parallel}_{{{\ell}_{1}}}}$ , $\parallel \mathbf{f}{{\parallel}_{{{\ell}_{2}}}}$ , or $\parallel \mathbf{f}{{\parallel}_{\text{TV}}}$ is denoted, respectively, as ${{\ell}_{1}}$ , ${{\ell}_{2}}$ , or t₁, as shown in the last column of table 1. For the purpose of clarity and brevity, without loss of generality, we give below explicit forms of optimization programs with the image-TV constraint, shown in row 5 of table 1. In section 4.3 we also present results of two additional optimization programs using image- ${{\ell}_{1}}$ and image- ${{\ell}_{2}}$ constraints, respectively.

Table 1. Symbols of optimization programs formed by use of data fidelities in equations (3)–(5) and image constraints in equations (6)–(8) to replace $\Phi$ and $\Psi$ in equation (2).

$\Psi$ $\Phi$	${{D}_{{{\ell}_{1}}}}\left(\mathbf{f}\right)$	${{D}_{{{\ell}_{2}}}}\left(\mathbf{f}\right)$	${{D}_{\text{KL}}}\left(\mathbf{f}\right)$	α
N/A	${{D}_{{{\ell}_{1}}}}$	${{D}_{{{\ell}_{2}}}}$	D_KL	N/A
$\parallel \mathbf{f}{{\parallel}_{{{\ell}_{1}}}}$	${{D}_{{{\ell}_{1}}}}$ - $\parallel \mathbf{f}{{\parallel}_{{{\ell}_{1}}}}$	${{D}_{{{\ell}_{2}}}}$ - $\parallel \mathbf{f}{{\parallel}_{{{\ell}_{1}}}}$	D_KL- $\parallel \mathbf{f}{{\parallel}_{{{\ell}_{1}}}}$	l₁
$\parallel \mathbf{f}{{\parallel}_{{{\ell}_{2}}}}$	${{D}_{{{\ell}_{1}}}}$ - $\parallel \mathbf{f}{{\parallel}_{{{\ell}_{2}}}}$	${{D}_{{{\ell}_{2}}}}$ - $\parallel \mathbf{f}{{\parallel}_{{{\ell}_{2}}}}$	D_KL- $\parallel \mathbf{f}{{\parallel}_{{{\ell}_{2}}}}$	l₂
$\parallel \mathbf{f}{{\parallel}_{\text{TV}}}$	${{D}_{{{\ell}_{1}}}}$ - $\parallel \mathbf{f}{{\parallel}_{\text{TV}}}$	${{D}_{{{\ell}_{2}}}}$ - $\parallel \mathbf{f}{{\parallel}_{\text{TV}}}$	D_KL- $\parallel \mathbf{f}{{\parallel}_{\text{TV}}}$	t₁

Note: For example, using ${{D}_{\text{KL}}}\left(\mathbf{f}\right)$ and $\parallel \mathbf{f}{{\parallel}_{\text{TV}}}$ to replace $\Phi$ and $\Psi$ in equation (2) yields program D_KL- $\parallel \mathbf{f}{{\parallel}_{\text{TV}}}$ , as shown in equation (11) below.

2.2.1. Program ${{D}_{{{\ell}_{1}}}}$ - $\parallel \mathbf{f}{{\parallel}_{\text{TV}}}$ : image-TV-constrained data- ${{\ell}_{1}}$ minimization.

$\begin{eqnarray}&&{{\mathbf{f}}^{\ast}}=\underset{\mathbf{f}}{{\text{argmin}}}\,{{D}_{{{\ell}_{1}}}}\left(\mathbf{f}\,\right)\,\,\,\,\,\,\text{s}\text{.t}\text{.}\,\,\,\,\,\,\parallel \mathbf{f}{{\parallel}_{\text{TV}}}\leqslant {{t}_{1}}\,\,\,\,\,\,\text{and}\,\,\,\,\,\,{{f}_{j}}\geqslant 0,\end{eqnarray} \tag{ 9 }$

where ${{t}_{1}}\geqslant 0$ denotes a pre-selected image-constraint parameter.

2.2.2. Program ${{D}_{{{\ell}_{2}}}}$ - $\parallel \mathbf{f}{{\parallel}_{\text{TV}}}$ : image-TV-constrained data- ${{\ell}_{2}}$ minimization.

$\begin{eqnarray}&&{{\mathbf{f}}^{\ast}}=\underset{\mathbf{f}}{{\text{argmin}}}\,{{D}_{{{\ell}_{2}}}}\left(\,\mathbf{f}\,\right)\,\,\,\,\,\,\text{s}\text{.t}\text{.}\,\,\,\,\,\,\parallel \mathbf{f}{{\parallel}_{\text{TV}}}\leqslant {{t}_{1}}\,\,\,\,\,\,\text{and}\,\,\,\,\,\,{{f}_{j}}\geqslant 0.\end{eqnarray} \tag{ 10 }$

2.2.3. Program D_KL- $\parallel \mathbf{f}{{\parallel}_{\text{TV}}}$ : image-TV-constrained data-KL minimization.

$\begin{eqnarray}&&{{\mathbf{f}}^{\ast}}=\underset{\mathbf{f}}{{\text{argmin}}}\,{{D}_{\text{KL}}}\left(\mathbf{f}\,\right)\,\,\,\,\,\,\text{s}\text{.t}\text{.}\,\,\,\,\,\,\parallel \mathbf{f}{{\parallel}_{\text{TV}}}\leqslant {{t}_{1}}\,\,\,\,\,\,\text{and}\,\,\,\,\,\,{{f}_{j}}\geqslant 0.\end{eqnarray} \tag{ 11 }$

2.2.4. Program parameters.

Even if the mathematical form of an optimization program is given, parameters are needed also for completely specifying the program. As mentioned above, system matrix $\mathcal{H}$ needs to be determined, and, in this work, each of its elements is selected as the intersection length of an x-ray with an image voxel. The selection of weighting matrix $\mathcal{W}$ , which modulates data inconsistency, is discussed below in section 2.3. Additionally, the programs in equations (9)–(11) also depend upon voxel size of the image array and image-TV parameter t₁. When image- ${{\ell}_{1}}$ or image- ${{\ell}_{2}}$ , instead of image-TV, is used as an image constraint, the optimization program depends upon constraint parameter l₁ or l₂ (see row 3 and 4 in table 1). Different selections of parameters specify generally different solutions (i.e. different reconstructions). We discuss in section 3.2 below the determination of the program parameter in real-data studies.

2.3. Weighting matrix

In short-scan CBCT, let u_m and v_m denote detector sizes in directions tangential to the circular trajectory and perpendicular to the middle plane; and u and v are continuous variables, depicting the Cartesian coordinates of a point on the detector plane, satisfying $|u|\leqslant {{u}_{m}}$ and $|v|\leqslant {{v}_{m}}$ . At continuous projection angle ϕ, the weighting function normalizing data-function redundancy within the middle plane satisfies

$\begin{eqnarray}&&w\left(u,v,\phi \right)+w\left(-u,v,\phi +\pi +2\gamma \right)=1,\end{eqnarray} \tag{ 12 }$

where $\gamma =\text{atan}\left(\frac{u}{L}\right)$ , and L the distance from the source to the detector surface. Even though the weighting function is designed for the middle plane, it is used also for normalizing approximate data-function redundancy within the off-middle planes (i.e. $v\ne 0$ ) in circular CBCT. An additional condition is considered often on the weighting-function smoothness (Parker 1982) for FDK reconstruction:

$\begin{eqnarray}&&\left\{\begin{array}{*{35}{l}} \frac{\partial w\left(u,v,\phi \right)}{\partial \phi}{{|}_{\phi =2{{\gamma}_{m}}-2\gamma}}=\frac{\partial w\left(u,v,\phi \right)}{\partial \phi}{{|}_{\phi =\pi -2\gamma}}=0 \\ {} \\ \frac{\partial w\left(u,v,\phi \right)}{\partial u}{{|}_{\phi =2{{\gamma}_{m}}-2\gamma}}=\frac{\partial w\left(u,v,\phi \right)}{\partial u}{{|}_{\phi =\pi -2\gamma}}=0, \end{array}\right. \end{eqnarray} \tag{ 13 }$

where ${{\gamma}_{m}}=\text{atan}\left(\frac{|{{u}_{m}}|}{L}\right)$ . It has been shown that the smoothness condition is critical for minimizing sampling artifacts in FDK reconstruction from discrete data. The Parker weighting function (Parker 1982)

$\begin{eqnarray}\left\{\begin{array}{*{35}{l}} w\left(u,v,\phi \right)=\text{si}{{\text{n}}^{2}}\left(\frac{\pi}{4}\frac{\phi}{{{\gamma}_{m}}-\gamma}\right) & 0\leqslant \phi <2{{\gamma}_{m}}-2\gamma \\ w\left(u,v,\phi \right)=1 & 2{{\gamma}_{m}}-2\gamma \leqslant \phi <\pi -2\gamma \\ w\left(u,v,\phi \right)=\text{si}{{\text{n}}^{2}}\left(\frac{\pi}{4}\frac{\pi +2{{\gamma}_{m}}-\phi}{{{\gamma}_{m}}+\gamma}\right) & \pi -2\gamma \leqslant \phi <\pi +2{{\gamma}_{m}} \\ w\left(u,v,\phi \right)=0 & \pi +2{{\gamma}_{m}}\leqslant \phi <2\pi. \end{array}\right. \end{eqnarray} \tag{ 14 }$

satisfies conditions in equations (12) and (13), and is used widely for image reconstruction in short-scan CBCT.

In analytic reconstruction, a weighting function is used for normalizing the data-function redundancy within the middle plane of short-scan CBCT. In real-data imaging, only discrete data ${{\mathbf{g}}_{m}}$ of size M can be measured, and, a diagonal weighting matrix is constructed thus by using M values, in a concatenated form, calculated from a weighting function satisfying equations (12) and (13) at M discrete points in data space $\left(u,v,\phi \right)$ at which ${{\mathbf{g}}_{m}}$ are sampled. The FDK algorithm then reconstructs an image from the measured, discrete data weighted by the diagonal weighting matrix. On the other hand, optimization-based reconstruction can accommodate weighting matrices that are designed based upon physical, statistical, and/or mathematical motivations other than data-redundancy normalization. In an attempt to compare with FDK reconstructions that use the diagonal weighting matrix obtained from the Parker weighting function, the same weighting matrix is used in optimization programs considered in the work, while additional weighting matrices that are designed without consideration of normalizing the data-function redundancy are also studied in section 4.3 below.

2.4. Algorithm derivation and validation

2.4.1. Algorithm derivation.

The optimization programs considered in table 1 are convex, and they can thus be solved by use of the primal-dual algorithms. In the work, we use the primal-dual algorithm developed by Chambolle and Pock (CP) (Chambolle and Pock 2011, Sidky et al 2012) because it appears to be capable of handling effectively system matrix $\mathcal{H}$ of large size of practical CT interest (Sidky et al 2014), and because it solves not only smooth, but also non-smooth, programs listed in table 1. The use of a single algorithm for solving all of the optimization programs may also avoid any reconstruction differences as a result of using different algorithms for solving different optimization programs. For the programs in table 1, we have derived explicitly the proximal-mapping steps in the CP algorithm, i.e. the update procedures in lines 9 and 10 shown in the algorithm pseudo-code in the appendix, thus allowing accurate and efficient algorithm implementation.

2.4.2. Algorithm validation.

In an attempt to verify first the implementation of the CP algorithm, and its appropriate applications to solving the optimization programs, we have performed, for each of the optimization programs considered, an inverse-crime study in which model data were generated by use of a system matrix from a known truth image, and the same system matrix is used also in reconstruction. Therefore, in each inverse-crime study, program parameters such as image-voxel size and image-constraint parameters (i.e. t₁, l₁, or l₂) are known (Bian et al 2014, Han et al 2015) with which the CP algorithm was validated quantitatively to converge to the truth image numerically through solving the optimization program. The results of the necessary validation study are not shown, because the focus of the work is on investigating possible reduction of artifacts observed in FDK reconstruction by use of optimization-based reconstruction.

3. Study design and materials

3.1. Data collection and FDK reference

3.1.1. Short-scan CBCT data.

We collected data from two physical phantoms, the RANDO man and the Catphan 504 (The Phantom Laboratory, Salem NY), using a Varian on-board-imager (OBI) on a Trilogy LINAC (Varian Medical System, Palo Alto CA). The two phantoms are selected because they contain both anthropomorphic structures in RANDO and geometric, quantitative structures in the Catphan that can be used for evaluating the impact of reconstruction algorithms. The OBI CBCT system consists of a keV x-ray source and a flat-panel detector of sizes 2u_m = 39.73 cm and 2v_m = 29.80 cm, composing of $1024\times 768$ identical square-shaped-detector bins of size 0.0388 cm. The distances from the source to the rotation axis, and to the flat-detector surface, are S = 100 cm and L = 150 cm, respectively. Data were acquired at 353 and 347 views, respectively, for the Rando and Catphan phantoms, uniformly distributed over a short-scan angular range of 196 degrees. Therefore, the sizes of data vectors considered are $M=1024\times 768\times 353$ and $1024\times 768\times 347$ for the cases involving, respectively, the Rando and Catphan phantoms.

3.1.2. FDK reference.

For comparison, we also carried out reconstructions by using the FDK algorithm, with a Hanning filter and a cutoff at 0.5 of the Nyquist frequency, from weighted real-data $\mathcal{W}{{\mathbf{g}}_{m}}$ . For revealing details of artifact reduction, soft-tissue display windows are used for reconstructions; and, as shown in figure 1, two regions of interest (ROIs) enclosed by rectangles within each of the FDK reconstructions of the Rando and Catphan phantoms are selected for showing ROI images in zoomed-in views.

**Figure 1.** ROIs indicated by rectangles within transverse slices at 3.37 cm (column 1) and 7.32 cm (column 2) from the middle plane of the Rando phantom, and at 4.15 cm (column 3) and 6.0 cm (column 4) from the middle plane of the Catphan phantom. ROI images are displayed in zoomed-in views below.
Download figure:
Standard image High-resolution image

3.2. Program-parameter selection

3.2.1. Weighting matrix and image-voxel size.

Results below were obtained with a diagonal weighting matrix that is formed from discrete values of the Parker weighting function, except for the two cases in section 4.3, in which different weighting matrices were used. In a real-data study, image-voxel size and image-constraint parameter need to be estimated, because only measured data ${{\mathbf{g}}_{m}}$ are available experimentally (Bian et al 2014, Han et al 2015). We use in the Rando and Catphan studies image arrays of sizes $N=512\times 512\times 451$ and $512\times 512\times 239$ , respectively, consisting of cubic voxels of size identical to that used in clinical IGRT CBCT imaging, which is 1.258 times of the detector-bin size of OBI CBCT scanner used for collecting data.

3.2.2. Image-constraint parameter.

We performed multiple reconstructions from each data set collected, with different t₁ values. Based upon a visual inspection of these reconstructions, we then select t₁ that yields a reconstruction with reduced artifacts comparing to the FDK image, yet without worsened, or with improved, contrast and spatial resolution. Such a way of selecting t₁ is admittedly subjective, but it seems sufficient for the purpose of the work which is qualitatively demonstrating the potential of image-TV-constrained reconstruction for reduction of artifacts observed in FDK reconstruction from short-scan CBCT data. Similarly, we determined image-constraint parameters l₁ or l₂ in optimization programs using image- ${{\ell}_{1}}$ or image- ${{\ell}_{2}}$ constraint, by performing reconstructions with varying values of l₁ or of l₂ instead, and then selecting l₁ or l₂ that yields reconstruction with visually least artifacts. The specific selection of an image-constraint parameter is generally data- and object-dependent.

3.2.3. Practical convergence conditions.

The CP algorithm has been shown to converge mathematically to solutions specified by a convex optimization program, including the optimization programs considered, but with an infinite number of iterations, which cannot be achieved in practice. Instead, using the mathematical convergence conditions as a guide, we developed practical convergence conditions such that they can be achieved within a finite number of iterations (Bian et al 2014, Han et al 2015). Of course, practical convergence conditions should be interpreted as additional program parameters because they play now, along with other program parameters, a role in specifying feasible solutions. We consider in real-data studies three practical convergent conditions to be satisfied collectively as functions of iteration numbers: (1) the image- ${{\ell}_{1}}$ , image- ${{\ell}_{2}}$ , or image-TV constraint of the reconstructed image arrives at its corresponding, designed image constraint value; (2) the data fidelity data- ${{\ell}_{2}}$ , data- ${{\ell}_{1}}$ , or data-KL sufficiently plateau; and (3) the conditional primal-dual gap (cPD) (Sidky et al 2012) continuously decreases when conditions (1) and (2) are reached. (The mathematical cPD condition on the CP-algorithm convergence is that the cPD approaches zero. Depending upon the property of the system matrix, data condition, optimization program, and computer precision, it is often unrealistic to achieve numerically the cPD condition in a practical reconstruction. As the results below show, however, the three practical convergence conditions defined above are able to yield reconstructions of practical interest.)

4. Study results

From the perspective of a C–C data model for circular CBCT, short-scan data sufficient for accurate reconstruction within the middle plane are insufficient for non-middle planes. In a real-data study, data inconsistency, including noise and scatter, can lead to artifacts within the middle plane. However, for non-middle planes, artifacts result from not only data inconsistency but also data insufficiency. The results below highlight three major studies: (1) We first study optimization programs in table 1 for reconstructions within the middle plane, which contains no data-insufficiency artifacts. (2) Based upon the study results, we select the optimization programs, which are shown to perform well based upon visual inspection, for further investigation of their potential in reduction of artifacts observed within non-middle planes of FDK reconstructions caused by data insufficiency. (3) Additional studies are performed on how the artifact reduction is affected by the selection of additional optimization programs, and weighting matrices.

4.1. Impact of optimization programs on reconstructions

4.1.1. Reconstructions based upon optimization programs in row 2 of table 1.

We reconstructed images from Rando data by solving the optimization programs in row 2 of table 1, and show the convergent reconstructions in row 1 of figure 2. As expected, because the optimization programs contain only data fidelities (except for an image positivity constraint,) the convergent reconstructions are all highly noisy, with image textures considerably different than that in the FDK reconstruction. Visual differences exist among the reconstructions because different designs of optimization programs necessarily respond differently to data inconsistencies such as data noise. It can be observed that reconstruction of program ${{D}_{{{\ell}_{1}}}}$ with data- ${{\ell}_{1}}$ fidelity appears with sprinkled-salt texture, and is noisier than those obtained with programs ${{D}_{{{\ell}_{2}}}}$ and D_KL containing data- ${{\ell}_{2}}$ and data-KL fidelities. Results similar to Rando reconstructions have been obtained also for Catphan reconstructions, which are not shown. The study suggests the need of adequate image constraints for image reconstruction from real data.

**Figure 2.** Reconstructions within the middle plane of the Rando phantom obtained using the FDK algorithm, and by using the CP algorithm solving, respectively, optimization programs shown in table 1. Display window: [0.22, 0.30] cm⁻¹.
Download figure:
Standard image High-resolution image

4.1.2. Reconstructions based upon optimization programs in rows 3–5 of table 1.

We then performed reconstructions from Rando data by solving the optimization programs in rows 3–5 of table 1, each of which containing a data fidelity and an image constraint (in addition to the image positivity constraint.) As discussed in section 3.2, for each program studied, we conducted a number of reconstructions with different values of its image-constraint parameter; and, based upon visual inspection of the reconstructions, we selected a final value and used it to yield convergent reconstructions, as shown in rows 2–4 of figure 2, according to the practical convergent conditions discussed in section 3.2. Result in row 2 reveals that reconstructions with image- ${{\ell}_{1}}$ constraint remain of noisy, sprinkled-salt texture. This is not surprising because image- ${{\ell}_{1}}$ constraint encourages sparse solutions, while the Rando-phantom image is substantially non-sparse. Conversely, as image- ${{\ell}_{2}}$ constraint promotes non-sparse solutions, optimization programs with image- ${{\ell}_{2}}$ constraint thus lead to reconstructions of the Rando phantom with visually reasonable appearance, as shown in row 3 of figure 2. On the other hand, reconstructions with an image-TV constraint in row 4 of figure 2 appear comparable to those obtained with image- ${{\ell}_{2}}$ , but with bony structures of enhanced spatial resolution. For cases considered, image constraints appear to impact reconstruction textures more significantly than do data fidelities, as reconstructions vary visually more significantly within each column than within each row. Observations similar to those for reconstructions of the Rando phantom can also be made for reconstructions of the Catphan phantom.

4.2. Optimization-based reconstructions and artifact reduction

We investigate below optimization-based reconstructions for possible reduction of artifacts within non-middle planes in FDK reconstructions.

4.2.1. Reconstructions.

As figure 2 shows, optimization programs with image-TV constraint appear to yield reconstructions with visually reasonable spatial and contrast resolution as well as image texture. Therefore, for brevity, without loss of generality, we present below results of reconstructions specified by optimization programs with image-TV constraint in row 5 of table 1 (i.e. equations (9)–(11)). In the study, using the strategy described in section 3.2, we determined image parameter t₁ for the Rando data, and display in figure 3 convergent reconstructions, accompanied by the corresponding FDK reconstructions. For revealing reconstruction details, zoomed-view images within the ROIs defined in figure 1 are depicted immediately below each corresponding reconstruction. Also, in figure 4, we display convergent reconstructions of the Catphan phantom obtained by solving the programs with image-TV constraint in row 5 of table 1. Similar results have also been obtained for reconstructions within other transverse, coronal, and sagittal slices of the two phantoms, which are not shown for brevity.

**Figure 3.** Reconstructions, together with zoomed-in images within ROIs defined in figure 1, of the Rando phantom within transverse slices at 3.37 cm (row 1) and 7.32 cm (row 2) from the middle plane, reconstructed by use of the FDK algorithm and the CP algorithm solving, respectively, optimization programs in row 5 of table 1. Arrows in the reconstruction in panel 4 of row 2 highlight the low-contrast structures in the Rando phantom. Display window: [0.22, 0.30] cm⁻¹.
Download figure:
Standard image High-resolution image

**Figure 4.** Reconstructions, together with zoomed-in images within ROIs defined in figure 1, of the Catphan phantom within transverse slices at 4.15 cm (row 1) and 6.00 cm (row 2) from the middle plane, reconstructed by use of the FDK algorithm and the CP algorithm solving, respectively, optimization programs in row 5 of table 1. Display window: [0.22, 0.32] cm⁻¹ for row 1 and [0.22, 0.35] cm⁻¹ for row 2.
Download figure:
Standard image High-resolution image

4.2.2. Artifact reduction.

Results in figures 3 and 4 indicate that reconstructions specified by the programs with image-TV constraint in row 5 of table 1 appear to suppress effectively artifacts observed in FDK reconstructions. In particular, as a result of artifact reduction, low-contrast structures, highlighted by arrows, within the Rando phantom can now be observed in figure 3. Some residual artifacts remain visible in the reconstructions due to data insufficiency and inconsistency, such as beam-hardening and scatter, which are not included in the imaging model. It is also interesting to notice that image texture obtained with the programs appear differently, because they respond differently to data inconsistency and insufficiency. Reconstructions with image-TV constraint seem to be robust relative to different data fidelities in terms of compensating for the artifacts. Again, remarks similar to those for the Rando reconstructions above can be made for the Catphan reconstructions. Furthermore, inspection of figures 3 and 4 indicates that the performance of the optimization programs considered appears to be comparably effective for objects such as the Rando and Catphan phantoms with distinct anatomies.

4.2.3. Reconstruction evolution as a function of iterations.

The discussion above focuses on convergent reconstructions satisfying the practical convergent conditions, obtained at about 1090 iterations for the Rando phantom, and about 950 iterations for the Catphan phantom, respectively. For demonstrating the reconstruction convergence, we show in figure 5 convergence metrics, including data- ${{\ell}_{2}}$ , image-TV, and cPD, as functions of iterations for the Rando reconstruction in solving program D_KL- $\parallel \mathbf{f}{{\parallel}_{\text{TV}}}$ in equation (10). It can be observed that the convergence metrics plateau considerably before 1090 iterations. It is possible, however, that artifacts are compensated effectively for in reconstructions at earlier iterations. In figure 6, we display reconstructions of the Rando phantom at iterations 30, 50, and 150, and of the Catphan phantom at iterations 10, 30, and 100, as well as the corresponding convergent reconstructions, i.e. the final reconstructions, shown in column 4. It can be observed that Rando and Catphan reconstructions at, e.g. about iterations 150 already resemble visually their respective convergent reconstructions in terms of effective reduction of artifacts in FDK reconstructions.

**Figure 5.** Data- ${{\ell}_{2}}$ (left), image-TV (middle), and conditional primal-dual gap (cPD) (right), as functions of iteration number n for Rando-phantom reconstruction by using the CP algorithm solving program ${{D}_{{{\ell}_{2}}}}$ - $|\mathbf{f}{{|}_{\text{TV}}}$ in equation (10). Dashed line in image-TV plot (middle) is specified by image-constraint parameter t₁ determined. The plot scales are in arbitrary units.
Download figure:
Standard image High-resolution image

**Figure 5.** Data- ${{\ell}_{2}}$ (left), image-TV (middle), and conditional primal-dual gap (cPD) (right), as functions of iteration number n for Rando-phantom reconstruction by using the CP algorithm solving program ${{D}_{{{\ell}_{2}}}}$ - $|\mathbf{f}{{|}_{\text{TV}}}$ in equation (10). Dashed line in image-TV plot (middle) is specified by image-constraint parameter t₁ determined. The plot scales are in arbitrary units.
Download figure:
Standard image High-resolution image

**Figure 6.** Reconstructions of the Rando phantom (row 1) within transverse slice at 7.32 cm (row 1), and of Catphan (row 2) phantom within transverse slice at 6.0 cm, from their respective middle planes at iteration numbers indicated, along with their respective, convergent reconstructions (column 4), obtained with the CP algorithm solving program ${{D}_{{{\ell}_{2}}}}$ - $|\mathbf{f}{{|}_{\text{TV}}}$ in equation (10). Display windows [0.22, 0.30] cm⁻¹ and [0.22, 0.35] cm⁻¹ are for Rando and Catphan images, respectively.
Download figure:
Standard image High-resolution image

**Figure 6.** Reconstructions of the Rando phantom (row 1) within transverse slice at 7.32 cm (row 1), and of Catphan (row 2) phantom within transverse slice at 6.0 cm, from their respective middle planes at iteration numbers indicated, along with their respective, convergent reconstructions (column 4), obtained with the CP algorithm solving program ${{D}_{{{\ell}_{2}}}}$ - $|\mathbf{f}{{|}_{\text{TV}}}$ in equation (10). Display windows [0.22, 0.30] cm⁻¹ and [0.22, 0.35] cm⁻¹ are for Rando and Catphan images, respectively.
Download figure:
Standard image High-resolution image

4.3. Additional study results

4.3.1. Reconstructions with image- ${{\ell}_{1}}$ and image- ${{\ell}_{2}}$ constraints.

We have investigated artifact reduction with the optimization programs in table 1, and show results above obtained from the optimization programs with image-TV constraint. In the discussion below, we present reconstructions obtained from optimization programs with image- ${{\ell}_{1}}$ and image- ${{\ell}_{2}}$ constraints instead, i.e.

$\begin{eqnarray}&&{{\mathbf{f}}^{\ast}}=\underset{\mathbf{f}}{{\text{argmin}}}\,{{D}_{{{\ell}_{2}}}}\left(\mathbf{f}\right)\,\,\,\,\,\,\text{s}\text{.t}\text{.}\,\,\,\,\,\,\parallel \mathbf{f}{{\parallel}_{{{\ell}_{1}}}}\leqslant {{l}_{1}}\,\,\,\,\,\,\text{and}\,\,\,\,\,\,{{f}_{j}}\geqslant 0,\end{eqnarray} \tag{ 15 }$

and

$\begin{eqnarray}&&{{\mathbf{f}}^{\ast}}=\underset{\mathbf{f}}{{\text{argmin}}}\,{{D}_{{{\ell}_{2}}}}\left(\mathbf{f}\right)\,\,\,\,\,\,\text{s}\text{.t}\text{.}\,\,\,\,\,\,\parallel \mathbf{f}{{\parallel}_{{{\ell}_{2}}}}\leqslant {{l}_{2}}\,\,\,\,\,\,\text{and}\,\,\,\,\,\,{{f}_{j}}\geqslant 0,\end{eqnarray} \tag{ 16 }$

which are denoted as ${{D}_{{{\ell}_{2}}}}$ - $\parallel \mathbf{f}{{\parallel}_{{{\ell}_{1}}}}$ and ${{D}_{{{\ell}_{2}}}}$ - $\parallel \mathbf{f}{{\parallel}_{{{\ell}_{2}}}}$ , respectively, in table 1. For each study, image-constraint parameters ${{\ell}_{1}}$ and ${{\ell}_{2}}$ were determined as described in section 3.2. Using the CP algorithm to solve the programs, we reconstruct images from the Rando and Catphan data, and display them in figure 7. Reconstructions in figure 7 demonstrate that optimization programs with image- ${{\ell}_{1}}$ and image- ${{\ell}_{2}}$ constraints appear to be considerably less effective than the optimization programs with image-TV constraint in reducing the FDK-reconstruction artifacts. Moreover, as expected, reconstructions with an image- ${{\ell}_{1}}$ constraint appear with noisy, sprinkled-salt textures because both Rando and Catphan are highly non-sparse. It is also interesting to notice that reconstructions based upon program ${{D}_{{{\ell}_{2}}}}$ - $\parallel \mathbf{f}{{\parallel}_{{{\ell}_{2}}}}$ in equation (15) have an appearance resembling that of the corresponding FDK reconstructions. This is because program ${{D}_{{{\ell}_{2}}}}$ - $\parallel \mathbf{f}{{\parallel}_{{{\ell}_{2}}}}$ without image-positivity constrain is equivalent to the Tikhonov-Phillips method, which has been show to be an analogy of the filtered-back projection (i.e. the FDK) algorithm (Natterer and Hadeler 1980, Natterer and Wübbeling 2001).

**Figure 7.** Reconstructions of the Rando phantom (row 1) within transverse slice at 7.32 cm, and of the Catphan phantom (row 2) within transverse slice at 6.0 cm, from their respective middle planes, reconstructed by use of the FDK algorithm, and by using the CP algorithm solving optimization programs in column 3 of table 1, respectively. Display windows [0.22, 0.30] cm⁻¹ and [0.22, 0.35] cm⁻¹ are for the Rando and Catphan images, respectively.
Download figure:
Standard image High-resolution image

4.3.2. Effect of different weighting matrices on image reconstruction.

In analytic, such as FDK, reconstruction, a weighting function used for designing a weighting matrix is required to satisfy the normalization and continuity conditions in equations (12) and (13) for avoiding yielding significant numerical artifacts. We have investigated weighting functions with different forms in optimization reconstructions. Using a weighting matrix formed from a weighting function

$\begin{eqnarray}\left\{\begin{array}{*{35}{l}} w\left(u,v,\phi \right)=\frac{1}{2} & 0\leqslant \phi <2{{\gamma}_{m}}-2\gamma \\ w\left(u,v,\phi \right)=1 & 2{{\gamma}_{m}}-2\gamma \leqslant \phi <\pi -2\gamma \\ w\left(u,v,\phi \right)=\frac{1}{2} & \pi -2\gamma \leqslant \phi <\pi +2{{\gamma}_{m}} \\ w\left(u,v,\phi \right)=0 & \pi +2{{\gamma}_{m}}\leqslant \phi <2\pi, \end{array}\right. \end{eqnarray} \tag{ 17 }$

which satisfies the normalization condition in equation (12), but not the continuity condition in equation (13), we carried out reconstruction based upon optimization programs considered, and display example reconstructions in figure 8 obtained by solving program ${{D}_{{{\ell}_{2}}}}$ - $|\mathbf{f}{{|}_{\text{TV}}}$ in equation (10) from the Rando and Catphan data. As shown in row 1 of figure 8, optimization-based reconstructions are virtually free of prominent numerical artifacts in FDK reconstructions, resulted from the weighting-function discontinuity in ϕ. Furthermore, we studied reconstructions using an identity matrix, corresponding to a uniform weighting function without satisfying equation (12), and display them in row 2 of figure 8. Again, optimization-based reconstructions appear free of numerical artifacts in the FDK reconstruction, as a consequence that the uniform weighting function does not satisfy the normalization condition. The results of the investigation indicate weighting matrices designed by use of weighting functions satisfying, or without satisfying, equations (12) and/or (13) appear to have little impact on the performance of optimization-based reconstructions.

**Figure 8.** Images of the Rando phantom within transverse slice at 7.32 cm, and of the Catphan phantom within transverse slice at 6.0 cm, from their respective middle planes, reconstructed by using the FDK algorithm (column 1 $\And$ 3), and by using the CP algorithm (column 2 $\And$ 4) solving optimization program ${{D}_{{{\ell}_{2}}}}$ - $\parallel \mathbf{f}{{\parallel}_{\text{TV}}}$ in equation (10) with weighting matrices calculated from a non-smooth weighting function in equation (17) (row 1) and a smooth, but uniform, weighting function (row 2), respectively. Display windows [0.22, 0.35] cm⁻¹ and [0.22, 0.30] cm⁻¹ are for the Rando and Catphan images, respectively.
Download figure:
Standard image High-resolution image

**Figure 8.** Images of the Rando phantom within transverse slice at 7.32 cm, and of the Catphan phantom within transverse slice at 6.0 cm, from their respective middle planes, reconstructed by using the FDK algorithm (column 1 $\And$ 3), and by using the CP algorithm (column 2 $\And$ 4) solving optimization program ${{D}_{{{\ell}_{2}}}}$ - $\parallel \mathbf{f}{{\parallel}_{\text{TV}}}$ in equation (10) with weighting matrices calculated from a non-smooth weighting function in equation (17) (row 1) and a smooth, but uniform, weighting function (row 2), respectively. Display windows [0.22, 0.35] cm⁻¹ and [0.22, 0.30] cm⁻¹ are for the Rando and Catphan images, respectively.
Download figure:
Standard image High-resolution image

5. Discussion

In this work, we have investigated optimization-based reconstructions for possible reduction of image artifacts often observed in FDK reconstruction from short-scan CBCT data. In an attempt to appreciate how different designs of optimization-based reconstruction may impact the artifact reduction, numerous convex optimization programs of potential research and application interest were considered. The investigation is enabled by a primal-dual algorithm that is shown mathematically to solve convex optimization programs, especially for those under consideration. The focus of the study is not on quantitative assessment of an optimization-based reconstruction in a specific, clinically or practically, relevant task; it is tailored, instead, to demonstrating whether optimization-based reconstructions can be designed for reducing artifacts observed in FDK reconstruction. The study confirms that optimization-based reconstruction with appropriate image constraints such as image-TV constraint can reduce FDK-reconstruction artifacts in CBCT with a short-scan configuration. As the results demonstrated, a direct benefit of such an artifact reduction can be an improved contrast level of low-contrast anatomic structures that are obscured otherwise by the artifacts.

The performance of optimization reconstructions in a specific imaging task is likely to be dependent upon the forms and parameters specifying the optimization-programs, as the study results demonstrated. In terms of the reduction of artifacts observed in FDK reconstruction from short-scan CBCT data, visual inspection of study results reveals that reconstructions based upon image-TV-constrained data-discrepancy minimization appear to be capable of reducing the artifacts. Furthermore, although the selection of different weighting matrices has a significant impact on FDK reconstruction, it appears to have insignificant effect on optimization-based reconstructions. Using two phantoms of IGRT CBCT relevance, we show that the image-TV-constrained reconstruction remains effective in reduction of off-middle-plane artifacts in FDK reconstructions from short-scan CBCT data for objects with substantially distinct anatomies.

We considered in the study reconstruction only from short-scan CBCT data. However, the work has implications for reconstruction from data acquired with other scanning configurations in CBCT. It is not uncommon that a scan with an angular range between those of short-scan and full-scan configurations is used in practical applications. The investigation and analysis approach in the work can readily be tailored to investigating appropriate optimization-based reconstruction for possible reduction of artifacts that may appear in the corresponding FDK reconstruction. The study method and results can also be adopted to investigate optimization-based reconstruction from data collected with an offset detector in CBCT.

The evaluation of the study results was based largely on visual inspection, which serves adequately the focus of the study, which is on whether optimization-based reconstruction can reduce some of the prominent artifacts observed in FDK reconstructions from short-scan CBCT data. However, as the results reveal, different optimization-based reconstructions may behave distinctly differently, depending upon numerous factors, including their design parameters, data conditions, and object properties. More importantly, their performance is likely to be task-specific and thus should be measured with metrics of task-performance utility. Quantitative study of task-specific performance of optimization-based reconstructions can be a topic of theoretical and practical interest for future research.

Acknowledgments

The authors would like to thank S Rose and B Chen for useful discussion. This work was supported in part by the National Institutes of Health (NIH) under grants R01s CA158446, CA182264, and EB018102.

Appendix

The pseudo-codes for the CP algorithm used are given below:

Algorithm. Pseudo-codes of the CP algorithm for solving programs in table 1

INPUT: ${{\mathbf{g}}_{m}}$ ${{\mathbf{g}}_{m}}$ ; $\mathcal{H}$ $\mathcal{H}$ , $\mathcal{W}$ $\mathcal{W}$ , and $\mathcal{X}$ $\mathcal{X}$ ; ζ and λ

1: $\nu \leftarrow {{\nu}_{\mathcal{H}}}$ $\nu \leftarrow {{\nu}_{\mathcal{H}}}$ , $L\leftarrow \parallel {{\mathcal{H}}_{\mathcal{X}}}{{\parallel}_{\text{SV}}}$ $L\leftarrow \parallel {{\mathcal{H}}_{\mathcal{X}}}{{\parallel}_{\text{SV}}}$ , $\tau \leftarrow 1/L,\,\sigma \leftarrow 1/L$ $\tau \leftarrow 1/L,\,\sigma \leftarrow 1/L$

2: $\theta \leftarrow 1,\,n\leftarrow 0$ $\theta \leftarrow 1,\,n\leftarrow 0$

3: INITIALIZE: ${{\mathbf{f}}_{0}}$ ${{\mathbf{f}}_{0}}$ , ${{\mathbf{p}}_{0}}$ ${{\mathbf{p}}_{0}}$ , and ${{\mathbf{q}}_{0}}$ ${{\mathbf{q}}_{0}}$ to zero

4: ${{\mathbf{\bar{f}}}_{0}}\leftarrow {{\mathbf{f}}_{0}}$ ${{\mathbf{\bar{f}}}_{0}}\leftarrow {{\mathbf{f}}_{0}}$

5: repeat

6: ${{\mathbf{p}}_{n+1}}\leftarrow \mathbf{\Theta}\left(\lambda,{{\mathbf{p}}_{n}},\mathcal{W},\mathcal{H}{{\mathbf{\bar{f}}}_{n}},{{\mathbf{g}}_{m}}\right)$ ${{\mathbf{p}}_{n+1}}\leftarrow \mathbf{\Theta}\left(\lambda,{{\mathbf{p}}_{n}},\mathcal{W},\mathcal{H}{{\mathbf{\bar{f}}}_{n}},{{\mathbf{g}}_{m}}\right)$

7: ${{\mathbf{q}}_{n+1}}\leftarrow \mathbf{\Omega }\left({{\mathbf{q}}_{n}},\nu,\sigma,\mathcal{X}{{\mathbf{\bar{f}}}_{n}},\zeta \right)$ ${{\mathbf{q}}_{n+1}}\leftarrow \mathbf{\Omega }\left({{\mathbf{q}}_{n}},\nu,\sigma,\mathcal{X}{{\mathbf{\bar{f}}}_{n}},\zeta \right)$

8: ${{\mathbf{f}}_{n+1}}\leftarrow \mathsf p\mathsf o\mathsf s\left[{{\mathbf{f}}_{n}}-\tau \left({{\mathcal{H}}^{T}}{{\mathcal{W}}^{T}}{{\mathbf{p}}_{n+1}}+\nu {{\mathcal{X}}^{\top}}{{\mathbf{q}}_{n+1}}\right)\right]$ ${{\mathbf{f}}_{n+1}}\leftarrow \mathsf p\mathsf o\mathsf s\left[{{\mathbf{f}}_{n}}-\tau \left({{\mathcal{H}}^{T}}{{\mathcal{W}}^{T}}{{\mathbf{p}}_{n+1}}+\nu {{\mathcal{X}}^{\top}}{{\mathbf{q}}_{n+1}}\right)\right]$

9: ${{\mathbf{\bar{f}}}_{n+1}}\leftarrow {{\mathbf{f}}_{n+1}}+\theta \left({{\mathbf{f}}_{n+1}}-{{\mathbf{f}}_{n}}\right)$ ${{\mathbf{\bar{f}}}_{n+1}}\leftarrow {{\mathbf{f}}_{n+1}}+\theta \left({{\mathbf{f}}_{n+1}}-{{\mathbf{f}}_{n}}\right)$

10: $n\leftarrow n+1$ $n\leftarrow n+1$

11: until the practical convergence conditions are satisfied

12: OUTPUT: image ${{\mathbf{f}}_{\text{conv}}}\leftarrow {{\mathbf{f}}_{n}}$ ${{\mathbf{f}}_{\text{conv}}}\leftarrow {{\mathbf{f}}_{n}}$

In the algorithm, vector ${{\mathbf{g}}_{m}}$ of size $M={{M}_{u}}\times {{M}_{v}}\times {{M}_{\phi}}$ denotes measured data, where M_u and M_v are number of detector bins along the u- and v-axis on the detector plane, and ${{M}_{\phi}}$ represents the number of views at which data are collected; vector ${{\mathbf{f}}_{n}}$ of size $N={{N}_{x}}\times {{N}_{y}}\times {{N}_{z}}$ image at the nth iteration, where N_x, N_y, and N_z are the numbers of voxels along the x-, y-, and z-axis, respectively; ${{\mathbf{f}}_{\text{conv}}}$ the convergent reconstruction when the practical convergence conditions are satisfied; $\mathcal{W}$ and $\mathcal{H}$ weighting and system matrices; parameter ${{\nu}_{\mathcal{H}}}=\frac{\parallel \mathcal{W}\mathcal{H}{{\parallel}_{\text{SV}}}}{\parallel \mathcal{X}{{\parallel}_{\text{SV}}}}$ , where $\parallel \centerdot {{\parallel}_{\text{SV}}}$ denotes the largest singular value of a matrix (Sidky et al 2012); matrix ${{\mathcal{H}}_{\mathcal{X}}}$ has a transpose $\mathcal{H}_{\mathcal{X}}^{\top}=\left({{\mathcal{H}}^{\top}}{{\mathcal{W}}^{\top}},\nu {{\mathcal{X}}^{\top}}\right)$ , where the superscript ' $\top$ ' indicates a transpose operation; $\mathsf p\mathsf o\mathsf s\left(\mathbf{f}\right)$ enforces f_j = 0, if f_j < 0, where f_j denotes the jth entry of vector $\mathbf{f}$ of size N; vectors ${{\mathbf{p}}_{n}}$ and $\mathbf{\Theta}$ are of size M, whereas vectors ${{\mathbf{q}}_{n}}$ and $\mathbf{\Omega }$ have the same size, which depends upon the programs considered. The specifics of vectors $\mathbf{\Theta}$ , $\mathbf{\Omega }$ , along with matrix $\mathcal{X}$ , are discussed below for optimization programs under study. Parameter ζ denotes $\nu {{l}_{1}}$ , $\nu {{l}_{2}}$ , and $\nu {{t}_{1}}$ , respectively, for programs with image- ${{\ell}_{1}}$ , image- ${{\ell}_{2}}$ , and image-TV constraints. For each study, the values of l₁, l₂, and t₁ were determined in section 3.2. Algorithm parameter λ has no impact mathematically on the feasible solution set specified by an optimization program, and it can, however, affect the convergence rate of the algorithm. We select λ between 0.01 to 1.0 in the study, because previous empirical studies have shown that this selection generally yields a reasonable convergence rate.

For the convenience of discussing the determination of $\mathbf{\Theta}$ and $\mathbf{\Omega }$ below, we consider two vectors $\mathbf{a}$ and $\mathbf{b}$ of sizes k N and N, where $k\geqslant 1$ is a positive integer. Let a_j denote the jth entry of $\mathbf{a}$ , j = 1, 2,..., k N, and b_j the jth entry of $\mathbf{b}$ , j = 1, 2,..., N, respectively. We use vectors $\mathbf{ab}$ and $\mathbf{a}/\mathbf{b}$ of sizes k N to denote the multiplication and division of $\mathbf{a}$ and $\mathbf{b}$ , with entry j given by

$\begin{eqnarray}\begin{array}{*{35}{l}} {{\left(\mathbf{ab}\right)}_{j}} & ={{a}_{j}}{{b}_{{{j}_{m}}}} \\ {{\left(\mathbf{a}/\mathbf{b}\right)}_{j}} & ={{a}_{j}}/{{b}_{{{j}_{m}}}}, \end{array}\end{eqnarray}$

where j = 1, 2,..., k N, and ${{j}_{m}}=\text{mod}~\left(\,j,N\right)$ indicates the remainder of j dividing by N, and ${{b}_{{{j}_{m}}}}\ne 0$ . In particular, when k = 1, $\mathbf{a}$ and $\mathbf{b}$ are of the same size N, and vectors $\mathbf{ab}$ and $\mathbf{a}/\mathbf{b}$ thus are of size N and depict element-wise multiplication and division.

A.1. Updates in line 6

Update vector $\mathbf{\Theta}\left(\lambda,{{\mathbf{p}}_{n}},\mathcal{W},\mathcal{H}{{\mathbf{\bar{f}}}_{n}},{{\mathbf{g}}_{m}}\right)$ of size M in line 6 depends upon the specific form of data fidelity considered. For programs in column 2 of table 1,

$\begin{eqnarray}&&\mathbf{\Theta}\left(\lambda,{{\mathbf{p}}_{n}},\mathcal{W},\mathcal{H}{{\mathbf{\bar{f}}}_{n}},{{\mathbf{g}}_{m}}\right)=\left[{{\mathbf{p}}_{n}}+\sigma \mathcal{W}\left(\mathcal{H}{{\mathbf{\bar{f}}}_{n}}-{{\mathbf{g}}_{m}}\right)\right]/\left(1+\sigma /\lambda \right);\end{eqnarray} \tag{ A.1 }$

for optimization programs in column 3 of table 1,

$\begin{eqnarray}&&\mathbf{\Theta}\left(\lambda,{{\mathbf{p}}_{n}},\mathcal{W},\mathcal{H}{{\mathbf{\bar{f}}}_{n}},{{\mathbf{g}}_{m}}\right)=\frac{{{\mathbf{p}}_{n}}+\sigma \mathcal{W}\left(\mathcal{H}{{{\mathbf{\bar{f}}}}_{n}}-{{\mathbf{g}}_{m}}\right)}{\text{max}\left({{\mathbf{1}}_{\text{D}}},\,|{{\mathbf{p}}_{n}}+\sigma \mathcal{W}\left(\mathcal{H}{{{\mathbf{\bar{f}}}}_{n}}-{{\mathbf{g}}_{m}}\right)|/\lambda \right)},\end{eqnarray} \tag{ A.2 }$

where ${{\mathbf{1}}_{\text{D}}}$ denotes a vector of size M with all entries set to 1, $|\mathbf{p}|$ takes entry-wise absolute values of vector $\mathbf{p}$ of size M, and $\text{max}\left(\centerdot \right)$ is performed component-wise; and for programs in column 4 in table 1,

$\begin{eqnarray}\begin{array}{*{35}{l}} \mathbf{\Theta}\left(\lambda,{{\mathbf{p}}_{n}},\mathcal{W},\mathcal{H}{{{\mathbf{\bar{f}}}}_{n}},{{\mathbf{g}}_{m}}\right) & = & \frac{1}{2}\left[\boldsymbol{\lambda }_{{\text{D}}}+{{\mathbf{p}}_{n}}+\sigma \mathcal{W}\mathcal{H}{{{\mathbf{\bar{f}}}}_{n}}\right] \\ {} & - & \sqrt{{{\left[\boldsymbol{\lambda }_{{\text{D}}}-{{\mathbf{p}}_{n}}-\sigma \mathcal{W}\mathcal{H}{{{\mathbf{\bar{f}}}}_{n}}\right]}^{2}}+4\sigma \lambda \mathcal{W}{{\mathbf{g}}_{m}}}, \end{array}\end{eqnarray} \tag{ A.3 }$

where $\boldsymbol{\lambda }_{{\text{D}}}$ is a vector of size M with entries set to λ, and $\sqrt{\mathbf{p}}$ takes entry-wise square roots of vector $\mathbf{p}$ of size M.

A.2. Updates in line 7

For programs in row 2, $\mathcal{X}$ is an identity matrix of size N, and vector $\mathbf{\Omega }$ of size N is given by

$\begin{eqnarray}&&\mathbf{\Omega }\left({{\mathbf{q}}_{n}},\nu,\sigma,\mathcal{X}{{\mathbf{\bar{f}}}_{n}},\zeta \right)=\mathbf{0},\end{eqnarray} \tag{ A.4 }$

where $\mathbf{0}$ is a vector of size N with entries set to zero.

For program in rows 3 and 4 of table 1, $\mathcal{X}$ is an identity matrix of size N, image-constraint parameter $\zeta =\nu {{l}_{1}}$ for programs in row 3 and $\zeta =\nu {{l}_{2}}$ for programs in row 4, and vector $\mathbf{\Omega }$ of size N is given by

$\begin{eqnarray}&&\mathbf{\Omega }\left({{\mathbf{q}}_{n}},\nu,\sigma,\mathcal{X}{{\mathbf{\bar{f}}}_{n}},\zeta \right)=\boldsymbol{\varrho }\left({{\mathbf{1}}_{I}}-\sigma \mathbf{s}/|\boldsymbol{\varrho }{{|}_{\text{MAG}}}\right),\end{eqnarray} \tag{ A.5 }$

where vector $\boldsymbol{\varrho }$ of size N is defined as

$\begin{eqnarray}&&\boldsymbol{\varrho }={{\mathbf{q}}_{n}}+\sigma \nu {{\mathbf{\bar{f}}}_{n}},\end{eqnarray}$

vector ${{\mathbf{1}}_{I}}$ of size N with entries set to 1, and vector $\mathbf{s}$ of size N is given by

$\begin{eqnarray}&&\mathbf{s}=\mathbf{ProjOnto}{{\ell}_{p}}\mathbf{Bal}{{\mathbf{l}}_{c}}\left(|\boldsymbol{\varrho }{{|}_{\text{MAG}}}/\sigma \right),\end{eqnarray} \tag{ A.6 }$

operator $\mathbf{ProjOnto}{{\ell}_{p}}\mathbf{Bal}{{\mathbf{l}}_{c}}$ projects vector $|\boldsymbol{\varrho }{{|}_{\text{MAG}}}/\sigma$ onto the ${{\ell}_{p}}$ -ball of scale c, $|\boldsymbol{\varrho }{{|}_{\text{MAG}}}$ depicts a vector of size N with entry j given by ${{\left(|\boldsymbol{\varrho }{{|}_{\text{MAG}}}\right)}_{j}}=|{{\varrho}_{j}}|$ , and ${{\varrho}_{j}}$ indicates the jth entry of $\boldsymbol{\varrho }$ . Furthermore, p = 1 and $c=\nu {{l}_{1}}$ for programs in row 3 of table 1, and p = 2 and $c=\nu {{l}_{2}}$ for programs in row 4 of table 1.

For program in row 5 of table 1, matrix $\mathcal{X}=\nabla$ denotes a spatial gradient matrix of size $3N\times N$ , with its transpose ${{\nabla}^{\top}}=\left(\nabla _{x}^{\top},\nabla _{y}^{\top},\nabla _{z}^{\top}\right)$ , where matrices ${{\nabla}_{x}}$ , ${{\nabla}_{y}}$ , and ${{\nabla}_{z}}$ of size $N\times N$ represent the finite difference matrices along x-, y-, and z-axis, respectively, yielding vectors ${{\nabla}_{x}}\mathbf{f}$ , ${{\nabla}_{y}}\mathbf{f}$ , and ${{\nabla}_{z}}\mathbf{f}$ of size N, which are used to form vector $\nabla \mathbf{f}$ of size 3N in a concatenated form in the order of x, y, and z, $|\nabla \mathbf{f}{{|}_{\text{MAG}}}$ depicts a vector of size N with entry j given by ${{\left(|\nabla \mathbf{f}{{|}_{\text{MAG}}}\right)}_{j}}=\sqrt{\left(\nabla \mathbf{f}\right)_{j}^{2}+\left(\nabla \mathbf{f}\right)_{j+N}^{2}+\left(\nabla \mathbf{f}\right)_{j+2N}^{2}}$ , and ${{\left(\nabla \mathbf{f}\right)}_{j}}$ indicates the jth entry of vector $\nabla \mathbf{f}$ . Moreover, image-constraint parameter $\zeta =\nu {{t}_{1}}$ , and vector $\mathbf{\Omega }$ of size 3N is given by

$\begin{eqnarray}&&\mathbf{\Omega }\left({{\mathbf{q}}_{n}},\nu,\sigma,\mathcal{X}\,{{\mathbf{\bar{f}}}_{n}},\zeta \right)=\boldsymbol{\varrho }\left({{\mathbf{1}}_{I}}-\sigma \mathbf{s}/|\boldsymbol{\varrho }{{|}_{\text{MAG}}}\right),\end{eqnarray} \tag{ A.7 }$

where vector $\boldsymbol{\varrho }$ of size 3N is defined as

$\begin{eqnarray}&&\boldsymbol{\varrho }={{\mathbf{q}}_{n}}+\sigma \nu \nabla {{\mathbf{\bar{f}}}_{n}},\end{eqnarray}$

vector ${{\mathbf{1}}_{I}}$ is of size N with entries set to 1, and vector $\mathbf{s}$ of size N is given by

$\begin{eqnarray}&&\mathbf{s}=\mathbf{ProjOnto}{{\ell}_{1}}\mathbf{Bal}{{\mathbf{l}}_{\nu {{t}_{1}}}}\left(|\boldsymbol{\varrho }{{|}_{\text{MAG}}}/\sigma \right),\end{eqnarray} \tag{ A.8 }$

operator $\mathbf{ProjOnto}{{\ell}_{1}}\mathbf{Bal}{{\mathbf{l}}_{\nu {{t}_{1}}}}$ projects vector $|\boldsymbol{\varrho }{{|}_{\text{MAG}}}/\sigma$ onto the ${{\ell}_{1}}$ -ball of scale $\nu {{t}_{1}}$ (Sidky et al 2014), $|\boldsymbol{\varrho }{{|}_{\text{MAG}}}$ depicts a vector of size N with entry j given by ${{\left(|\boldsymbol{\varrho }{{|}_{\text{MAG}}}\right)}_{j}}=\sqrt{\varrho _{j}^{2}+\varrho _{j+N}^{2}+\varrho _{j+2N}^{2}}$ , and ${{\varrho}_{j}}$ indicates the jth entry of vector $\boldsymbol{\varrho }$ .