Sparse-view CBCT reconstruction via weighted Schatten p-norm minimization

A novel iterative algorithm is proposed for sparse-view cone beam computed tomography (CBCT) reconstruction based on weighted Schatten p-norm minimization (WSNM). Using half quadratic splitting, the sparse-view CBCT reconstruction task is decomposed into two sub-problems that are solved by alternating iteration: a simple reconstruction and an image denoising. The WSNM, which fits well with the low-rank hypothesis of CBCT data, is introduced as a regularization term in the denoising sub-problem. Experimental results on a digital brain phantom and clinical CT data demonstrate the advantages of the proposed algorithm in both structural information preservation and artifact suppression; it outperforms the classical algorithms in quantitative and qualitative evaluations.


Introduction
Cone beam computed tomography (CBCT) is an important medical imaging technology widely used in clinical diagnosis and treatment. During a CBCT scan, cone-beam X-rays are emitted at equal angular intervals as the X-ray source moves in a circular orbit around the scanned object. At the other end of the device, a flat panel detector receives the X-rays that pass through the object, capturing a series of projection data (observations) from which a 3D voxel model can be reconstructed for clinical use. CBCT offers fast scanning speed and high spatial resolution. However, since X-rays are harmful to the human body and may induce cancer or genetic diseases [1], both doctors and patients seek techniques to reduce the radiation dose of CBCT. There are two main approaches to reduce the dose: decreasing mAs (the product of tube current and exposure time) or reducing the number of projection views [2]. The former introduces noise directly into the projection data, while the latter leads to artifacts in the reconstructed model due to insufficient observations [3]. This paper focuses on CBCT reconstruction from sparse projection views.
Analytical methods such as the classical FDK [4] and its improved versions [5] were developed early for full-view CBCT reconstruction. However, sparse views lead to serious artifacts in the analytically reconstructed results due to insufficient projection data. To overcome this shortcoming of the analytical methods, iterative algorithms that treat CBCT reconstruction as an ill-posed linear problem have been investigated, such as ART [6], SART [7], OS-SART [8], and EM [9]. Although these improve the reconstruction performance, very sparse views remain very challenging.
The statistical iterative reconstruction (SIR) framework has been studied to further improve sparse-view reconstruction. Generally, the objective function of SIR is composed of two terms. The first is a data fidelity term, which drives the reconstruction result to be as consistent with the limited projection data as possible. The second is a regularization (or prior) term, which constrains the solution with statistical prior knowledge. The regularization term can be built on various prior models, among which total variation (TV) [10][11][12] is the most commonly used. Based on TV minimization, several CBCT reconstruction algorithms have been proposed [13,14]. However, these TV-based methods suffer from the staircase effect due to the piecewise-constant assumption they rely on. To overcome this problem, improved TV terms have been introduced, such as the ASD-POCS algorithm [15], total p-variation [16], and prior-contour-based total variation [17]. Besides TV, other regularization techniques have also been investigated for sparse-view CBCT reconstruction, such as dictionary learning [18], compressive sensing [19], and Hessian Schatten penalties [20]. In addition, deep learning techniques have been applied to CBCT reconstruction, with good reported results [21,22].
Recently, in the field of image and signal denoising, low-rank matrix approximation (LRMA) has attracted wide attention. In [23] and [24], nuclear norm minimization (NNM) and Schatten p-norm (0 < p ≤ 1) minimization (SNM) are used to solve the LRMA problem, respectively. As a convex relaxation of rank minimization, NNM is the special case of SNM with p = 1 and is easier to solve, while SNM performs better for accurate data recovery [25]. However, both NNM and SNM treat all singular values equally and shrink them with the same threshold, which causes poor edge/texture preservation, as larger singular values normally carry more structural information. To overcome this shortcoming, weighted versions, i.e., WNNM [26] and WSNM [27], have been proposed, which shrink the larger singular values less during rank minimization. As reported in [27], compared with WNNM, WSNM is a better approximation to the LRMA problem; it can be transformed into a series of independent non-convex l_p-norm minimization problems and solved efficiently with the generalized soft-thresholding (GST) algorithm [28].
In this paper, we propose a novel iterative algorithm for sparse-view cone beam computed tomography (CBCT) reconstruction. The nonlocal self-similarity within the medical CBCT data is utilized to improve the reconstruction. Inspired by [27], the WSNM is introduced to quantify this self-similarity and regularize the reconstructed voxels. The main contributions of this work are as follows: 1) the sparse-view CBCT reconstruction task, defined as an ill-posed inverse problem, is decomposed into two sub-problems, least-squares reconstruction and low-rank image denoising, by the half quadratic splitting (HQS) technique [29]; 2) the WSNM, originally proposed for 2D image denoising, is extended to CBCT reconstruction as a regularization term imposing a low-rank constraint on the reconstructed voxels. Experimental results on various datasets show that the proposed algorithm performs excellently in both qualitative and quantitative evaluations, achieving a good balance between noise suppression and structural information preservation. The proposed method also outperforms the WNNM-based method, benefiting from the extended WSNM, which effectively exploits the correlations between non-local similar blocks of the reconstructed data to preserve structural information.

General reconstruction model
Let x = [x_1, x_2, ..., x_N]^T be the vectorized voxels and y = [y_1, y_2, ..., y_M]^T be the vectorized observations of several views. The CBCT imaging process can be described by [30]

y = Ax,    (1)

where A is an M × N system matrix whose element a_ij denotes the contribution of the j-th voxel x_j to the i-th observation y_i and can be scaled by the length of the intersection of the i-th projection ray (which generates y_i) with x_j. The goal of CBCT reconstruction is to recover the non-negative voxels x from the observations y with the known system matrix A. To reduce the radiation dose, sparse projection views are adopted, which results in insufficient observations and thus makes the voxels hard to solve from Eq. (1). Regularization terms are normally used to provide additional constraints on the voxels, so that they can be estimated by solving the following optimization problem [31]

arg min_{x ≥ 0} ||Ax − y||_2^2 + λR(x),    (2)

where the first term is a data fidelity term ensuring that the projections of the estimated voxels are consistent with the observations captured by the CBCT device, R(x) is the regularization function derived from inherent characteristics of the voxel data to provide additional constraints on x, and λ is a factor balancing the weights of the two terms.
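As a concrete illustration of the objective in Eq. (2), the following Python sketch evaluates the fidelity-plus-prior cost on a toy system; the matrix, voxel values, and the TV-like choice of R are hypothetical stand-ins, not the paper's actual operator or prior.

```python
import numpy as np

def objective(A, x, y, lam, R):
    """Cost of Eq. (2): data fidelity ||Ax - y||^2 plus lam * R(x)."""
    fidelity = np.sum((A @ x - y) ** 2)
    return fidelity + lam * R(x)

# Toy system: 2 observations of 3 voxels (values are illustrative).
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])
y = np.array([1.0, 2.0])
x = np.array([1.0, 1.0, 1.0])

# A TV-like prior as a hypothetical stand-in for R(x).
tv = lambda v: np.sum(np.abs(np.diff(v)))

print(objective(A, x, y, lam=0.5, R=tv))  # prints 0.0 (x is consistent and flat)
```

Any voxel estimate that violates either the observations or the prior raises this cost, which is what the iterative scheme below minimizes.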

Solution via half quadratic splitting
To solve Eq. (2), the HQS technique is used to decouple the data fidelity term and the regularization term. By introducing an auxiliary variable f to replace the original variable x in the regularization term, Eq. (2) can be written as

arg min_{x, f} ||Ax − y||_2^2 + λR(f),  s.t.  f = x.    (3)

Using the penalty function method, Eq. (3) can be rewritten as an unconstrained optimization problem over the two variables x and f,

arg min_{x, f} ||Ax − y||_2^2 + λR(f) + (β/2)||x − f||_2^2,    (4)

with a large penalty factor β. By fixing one variable and solving for the other alternately, the CBCT reconstruction problem defined in Eq. (2) is finally transformed into the alternating iterative optimization of two simple sub-problems

f_{i+1} = arg min_f (β/2)||x_i − f||_2^2 + λR(f),    (5)

x_{i+1} = arg min_x ||Ax − y||_2^2 + (β/2)||x − f_{i+1}||_2^2,    (6)

where β increases with the number of iterations.
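The alternation between the two sub-problems can be sketched as follows in Python; the denoiser is passed in as a black box (the paper uses the WSNM of Section 3), and the initialization, penalty schedule, and parameter values are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def hqs_reconstruct(A, y, denoise, lam=0.01, beta0=0.1, rho=2.0, iters=8):
    """Alternate the two HQS sub-problems of Eqs. (5) and (6).

    denoise(v, sigma2) is a black-box denoiser standing in for the prior
    (the paper uses the WSNM); lam, beta0, rho, iters are illustrative.
    """
    x = np.linalg.lstsq(A, y, rcond=None)[0]  # crude start (the paper uses SART)
    beta = beta0
    n = A.shape[1]
    AtA, Aty = A.T @ A, A.T @ y
    for _ in range(iters):
        # Eq. (5): f = argmin_f (beta/2)||x - f||^2 + lam * R(f)  (denoising)
        f = denoise(x, lam / beta)
        # Eq. (6): x = argmin_x ||Ax - y||^2 + (beta/2)||x - f||^2  (least squares)
        x = np.linalg.solve(AtA + (beta / 2.0) * np.eye(n), Aty + (beta / 2.0) * f)
        beta *= rho  # the penalty grows with the iterations
    return x
```

The closed-form solve in the data step is exactly the normal-equation solution of Eq. (6), which the next section rewrites with an expanded system matrix.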
The first sub-problem, associated with the regularization term, is equivalent to a denoising task under an additive noise assumption, i.e., restoring the true data f from the noisy data x_i with a noise variance proportional to λ/β, using the prior R(f) as the denoiser. In this work, the weighted Schatten p-norm of groups of similar voxel blocks is used as the prior, and the solution of Eq. (5) based on the WSNM is given in Section 3.
The second sub-problem, Eq. (6), associated with the data fidelity term, is in essence a linear least-squares problem. It is solved directly by x_{i+1} = (Ã^T Ã)^{−1} Ã^T ỹ with the expanded system matrix and observation vector

Ã = [A; sqrt(β/2) I],   ỹ = [y; sqrt(β/2) f_{i+1}].    (7)

The initial voxels x_0 are reconstructed from the observations y using the SART algorithm, which iteratively reduces the error between the estimated projections and the real observations by looping over all views until a convergence criterion or an upper limit on the number of iterations is met. The SART iteration is given by

x_j^{(k+1)} = x_j^{(k)} + α · [ Σ_{i∈I_φ} a_ij (y_i − Σ_{n=1}^{N} a_in x_n^{(k)}) / Σ_{n=1}^{N} a_in ] / Σ_{i∈I_φ} a_ij,    (8)

where x_j^{(k)} denotes the j-th voxel at the k-th iteration, I_φ is the set of projection rays at view φ, and α is a step size parameter controlling the degree of correction.
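A minimal sketch of one SART correction, assuming a dense system matrix and row indices grouped by view (both simplifications of a real cone-beam geometry):

```python
import numpy as np

def sart_update(x, A, y, view_rows, alpha=1.0):
    """One SART correction over the rays I_phi of a single view.

    A is a dense (rays x voxels) matrix and view_rows lists the rows
    belonging to the current view -- both simplifications of a real
    cone-beam geometry.
    """
    Av, yv = A[view_rows], y[view_rows]
    ray_sums = Av.sum(axis=1)                  # sum_n a_in for each ray i
    residual = (yv - Av @ x) / np.where(ray_sums > 0, ray_sums, 1.0)
    col_sums = Av.sum(axis=0)                  # sum over I_phi of a_ij per voxel j
    correction = (Av.T @ residual) / np.where(col_sums > 0, col_sums, 1.0)
    return np.maximum(x + alpha * correction, 0.0)  # keep voxels non-negative
```

Sweeping `sart_update` over all views and repeating until convergence yields the initial voxels x_0.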
The proposed sparse-view CBCT reconstruction scheme is summarized in Algorithm 1.

Solving denoising sub-problem
Data and signals from the real world normally contain structural information with significant correlations between their elements. The information redundancy caused by these correlations can be used to regularize the voxels reconstructed from insufficient observations. To this end, the noisy voxels reconstructed from Eq. (6) are divided into non-overlapping reference blocks of the same size. For each reference block, similar blocks that may overlap are grouped via sliding-window matching as in [32]. Each group represents a specific type of structure and forms a 2D matrix by reshaping every block into a column vector; this matrix should be low-rank due to the similarity of its columns. By applying LRMA to each grouped matrix, the denoising sub-problem defined in Eq. (5) can be solved. As discussed in the Introduction, a state-of-the-art LRMA technique, the WSNM, is employed to solve Eq. (5) in this work. The original WSNM algorithm was designed for 2D image denoising. Although the voxels could be processed with the 2D WSNM slice by slice, the 3D structural information between slices would then not be used for reconstruction. Therefore, we extend the WSNM based on 3D block grouping.
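The grouping step can be sketched as below; the block size, search radius, and group size are illustrative parameters, and a real implementation would restrict and stride the search as in [32].

```python
import numpy as np

def group_similar_blocks(vol, ref_corner, bsize=4, n_sim=8, search=6):
    """Stack the blocks most similar to a reference block as matrix columns.

    A sliding window of radius `search` around ref_corner is scanned and the
    n_sim closest blocks in Euclidean distance (the reference included) are
    vectorized into columns; all parameter values here are illustrative.
    """
    z0, y0, x0 = ref_corner
    ref = vol[z0:z0 + bsize, y0:y0 + bsize, x0:x0 + bsize]
    candidates = []
    for dz in range(-search, search + 1):
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                z, y, x = z0 + dz, y0 + dy, x0 + dx
                if (z < 0 or y < 0 or x < 0 or z + bsize > vol.shape[0]
                        or y + bsize > vol.shape[1] or x + bsize > vol.shape[2]):
                    continue  # candidate block would leave the volume
                blk = vol[z:z + bsize, y:y + bsize, x:x + bsize]
                candidates.append((np.sum((blk - ref) ** 2), (z, y, x)))
    candidates.sort(key=lambda t: t[0])        # most similar blocks first
    cols = [vol[z:z + bsize, y:y + bsize, x:x + bsize].ravel()
            for _, (z, y, x) in candidates[:n_sim]]
    return np.stack(cols, axis=1)              # similar columns -> low-rank matrix
```

Because the columns are near-duplicates of one structure, the resulting matrix is approximately low-rank, which is exactly the property the WSNM exploits.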
Since the processing steps for each block group are exactly the same, without loss of generality, we use X to denote a 2D matrix grouped from the noisy voxels x. Each 2D matrix is denoised independently with the proposed WSNM algorithm, and the complete 3D voxel model is finally obtained by reassembling the denoised blocks.
Under the assumption of additive white Gaussian noise, the noisy matrix X can be written as X = F + N, where N denotes the noise matrix and the true matrix F is low-rank due to the similarity of the grouped blocks. Let ||F||_{w,p} be the weighted Schatten p-norm of the matrix F,

||F||_{w,p}^p = Σ_{l=1}^{L} w_l δ_l^p,    (10)

where L is the rank of F, δ_l is the l-th singular value of F, and w is a weight vector composed of the non-negative weights w_l. With the WSNM, the true matrix F is recovered from the noisy matrix X by solving

arg min_F ||X − F||_F^2 + ||F||_{w,p}^p,    (11)

where the penalty factors in Eq. (5) are merged into the weight vector w, which adjusts not only the weights of the individual singular values but also the overall balance between the F-norm fidelity term and the Schatten p-norm regularization term. In practice, the noise and artifacts introduced by sparse-view reconstruction are very complex and do not obey the Gaussian assumption. Nevertheless, the WSNM works well, as most noise and artifacts increase the weighted Schatten p-norm of the data matrix. Let σ_N^2 denote the variance of the noise, which can be estimated effectively from the noisy matrix X with filter-based approaches. The weights in w should be proportional to σ_N^2 and increase with the size of the matrix to match the F-norm fidelity term. In addition, since larger singular values normally correspond to important structural information that should be preserved in the denoised voxels, smaller weights should be assigned to larger singular values, i.e., w_l ≤ w_{l+1} for δ_l ≥ δ_{l+1}. As suggested in [27], we set the weights to

w_l = c sqrt(S) σ_N^2 / (δ_l(F) + ε),    (12)

where c is a constant, S is the number of voxels contained in X, and ε is a very small constant to avoid division by zero. Because the true matrix F is unknown, its singular values δ_l in Eq. (12) are estimated by

δ̂_l = sqrt( max( σ_l^2 − S σ_N^2, 0 ) ),    (13)

where σ_l is the l-th singular value of the noisy matrix X.
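The weight rule of Eq. (12), together with the singular-value estimate for the unknown F, can be sketched as follows; the estimate follows the WNNM/WSNM convention and is an assumption of this sketch rather than a verbatim transcription of the paper's formula.

```python
import numpy as np

def wsnm_weights(sigma, sigma_n2, S, c=1.0, eps=1e-8):
    """Weights of Eq. (12), with the singular values of F estimated from X.

    sigma: singular values of the noisy matrix X (descending order);
    sigma_n2: estimated noise variance; S: number of voxels in X.
    """
    # Estimate the singular values of the true matrix F from those of X
    delta = np.sqrt(np.maximum(sigma ** 2 - S * sigma_n2, 0.0))
    # Larger singular values (structure) get smaller weights, hence less shrinkage
    w = c * np.sqrt(S) * sigma_n2 / (delta + eps)
    return w, delta
```

Note that the weights come out in ascending order when the singular values are descending, which is precisely the condition needed to decouple the minimization in the next step.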
Let X = UΣV^T be the SVD of X, where Σ is the diagonal matrix of the singular values {σ_l}, l = 1, ..., L, sorted in descending order. As proven in [27], the optimal solution of Eq. (11) can be computed as F = UΔV^T, where the singular values of F contained in Δ are obtained by solving

arg min_{δ_1 ≥ δ_2 ≥ ... ≥ δ_L ≥ 0} Σ_{l=1}^{L} [ (σ_l − δ_l)^2 + w_l δ_l^p ].    (14)

Equation (14) defines a complex non-convex and non-smooth multi-variable optimization problem with coupled inequality constraints. Fortunately, with ascending weights (0 < w_l ≤ w_{l+1}) and descending singular values (σ_l ≥ σ_{l+1} > 0), as shown in [27], Eq. (14) can be decoupled into L independent unconstrained single-variable optimizations

arg min_{δ_l ≥ 0} (σ_l − δ_l)^2 + w_l δ_l^p,  l = 1, ..., L,    (15)

each of which can be solved easily with the GST algorithm [28].
To apply the GST to Eq. (15), the objective is rescaled to the standard GST form (1/2)(σ_l − δ_l)^2 + λ_l δ_l^p with λ_l = w_l/2. Given p and λ_l, a threshold is defined as

τ = (2λ_l(1 − p))^{1/(2−p)} + λ_l p (2λ_l(1 − p))^{(p−1)/(2−p)}.    (16)

If σ_l < τ, δ_l = 0 is the global optimum of Eq. (15); otherwise, the optimum satisfies

δ_l = σ_l − λ_l p δ_l^{p−1}    (17)

and can be solved iteratively (for more details please refer to [28]).
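The GST threshold-and-iterate step can be sketched as follows; the rescaling to the standard GST form (a factor w/2 on the fidelity term of Eq. (15)) and the fixed iteration count are assumptions of this sketch.

```python
import numpy as np

def gst(sigma, w, p=0.9, iters=10):
    """GST shrinkage for min_{d >= 0} (sigma - d)^2 + w * d^p.

    The objective is rescaled to the standard GST form with lam = w/2;
    the iteration count is an illustrative choice.
    """
    lam = w / 2.0
    # Threshold below which d = 0 is the global optimum
    tau = ((2.0 * lam * (1.0 - p)) ** (1.0 / (2.0 - p))
           + lam * p * (2.0 * lam * (1.0 - p)) ** ((p - 1.0) / (2.0 - p)))
    if sigma <= tau:
        return 0.0
    d = sigma  # fixed-point iteration on d = sigma - lam * p * d^(p-1)
    for _ in range(iters):
        d = sigma - lam * p * d ** (p - 1.0)
    return d
```

Small singular values (mostly noise) fall below the threshold and are zeroed outright, while large ones (structure) are only mildly shrunk, which is the edge-preserving behavior motivating the weighted norm.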
Finally, by reassembling all the denoised blocks, the denoised voxels f are obtained. The steps of the denoising algorithm are given in Algorithm 2: for each reference block, similar blocks are searched to form a 2D matrix X, which is then denoised via the WSNM as described above.

Experiments and results
To evaluate the reconstruction results, three quantitative indices are used: the root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM).
The RMSE is defined as

RMSE = sqrt( (1/N) Σ_{i=1}^{N} (x̂_i − x_i)^2 ),    (18)

where x̂_i and x_i are the voxels of the reconstructed model and the ground truth, respectively, and N is the total number of voxels. The PSNR is defined as

PSNR = 10 log10( (2^b − 1)^2 / ( (1/N) Σ_{i=1}^{N} (x̂_i − x_i)^2 ) ),    (19)

where b is the bit depth of the voxel values. Both RMSE and PSNR are based on voxel-wise deviations and do not consider the structural information in the CBCT data, which may lead to inconsistency with human perception. The SSIM measures the structural similarity of two signals and is defined as

SSIM = (2μ_x̂ μ_x + C_1)(2σ_x̂x + C_2) / ( (μ_x̂^2 + μ_x^2 + C_1)(σ_x̂^2 + σ_x^2 + C_2) ),    (20)

where μ_x̂ and σ_x̂^2 are the mean and variance of the N reconstructed voxels {x̂_i}, μ_x and σ_x^2 are those of the true voxels {x_i}, σ_x̂x is the covariance between {x̂_i} and {x_i}, and C_1 and C_2 are constants added for numerical stability, set to 0.0001(2^b − 1)^2 and 0.0009(2^b − 1)^2, respectively. The power p in the WSNM is set to 0.9 in the experiments. As indicated in [27], the best power p depends on the noise level of the data being processed. For CBCT reconstruction, p can therefore be chosen according to the number of projection views used, with a smaller p preferred for sparser views. The block size is 4 × 4 × 4 and the number of similar blocks is 70.
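The three indices can be sketched in Python as below; `ssim_global` evaluates a single global window rather than the usual local-window average, a simplification of common SSIM implementations.

```python
import numpy as np

def rmse(x, ref):
    """Root mean square error between reconstruction x and ground truth ref."""
    return np.sqrt(np.mean((x - ref) ** 2))

def psnr(x, ref, b=8):
    """PSNR with peak value 2^b - 1, b being the bit depth of the voxels."""
    peak = 2.0 ** b - 1.0
    return 10.0 * np.log10(peak ** 2 / np.mean((x - ref) ** 2))

def ssim_global(x, ref, b=8):
    """Single-window SSIM with C1 = 0.0001*peak^2 and C2 = 0.0009*peak^2."""
    peak = 2.0 ** b - 1.0
    c1, c2 = 0.0001 * peak ** 2, 0.0009 * peak ** 2
    mu_x, mu_r = x.mean(), ref.mean()
    var_x, var_r = x.var(), ref.var()
    cov = np.mean((x - mu_x) * (ref - mu_r))
    return ((2 * mu_x * mu_r + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_r ** 2 + c1) * (var_x + var_r + c2))
```

RMSE and PSNR penalize voxel-wise deviation only, while SSIM additionally rewards agreement of local means, variances, and covariance, which is why all three are reported together.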

Reconstruction of digital brain phantom
The digital brain phantom, generated from real MRI data of a human brain and widely used for evaluating CBCT reconstruction [33], is used for validation. The phantom has 256 × 256 × 256 voxels with a resolution of 1 × 1 × 1 mm^3. Siddon's ray-driven algorithm [34] is used to simulate the projection data from 32 sparse views. The WNNM, proposed for conventional CT slice reconstruction in [35], is extended to 3D (labeled WNNM-3D) as a benchmark for CBCT reconstruction. In addition, three classical methods discussed in the Introduction, FDK, SART, and ASD-POCS, are used for comparison.
The reconstruction results are shown in Fig. 1, where the 66th horizontal slice is selected for comparison because it contains rich structural information. The FDK, designed for full-view reconstruction, cannot restore a clinically acceptable image from insufficient projection data; the reconstructed slice in Fig. 1(b) is severely damaged by streak artifacts. The SART is much better than the FDK; however, as shown in Fig. 1(c), streak artifacts are still obvious within the tissue region. The ASD-POCS removes the artifacts and suppresses the noise very well, but at the same time the edges are blurred and some critical structural details are lost in Fig. 1(d). The WNNM-3D and the proposed WSNM-3D methods perform excellently in preserving structural information. In terms of noise suppression, the proposed method (see Fig. 1(f)) is better than the WNNM-3D (see Fig. 1(e)) and much better than the SART. For a clearer comparison, the region of interest (ROI) marked by the box in Fig. 1(a) is zoomed in and shown in Fig. 2. As indicated by the red arrow, nearly perfect edges are restored by the WNNM-3D and WSNM-3D methods, while the edges reconstructed by the ASD-POCS are over-smoothed and, compared with the ground truth in Fig. 2(a), some structural details (indicated by the yellow arrow) are clearly lost in Fig. 2(d).
To show the differences between the voxels intuitively, Fig. 3 plots the voxel profiles and their local magnifications along the coronal axis (128th row, 66th slice) and the vertical axis (128th row, 128th column). As shown in Fig. 3(b) and (e), for segments where the voxel values change sharply, the curves produced by the proposed algorithm are the most consistent with the reference curves (ground truth), indicating better edge preservation; by contrast, the ASD-POCS performs worst, with curves that are too smooth. For the smooth segments shown in Fig. 3(c) and (f), the ASD-POCS, WNNM-3D, and WSNM-3D perform much better than the SART, whose curves are seriously polluted by noise. The results of the quantitative evaluation are given in Table 1. Consistent with the visual inspection, the proposed algorithm achieves the best scores on all three indices.

Reconstruction of real clinical data
Clinical voxel data acquired from a routine abdomen scan with a Neusoft NeuViz 128 CT system are also used for validation. The X-ray tube voltage is 120 kVp, the tube current is 375 mA, and the exposure time is 20 ms/slice. The original volume of 512 × 512 × 266 voxels is down-sampled to 256 × 256 × 266 with a resolution of 1.6367 × 1.6367 × 1 mm^3 as the ground truth. The projection data simulated at 32 views are used for reconstruction. Figures 4 and 5 show the reconstruction results at slices 60 and 200, respectively, compared with the ground truth. The ROIs marked by the boxes are zoomed in and shown in Fig. 6. The results are consistent with those on the digital brain phantom. As marked by the arrows, the SART preserves edges well but introduces serious artifacts, while the ASD-POCS suppresses artifacts but also blurs the edges. The WSNM-3D, slightly better than the WNNM-3D, yields more balanced results. Figure 7 shows the absolute difference images between the reconstructed slices and the ground truth. The WSNM-3D achieves the smallest differences at both slices, further confirming the advantages of the proposed method. For additional validation, the reconstruction results in the sagittal and coronal planes are shown in Figs. 8 and 9, respectively. The results are consistent with those in the horizontal plane (see Figs. 4 and 5): the proposed algorithm achieves a good balance between artifact suppression and structural information preservation.
Table 2 shows the quantitative evaluation of the reconstruction results for the clinical data.The best results are also achieved by the proposed algorithm.
Discussion on computational complexity: The most time-consuming step of the proposed algorithm is step 3 in Algorithm 1, i.e., the WSNM denoising of the voxels (Algorithm 2), whose cost is dominated by the block matching and the SVD of each grouped matrix. In our experiments, reconstructing a 256 × 256 × 266 clinical volume with the WSNM-3D algorithm takes about 4 hours using Matlab 2016a on a PC with a 3.3 GHz CPU and 12 GB RAM. The WNNM-3D takes about the same amount of time. The FDK is the most efficient algorithm, taking about 3 minutes; the ASD-POCS and SART take about 20 and 30 minutes, respectively. The WSNM-3D is thus the most time-consuming method in the tests. However, since the WSNM processing of each reference block is exactly the same and independent, the blocks can be computed in parallel, and the reconstruction speed of the WSNM-3D algorithm can be improved greatly by GPU acceleration and more efficient programming languages (e.g., C/C++).

Conclusion
This paper proposes a novel iterative algorithm for sparse-view CBCT reconstruction based on weighted Schatten p-norm minimization. Using half quadratic splitting, the sparse-view CBCT reconstruction problem is decomposed into two sub-problems, a simple reconstruction sub-problem and a denoising sub-problem, which are solved alternately. The weighted Schatten p-norm minimization, originally proposed for image denoising, is extended to 3D to regularize the denoising sub-problem. Benefiting from the extended WSNM, the structural similarity within the 3D volume can be used effectively to improve the reconstruction results. The digital brain phantom and clinical CT data are used to verify the proposed method. Experimental results show that the proposed algorithm effectively eliminates the staircase effect, removes artifacts, and suppresses noise. In terms of edge/structure preservation, the proposed algorithm outperforms the classical algorithms, and the results measured by the three quantitative indicators also confirm its advantage.

Fig. 7. Absolute difference images between the reconstructed slices and the ground truth. The first row shows difference images of slice 60 and the second row shows those of slice 200. The columns from left to right are obtained by FDK, SART, ASD-POCS, WNNM-3D, and WSNM-3D.