Fast gridding projectors for analytical and iterative tomographic reconstruction of differential phase contrast data.

This paper introduces new gridding projectors designed to efficiently perform analytical and iterative tomographic reconstruction, when the forward model is represented by the derivative of the Radon transform. This inverse problem is tightly connected with an emerging X-ray tube- and synchrotron-based imaging technique: differential phase contrast based on a grating interferometer. This study shows, that the proposed projectors, compared to space-based implementations of the same operators, yield high quality analytical and iterative reconstructions, while improving the computational efficiency by few orders of magnitude.

A tomographic forward or backprojector based on the gridding method has O(N 2 log 2 N) complexity (N is the number of rows/columns of an image supposed to be square), leading to a significantly higher computational efficiency than a space-based implementation of the same operator (O(N 3 ) complexity) [17]. Several studies have also shown that CT analytical [9,10,15] and iterative [12,13] algorithms, adopting gridding projectors, can provide high quality reconstructions, while substantially reducing the calculation times (up to 3 orders of magnitude).
This work introduces a gridding forward and backprojector designed to perform analytical and iterative reconstruction of tomographic datasets, where the forward model is represented by the derivative of the Radon transform. These operators are, in particular, applied to the reconstruction of differential phase contrast (DPC) tomographic data acquired with a grating interferometer.
A grating interferometer is an imaging setup yielding three complementary types of informa-tion [18-20]: attenuation, differential phase and dark field signals. DPC focuses exclusively on the phase information related to the real part of the object refractive index and provides a high sensitivity to electron-density variations, down to 0.18 e/nm 3 [21]. For this reason, DPC based on a grating interferometer is particularly suitable for the visualization of soft-tissue specimens. DPC data can be analytically reconstructed by means of filtered backprojection without need for integration of the projection information, if an imaginary filter function corresponding to a Hilbert transform in the image space (HFBP) is used instead of the usual Ram-Lak filter [19]. This approach is however not suited for the reconstruction of DPC underconstrained data because of insufficient accuracy. Recent interest in the biomedical field for grating interferometry, when investigating dose sensitive specimens, has lead to several studies aimed at extending established CT iterative algorithms to the DPC case. In this regard, ad-hoc forward projectors working with a blob [22] and a cubic B-spline [23] basis and iterative schemes relying on different algorithms (the regularized maximum likelihood [22], the separable paraboloidal surrogate [24], the alternate direction method of multipliers [25] and a combination of statistical model and thresholding [26]) have been proposed.
So far, all analytical and iterative reconstruction methods developed for DPC utilize spacebased projectors, characterized by a complexity of O(N 3 ) and, therefore, strongly limiting their computational performance. The projectors proposed here greatly reduce the computational times required for analytical and, in particular, iterative reconstruction, while preserving the accuracy of the results.

Contributions
The contributions of this manuscript are summarized as follows: • Design of a gridding forward projector acting as derivative of the Radon transform.
• Optimization of the parameters by means of the minimal oversampling strategy [5].
• Comparison of accuracy and efficiency with a standard space-based implementation of the same projector.
• Validation of the proposed operators through analytical and iterative reconstruction of simulated and experimental DPC data.

Notation and preliminaries
This work focuses on parallel beam geometry, even though gridding can be extended to fanand cone-beam geometries [14-16].
The study object, i.e., an axial 2D slice of the whole 3D volume, is represented by a finite integrable real function Its CT forward projection or X-ray transform or Radon transform (in 2D, these transforms coincide [27]) is given by: where · represents the inner product, θ ∈ [0, π) and e θ = (cos θ , sin θ ). The variables used in (1) are displayed in Fig. 1(a). The derivative of the Radon transform is considered with respect to the variable t and is given by The collection of all projections (P {θ } or d P {θ } /dt) in parallel beam geometry is also called sinogram. Figure 1(b) shows an example of DPC sinogram; the pixels of each projection are placed along the row or channel direction. The 2D Fourier transform of a generic function g(a, b) is indicated withĝ or F a,b {g}; F a {g} or F b {g} represents the 1D Fourier transform along the variable a or b, respectively. (x, y), (t, θ ), (u, v) and (θ , ω) are the coordinates of the real-space Cartesian, real-space polar, Fourier Cartesian and Fourier polar reference frame, respectively. (· · · ) refers to entries of a continuous function, [· · · ] entries of a discrete function/array and {· · · } either entries of an operator or a collection of discrete values. Vectors and matrices are indicated with lower-and upper-case bold letters, respectively; both † and * correspond to the Hermitian adjoint operator.

Connection between gridding and CT
The gridding method (GM) retrieves a signal from samples of its Fourier transform located on a non Cartesian lattice: the samples are, first, convolved with a smooth and rapidly decaying window function, the inverse fast Fourier transform (IFFT) is computed and, finally, the contribution of the window function is removed from the signal in real space [1].
The connection between GM and CT reconstruction is provided by the Fourier slice theorem (FST) [27]:P The discretized version of (3) states that the fast Fourier transform (FFT) of a projection at angle θ yields equidistant samples off along the line v = u tan θ . The samples collected from multiple projections form a polar lattice in Fourier space. To retrieve f through IFFT, these polar samples need to be interpolated at Cartesian locations, an operation that can be performed by the GM. CT reconstruction through a filtered gridding backprojector consists in the following steps [6,8,10]: 2. ramp filtering of the sinogram ←−P Accordingly, the gridding-based implementation of R or forward gridding projector (FRP) is given by the same steps listed above (ramp filtering excluded) in the reverse order [9,13]: 4. IFFT1 of each polar slice ofP {θ } ←− P {θ } .

Differentiated forward gridding projector
The formula of the differentiated forward gridding projector (DFRP) is obtained from the FST (3), the definitions (1), (2) and from the property that derivatives in real space are mapped into multiplications in Fourier space.
where i is the imaginary unit (i 2 = −1), h andĥ are the gridding interpolation kernel and its Fourier transform, f (d) = f /h is the pre-corrected and oversampled version of f . Choosing the kernel separable, i.e., h(x, y) = ψ(x)ψ(y) =⇒ĥ(u, v) =ψ(u)ψ(v), simplifies the implementation of the method. Discretizing (4) results in: where i is the imaginary unit; θ m = mπ/M ∈ [0, π) for m = 0, 1, ..., M −1 and M is the number of views; the image is assumed to be square with N pixels (N is even) and to have unit resolution, therefore, x p = −N/2 + p and y q = −N/2 + q for p, q = 0, 1, ..., N − 1 are the image pixel coordinates; t n = −N/2+n for n = 0, 1, ..., N −1 (N is also the number of detector cells); the Fourier space is sampled at G evenly spaced points in the range [0.5, 0.5), therefore, u r = −1/2 + r/G and v s = −1/2+s/G for r, s = 0, 1, ..., G−1, G = αN for α > 1; α is called oversampling ratio; ω j,u = ω j cos θ m , ω j,v = ω j sin θ m and ω j = −1/2 + j/G for j = 0, 1, ..., G − 1; U m j (V m j ) are the sets of pixel coordinates u r (v s ) inside the interpolation support; g 1 and g 2 are factors mapping distances from the polar and Cartesian grid into look-up table (LUT) indices and depend on the sampling density of ψ andψ. F −1 ω and F {x,y} in (5) are implemented as IFFT1 along ω and FFT2 along x, y, respectively. f (o) represents a zero-padded or oversampled version of f by a factor α > 1.0. Gridding an oversampled grid prevents the appearance of aliasing artifacts due to tails of the reconstruction wrapping back into the field-of-view (FOV) [7,10]. A general rule of thumb is to set α = 2.0.
Since the DFRP is a linear operator, the differentiated gridding backprojector (DBRP) is the Hermitian adjoint operator of (5). From a computational point of view, the DBRP performs the same exact operations as the DFRP, but in reverse order.
The accuracy and efficiency of gridding operators depend on the choice of the finite kernel ψ: the smaller the support of ψ, the more accurate the pre-correction in real space; at the same time, the smaller the support ofψ, the faster the interpolation stage in Fourier space [7,10,13]. The accuracy of a convolution kernel is also strongly determined by the shape (in particular, the rolloff) of the central lobe and the amplitude of the aliasing sidelobes characterizing ψ. The pre-correction removes the rolloff induced by this central lobe, but, in doing so, the aliasing sidelobes are amplified [7,13].
In previous work [4,6,7], different compact kernels have been studied for gridding [7]. The conclusion is that a Kaiser-Bessel (KB) [28] kernel has an optimal compact support in real and Fourier space and provides analytical formulas, that enable shape optimization for ψ andψ on the basis of a certain criterion, if available. The formulas for the KB kernel are: valid for |k x | ≤ W /2G and where I 0 is the zero-th order modified Bessel function, W is the size of the convolving kernel and β is the tapering parameter, that determines how fast the KB drops to zero. The accuracy and computational performance of (5) with a KB kernel depends on α, W , β and S, the sampling density ofψ. Working with a pre-sampled LUT enables faster calculations and this approach is, therefore, preferable than invoking Eq. (6), every time a point needs to be interpolated in Fourier space. According to the minimum oversampling criterion, the KB's shape can be optimized by minimizing the maximum pixel aliasing error [5]: The optimal kernel ψ opt is found as solution of the minimization problem: which translates in the following results [5]: where NN and LIN refer, respectively, to the nearest neighbor and linear interpolation scheme to sample the convolution LUT and γ is the maximum allowed sampling error forψ. The minimal oversampling criterion (9) reduces the initial parameter space {W, S, α, β } to {W, γ, α}.

Algorithm complexity
The cost of the DFRP lies in the convolution and the call of FFT/IFFT. Given an input image of N × N pixels, an oversampling ratio α, a kernel width W and a number of views M, the convolution amounts to approximately W MN operations, whereas the overall call of FFT/IFFT corresponds to α 2 N 2 log 2 (αN) + αMN log 2 (αN) (1-time FFT-2D and M-times IFFT-1D) floating point operations. This second factor usually represents the leading cost term for real datasets.

Optimization approach
For the pure Radon transform case, an optimal KB kernel, ψ opt , can be constructed with any setting {W, S, α, β } fulfilling (9). The choice of one specific set among the optimal ones would then be dictated by computational efficiency criteria. However, the minimal oversampling criterion does not account for the influence of the term −2πiω j on the accuracy of the actual implementation of the DFRP. To assess which set of theoretically optimal gridding parameters provides the most accurate results for the DFRP, the 3D parameter space {W, γ, α} needs to be explored in a brute-force manner, through experiments making use of a family of functions, whose R (1) can be computed analytically. The following family of 2D radially symmetric functions is considered: The analytical differentiated Radon transform is given by: (11) To derive Eq. (11), one has to apply definition (2) to f (n) (x) and perform a cosinusoidal substitution to obtain an integral of the form π 0 dω (sin ω) m , whose solution is available in [29]. Optimal gridding parameters are evaluated for analytical and iterative reconstruction methods. For the former case, the Hilbert-filtered DBRP is applied to DPC sinograms, created with (11), and the obtained reconstructions are compared with the original phantom (10). For the latter case, an approach disregarding the specific choice of iterative algorithm (and its specific parameters) is followed: the forward projection computed with DFRP is compared to the analytical sinogram and is, then, reconstructed with the Hilbert-filtered DBRP. In this way, both the accuracy of the standalone forward projector and of the coupling forward-adjoint operator can be investigated.
The experiments in Sec. 4.2 make use of f (n) (x) with n = {2, 3}, sampled on a grid with 512 × 512 pixels and its analytical R (1) is computed for 805 views, homogeneously distributed in [0, π). The number of views is enough to prevent undersampling artifacts in the reconstructions, since 805 512 * π/2 (according to the FBP Nyquist criterion [27]). Results are not biased by the resolution of the phantoms, because experiments repeated with different grid sizes have yielded the same outcome.
The accuracy of the results is measured by the peak signal-to-noise ratio (PSNR) [30], a score quantifying the overall difference between the computed and reference image. The PSNR is defined as: where O and I are 2D arrays with M rows and N columns, being, respectively, the oracle and the image to analyze. Since in (12) the difference is located at the denominator, PSNR results are more sensitive to small variations from the oracle than the mean square error. When analyzing sinograms, the PSNR is computed over the entire image, whereas, for reconstructions, it is computed within the resolution circle. The mean structural similarity index (MSSIM) [31] has also been employed for the image analysis. Although confirming the trends described by the PSNR, the MSSIM results poorly sensitive for this kind of test. For this reason, only PSNR scores are reported in Sec.4.2.

Optimal gridding parameters
To limit the extent of the brute force search of {W, γ, α} opt , preliminary experiments, studying how the accuracy of the operators depends on just one parameter, while fixing the other two, have been performed. These experiments focus on the forward projection with DFRP and on the reconstruction of analytical sinograms with Hilbert-filtered DBRP. Results, collected in Figs. 2(a)-2(c), show the following key trends: the choice of α is more critical for the backprojector (Fig. 2(b)); the PSNR decreases only when γ > 10 −4 ( Fig. 2(a)); the DFRP is highly influenced by the choice of W (Fig. 2(c)).
The general brute force search of the 3D parameter space has W ranging in [2.7, 17.7] (20 points), γ in [10 −8 , 10 −3 ] (10 points) and α in [1.05, 3.0] (20 points). Results can be visualized by means of 2D PNSR maps, displaying the accuracy as a function of two parameters while the third one is fixed, as shown in Figs. 3(a)-3(c). Few relevant trends, mirroring the results of the preliminary experiments, can be recognized. The accuracy of DFRP and DBRP is only weakly influenced by γ, provided that this parameter is < 10 −4 (Figs. 3(a) and 3(b)). DFRP is very sensitive to the choice of W (Fig. 3(b)): the PSNR varies up to 5 dB for a difference of 0.8 in kernel size. The reconstruction accuracy of DFRP sinograms (coupling DFRP-DBRP) is sensitive to the choice of α (Fig. 3(a)) in a wider interval, 1.05 α 2.30, than for the reconstruction of analytical sinograms (DBRP standalone, Fig. 3(c)), 1.05 α 1.70.
The optimal parameters are found by locating the maximum PSNR values in the 3D volume, created by the brute force search. The PSNR reaches the maximum on a continuous spread area (the intersection of the darkest blue regions in Figs. 3(a)-3(c)), while no local extrema exist. Since the broad PSNR maximum corresponds to multiple parameter configu-  rations, {α,W, γ} opt is chosen as the peripheral point, requiring the smallest computational time. The optimal parameters for analytical reconstruction, as obtained from the top score for the reconstruction with Hilbert-filtered DBRP of analytical sinograms, are {W, γ, α} opt 1 = {4.45, 1.7 · 10 −6 , 1.75}. The optimal parameters for iterative reconstruction, merging the re-  sults for the computation of the forward projection with DFRP and the recostructions of these sinograms with Hilbert-filtered DBRP, are {W, γ, α} opt 2 = {6.6, 6.0 · 10 −6 , 2.38}. The second set of optimal parameters is more conservative and less computationally efficient, because it involves a bigger kernel size and a higher oversampling ratio compared to the first set. The aliasing error introduced by the DFRP can be amplified by the DBRP at each iteration in iterative schemes, therefore a more conservative choice of parameters is mandatory. The results of the brute-force search were confirmed, when using f (n) with n = 1, 4, 5.

Benchmark procedure
To benchmark the accuracy and efficiency of the proposed operators, a standard space-based implementation of R (1) has been considered: the forward projector introduced by [17] with additional differentiation along the channel direction. Considered a discrete image f [m, n], the original implementation of the Radon Transform of [17], based on linear interpolation, is given by: ... represents the floor operation, x min and y min correspond to the smallest abscissa and ordinate of the selected ray within the image grid. R (1) is computed from finite differences of Eq. (13) along the pixel direction, i.e. the t-axis. This operator is named differentiated Radon transform and abbreviated as DRT; its corresponding backprojector is abbreviated as DBP.
Experiments focus on the the impact of undersampling and noise in analytical and iterative reconstructions performed with the proposed operators on one side, DRT and DBP on the other side.
The PSNR and MSSIM are used for the assessment of the reconstruction quality of simulated data. For real data, since a reference image is not available, the signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) are averaged for multiple regions-of-interest (ROIs) at different distances from the image center. The SNR is computed within a ROI, supposed to be homogeneous; for the CNR, two different neighbouring ROIs are used, each supposed to be homogeneous. SNR and CNR are less reliable metrics than those relying on a reference image, but they can, nevertheless, give an approximate indication of image quality. The first experiment deals with a simulated undersampled and noisy dataset of f (n) (n = 1, 2, 3) with 512 × 512 pixels, computed by means of (11). A set of noiseless analytical sinograms is created with a number of views ranging from 20 to 600. Another set is, instead, created with an increasing amount of Poisson noise, with σ ranging from 1% to 50% of the average value, µ, of the sinogram, while the number of views is fixed to 805.

Validation of the proposed operators for analytical reconstruction
Since reconstructions with HF-DBRP and HF-DBP, for the same condition of either undersampling or noise, are indistinguishable at visual inspection, the analysis of the results relies on the information provided by PSNR and MSSIM (Figs. 4(a)-4(d)). Figures 4(a) and 4(b) show that the reconstructions with HF-DBRP have generally superior quality for a number of views ≥ 12% of what is required by the FBP sampling criterion (in this case, 100 views). For very low number of views, the HF-DBRP scores slightly worse than the space-based counterpart. Figures 4(c) and 4(d) show that the gridding algorithm appears quite sensitive to the level of noise affecting the data, since the drop of PSNR and SSIM for increasing noise is steeper compared to the other operator. The PSNR scores are everywhere higher for the gridding reconstruction, whereas, according to the MSSIM, for σ > 20%, the reconstructions with HF-DBP seem to perform better.
These results indicate that analytical reconstruction with the gridding backprojector is less sensitive to undersampling than to noise. The HF-DBRP may slightly underperform compared to a standard method, when a substantial amount of noise affects the data.
The second experiment tackles the reconstruction of a DPC sinogram of a rat brain, acquired at the TOMCAT beamline of the Swiss Light Source at the Paul Scherrer Institute (Switzerland). The array has 721 views × 1493 pixels. The reconstructions, shown in Figs. 5(a) and 5(b), are indistinguishable at visual inspection; nevertheless, the reconstruction with HF-DBRP shows improved SNR and CNR values (Table 3(a)). When the original sinogram is downsampled to 400 and 300 views, each pair of reconstructions still looks identical and the difference of SNR and CNR becomes rather small (Table 3(b) and 3(c)).
The computational efficiency represents the very attractive feature of the proposed operators. Table 4 reports the time elapsed (in seconds) for the reconstruction of DPC sinograms of various sizes. The used hardware is an Intel(R) Core(TM) i7-3520M CPU 2.90GHz. The HF-DBRP can speed-up calculations by two orders of magnitude for small datasets, up to three orders for bigger datasets with respect to a standard space-based operator.

Validation of the proposed operators for iterative reconstruction
Iterative reconstructions are performed by means of the alternate direction method of multipliers (ADMM) [33], solving a standard LASSO (Least Absolute Selection and Shrinkage Operator) problem. This method has been applied to DPC data in [25] with a more sophisticate formulation and a forward projector based on a cubic B-spline basis [23]. Details regarding the ADMM, its usage for tomographic reconstruction and the choice of the parameters are given in the Appendix.
Two implementations of the ADMM are tested in this section and the difference lies in the forward and backprojector used: the DFRP and its Hermitian adjoint operator (ADMM-DFRP) versus the DRT and its Hermitian adjoint operator (ADMM-DRT).
The first experiment focuses on the convergence of the algorithm: an analytical sinogram, created from f (2) and having 100 views × 256 pixels with additional Poisson noise of σ = 8%µ (this σ has been chosen to be slightly bigger than that estimated on the real data presented in the following), is reconstructed by means of ADMM and the relative norm of the difference between consecutive iterations, i.e. ||x (k+1) − x (k) || 2 2 /||x (k) || 2 2 , is measured. The ADMM runs for 100 iterations. Figure 6 shows that ADMM-DFRP and ADMM-DRT have almost exactly the same convergence speed and that the solution remains practically unaltered after 20 iter-   Table 3. ations. This fact proves that the gridding operators do not alterate the algorithm convergence with respect to standard projectors. For the next reconstructions, the ADMM is stopped when ||x (k+1) − x (k) || 2 2 /||x (k) || 2 2 < ε = 5 · 10 −6 . In a second step, the iterative reconstructions of the noisy and undersampled simulated sinogram, employed for the convergence test, are analyzed. Images in Figs. 7(b) and 7(c) show no visible difference; the PSNR and SSIM analysis (Table 5(a)) reveals a slightly better accuracy for the ADMM-DFRP.
Finally the sinogram of the rat brain, also used in the previous section, is reconstructed with the ADMM approach. In this case, the number of projections has been reduced by a factor of 2.5 leading to an undersampled dataset with an array of 300 views x 1493 pixels. Also for real data, the reconstructed slices (Figs. 8(a) and 8(b)) look almost identical at visual inspection; the analysis, reported in Tabs. 5(a) and 5(b), highlights a slight better performance for the ADMM-DFRP.
From a computational point of view, the proposed operators are ideal for iterative algorithms, since these methods generally call the forward projector and its Hermitian adjoint few times per iteration, often representing the computational bottleneck of the entire reconstruction process. For this last dataset, on an Intel(R) Core(TM) i7-3520M CPU 2.90GHz, a single conjugategradient sub-iteration within the ADMM (see Appendix) takes around 40s with the proposed operators (ADMM-DFRP), and around 830s for ADMM-DRT. Since the stopping criterion terminates the procedure after approximately 20 iteration, each involving 15 sub-iterations, the Table 3. SNR and CNR scores for the analytical reconstructions of the rat brain with 721 (a), 400 (b) and 300 (c) views. These values represent averages computed over the ROIs displayed in Fig. 5    time elapsed for the reconstructions, if run on a single core, is around 3.5h for the ADMM-DFRP and 69h for the ADMM-DRT.

Summary
Novel projectors have been introduced to address analytical and iterative reconstruction of tomographic differential phase contrast datasets, when the forward model is represented by the derivative of the Radon transform, R (1) . These operators are based on the gridding method and feature a complexity of O(N 2 log 2 N). The gridding parameters, offering the best tradeoff between accuracy and computational performance, have been experimentally optimized on the basis of the minimal oversampling strategy, that requires the utilization of a Kaiser-Bessel  kernel for the interpolation in Fourier space. The performance of the proposed differentiated gridding forward projector (DFRP) and backprojector (DBRP) have been assessed with respect to a standard space-based implementation of R (1) and R (1) * . The comparison has shown that the differentiated gridding operators can yield analytical and iterative reconstructions of the same quality as space-based operators. The great advantage of the proposed operators lies in the computational efficiency, since calculation times can be reduced by 3 orders of magnitude.