Image Decomposition Algorithm for Dual-Energy Computed Tomography via Fully Convolutional Network.

Background
Dual-energy computed tomography (DECT) has been widely used because the additional spectral information improves substance identification. The quality of the material-specific images produced by DECT depends heavily on the design of the basis material decomposition method.


Objective
The aim of this work is to develop and validate a data-driven algorithm for the image-based decomposition problem.


Methods
A deep neural net, consisting of a fully convolutional net (FCN) and a fully connected net, is proposed to solve the material decomposition problem. The former extracts a feature representation of the input reconstructed images, and the latter calculates the decomposed basis material coefficients from the joint feature vector. The whole model was trained and tested using a modified clinical dataset.


Results
The proposed FCN delivers images with about 60% smaller bias and 70% lower standard deviation than the competing algorithms, suggesting better material separation capability. Moreover, FCN still performs well in the presence of photon noise.


Conclusions
Our deep cascaded network achieves high decomposition accuracy and is robust to noise. The experimental results demonstrate the strong function-fitting ability of deep neural networks. The deep learning paradigm could be a promising way to solve nonlinear problems in DECT.


Introduction
Conventional single-energy X-ray imaging provides information about the examined object that is not sufficient to characterize it precisely. Dual-energy computed tomography (DECT) provides additional information by scanning the object with two different energy spectra, and it has been presented as a valid alternative to conventional single-energy X-ray imaging. In recent years, DECT has gained increasing attention in public security [1] and the medical field [2,3]. The advantage of DECT is its ability to characterize and differentiate materials [4]. The decomposition of a mixture into two basis materials relies on the principle that the attenuation coefficient depends on both the material and the energy.
Thus, measurements at two distinct energies should permit the separation of the attenuation into its basis components. The quality of the material-specific images produced by DECT depends heavily on the design of the basis material decomposition method. Existing decomposition methods can be divided into two main categories: projection-based [5–7] and image-based [8–10]. Projection-based methods pass the projection data through a decomposition function, followed by image reconstruction such as filtered backprojection (FBP). They commonly provide better accuracy and reconstructed images with fewer beam-hardening artifacts than image-based methods. However, projection-based methods require matched projection datasets. This means that physically the same lines must be measured for each spectrum, which is usually not the case in today's CT scanners. Image-based methods use linear combinations of reconstructed images to obtain an image that contains material-selective DECT information. This is an approximate technique, and the resulting images are less quantitative than those of projection-based methods. However, image-based methods can handle mismatched projection datasets and are applicable to the decomposition of three or more constituent materials, which makes them more practical. Thus, they have been employed more frequently in modern DECT implementations.
The material decomposition problem in the image domain can be described by the following equation:

$$\begin{pmatrix}\mu_H\\ \mu_L\end{pmatrix} = \begin{pmatrix}\mu_{1H} & \mu_{2H}\\ \mu_{1L} & \mu_{2L}\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix}, \tag{1}$$

where $\mu_L$ and $\mu_H$ are the pixel values in the images reconstructed from the low- and high-energy projections, respectively, and $x_1$ and $x_2$ are the corresponding pixel values in the decomposed basis material images. The subscripts 1 and 2 denote the two specific materials. $\mu_{1L/H}$ and $\mu_{2L/H}$ are the average attenuation coefficients of the two basis materials under the low/high-energy spectra. These attenuation coefficients are usually obtained by manually selecting two uniform regions of interest (ROIs) containing the basis materials on the CT images [9,11,12]. Direct material decomposition via matrix inversion calculates $x_1$ and $x_2$ as follows:

$$\begin{pmatrix}x_1\\ x_2\end{pmatrix} = \frac{1}{\Delta}\begin{pmatrix}\mu_{2L} & -\mu_{2H}\\ -\mu_{1L} & \mu_{1H}\end{pmatrix}\begin{pmatrix}\mu_H\\ \mu_L\end{pmatrix}. \tag{2}$$

Equation (2) can easily be solved as long as $\Delta = \mu_{1H}\mu_{2L} - \mu_{2H}\mu_{1L}$ is nonzero. However, the two terms in $\Delta$ usually do not differ significantly from each other, so $\Delta$ is small; because every entry of the inverse matrix is divided by $\Delta$, noise in the input reconstructed images is strongly amplified. Therefore, the decomposition result is very sensitive to noise. Various methods have been proposed to address this noise problem. Precorrection methods [13,14] reconstruct two water-precorrected images, followed by a linear combination, to yield images that are free from cupping artifacts, usually in water-equivalent materials. Noise reduction techniques applied after image decomposition include Kalender's correlated noise reduction (KCNR) [15,16], noise forcing (NOF) [17], and noise clipping (NOC) [18], whose fundamental strategy is the application of a smoothing filter. Recent advanced iterative methods [9,10] consider the statistical properties of the decomposition process and produce high-quality, edge-preserving images. These methods have achieved great success on the decomposition problem, but their good performance relies on carefully handcrafted algorithm design.
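To make the noise sensitivity of equation (2) concrete, the following NumPy sketch applies the 2 × 2 inversion pixel by pixel. The attenuation values `MU1_L`, `MU1_H`, `MU2_L`, and `MU2_H` are hypothetical numbers chosen so that the two terms of $\Delta$ are close, as the text describes; they are not measured coefficients.

```python
import numpy as np

# Hypothetical average attenuation coefficients (1/cm) of the two basis
# materials; in practice they are measured from uniform ROIs.
MU1_L, MU1_H = 0.60, 0.40   # material 1 (e.g. bone) at low / high energy
MU2_L, MU2_H = 0.22, 0.18   # material 2 (e.g. tissue) at low / high energy

def forward_model(x1, x2):
    """Equation (1): mix basis coefficients into high/low-energy pixels."""
    mu_high = MU1_H * x1 + MU2_H * x2
    mu_low = MU1_L * x1 + MU2_L * x2
    return mu_high, mu_low

def direct_decomposition(mu_high, mu_low):
    """Equation (2): pixel-wise inversion of the 2x2 basis matrix."""
    delta = MU1_H * MU2_L - MU2_H * MU1_L
    if np.isclose(delta, 0.0):
        raise ValueError("basis matrix is singular (Delta == 0)")
    x1 = (MU2_L * mu_high - MU2_H * mu_low) / delta
    x2 = (MU1_H * mu_low - MU1_L * mu_high) / delta
    return x1, x2

rng = np.random.default_rng(0)
x1_true = rng.uniform(0.0, 1.0, size=(64, 64))
x2_true = 1.0 - x1_true
mu_high, mu_low = forward_model(x1_true, x2_true)

# Noise-free images are recovered exactly (up to floating-point error).
x1_rec, x2_rec = direct_decomposition(mu_high, mu_low)

# Add small Gaussian noise to both reconstructed images and measure how
# much larger the noise in the decomposed x1 image becomes.
sigma = 1e-3
x1_noisy, _ = direct_decomposition(mu_high + rng.normal(0.0, sigma, mu_high.shape),
                                   mu_low + rng.normal(0.0, sigma, mu_low.shape))
amplification = np.std(x1_noisy - x1_true) / sigma
cond = np.linalg.cond(np.array([[MU1_H, MU2_H], [MU1_L, MU2_L]]))
print(f"cond = {cond:.1f}, noise amplification for x1 = {amplification:.1f}x")
```

With these illustrative values $\Delta = -0.02$, and the noise in $x_1$ comes out roughly 14 times larger than the noise added to the inputs.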
In recent years, deep learning techniques, which use neural networks with a deep structure of three or more layers, have attracted widespread attention, mainly by outperforming alternative machine learning methods in numerous important applications. The most popular deep model at present is the convolutional neural network (CNN), which has emerged as a powerful class of models for image classification [19,20] and object detection [21]. In the field of computed tomography, some recent studies have already used deep neural networks to solve problems such as low-dose image denoising [22] and artifact reduction [23]. Wang [24] provides an analytical and global perspective on the combination of tomographic imaging and deep learning. For the material decomposition problem in DECT, several neural network-based methods have also been proposed, but they all decompose materials in the projection domain [7,25,26].
Inspired by recent learning-based methods [27,28], in this paper we propose an end-to-end image decomposition algorithm based on deep learning. A modified fully convolutional network extracts features from the reconstructed images and suppresses image noise at the same time. The last layer of the model is a fully connected layer that calculates the decomposed images from the extracted features. We demonstrate the effectiveness of our algorithm through experiments on a clinical dataset. Two conventional algorithms are implemented and compared with the proposed FCN.

Fully Convolutional Network.
A fully convolutional network (FCN) is a kind of CNN that was first proposed for semantic segmentation [29]. A standard CNN is generally composed of alternating convolutional and pooling layers. The convolutional layers learn features of the input. The pooling layers downsample the feature maps so that deeper layers can extract larger-scale, higher-level features. To map the features to class labels, a fully connected layer is added as the last output layer; this layer has fixed dimensions and discards spatial coordinates. Because of this design, a naive CNN requires fixed-size inputs and produces nonspatial outputs. The main idea of FCN is to transform the last fully connected layer into a convolutional layer whose kernels cover its entire input region. This replacement brings several advantages. First, the net can take images of arbitrary size, which means that it can be trained on image patches and then tested on full-sized images. Second, it can efficiently learn dense predictions for per-pixel tasks such as semantic segmentation. Lastly, applying a naive CNN to per-pixel tasks generates a huge amount of redundant convolution computation on overlapping adjacent patches. FCN avoids this problem by computing the convolutions over the entire input image at once, which significantly speeds up forward propagation.
Because of these advantages, FCN is especially suitable for the image-based material decomposition problem, which can also be regarded as a per-pixel prediction task. In addition, convolution applied to an image is interpretable, since it can be seen as a kind of image filtering.
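The key FCN step, replacing a fully connected layer with a convolution whose kernel covers the whole input, can be checked numerically. The NumPy sketch below uses a naive loop-based convolution and arbitrary random weights, not the paper's network. It verifies two things: both forms give the same vector on a 7 × 7 feature map, and the convolutional form produces a dense grid of such predictions on a larger input.

```python
import numpy as np

def valid_conv2d(x, w, b):
    """'Valid' multi-channel 2-D convolution (cross-correlation), stride 1.

    x: (H, W, C_in), w: (k, k, C_in, C_out), b: (C_out,)
    returns (H - k + 1, W - k + 1, C_out)
    """
    k = w.shape[0]
    h_out, w_out = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.empty((h_out, w_out, w.shape[3]))
    for i in range(h_out):
        for j in range(w_out):
            # Sum over the k x k x C_in window for every output channel.
            out[i, j] = np.tensordot(x[i:i + k, j:j + k, :], w, axes=3) + b
    return out

rng = np.random.default_rng(0)
k, c_in, c_out = 7, 4, 3

# A fully connected layer acting on a flattened 7x7x4 feature map ...
W_fc = rng.normal(size=(k * k * c_in, c_out))
b = rng.normal(size=c_out)
feature_map = rng.normal(size=(k, k, c_in))
fc_out = feature_map.reshape(-1) @ W_fc + b

# ... equals a convolution whose 7x7 kernel covers the entire input.
W_conv = W_fc.reshape(k, k, c_in, c_out)
conv_out = valid_conv2d(feature_map, W_conv, b)      # shape (1, 1, 3)

# On a larger input, the convolutional form yields a dense output map:
# one prediction per 7x7 window instead of a single vector.
big_map = rng.normal(size=(12, 12, c_in))
dense_out = valid_conv2d(big_map, W_conv, b)         # shape (6, 6, 3)
window_fc = big_map[2:2 + k, 3:3 + k].reshape(-1) @ W_fc + b
```

The entry `dense_out[2, 3]` matches what the fully connected layer would output for the window starting at row 2, column 3, which is exactly the per-pixel behavior that image-based decomposition needs.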

Image Decomposition Model.
For image decomposition, we designed an end-to-end decomposition model based on FCN.
The proposed model takes reconstructed images as inputs and predicts the basis material coefficients of each pixel in the decomposed image, performing image decomposition and noise suppression in a single step.
An overview of our model is illustrated in Figure 1. It is composed of two types of layers: convolutional and fully connected layers. Since pooling layers may discard important structural details in feature maps, we omit them from the model to avoid degrading the quality of the result images. Without pooling, however, the feature maps at different layers would all have the same size. So that the model can still capture multiscale features at different layers, the strides of the convolutional layers are set to 2 to perform the downsampling. The input of the model is a 65 × 65 patch from the reconstructed images. There are two independent fully convolutional nets, one for the images reconstructed from the low-energy projections and one for those reconstructed from the high-energy projections. The two nets have the same layer structure and are called L-FCN and H-FCN in this study. Each is composed of four convolutional layers. The output of layer $n$ can be formulated as follows:

$$C_n(x_n) = \mathrm{ReLU}\left(W_f^n * x_n + b_f^n\right), \tag{3}$$

where $x_n$ is the input feature map or image, $W_f^n$ and $b_f^n$ represent the convolutional kernel weights and bias, respectively, and $*$ is the convolution operation.
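The downsampling-by-stride design can be traced with a small NumPy sketch of one branch (L-FCN or H-FCN). Each layer applies $\mathrm{ReLU}(W * x + b)$ with "valid" padding. The kernel sizes, strides, and channel counts in `LAYERS` are hypothetical because Table 1 is not reproduced here. They were picked only so that a 65 × 65 patch shrinks to a 1 × 1 × 512 output, matching the 512-element feature vector described in the text. The last layer's stride of 1, with a kernel covering its whole input, is likewise an assumption. Weights are random, so only the shapes are meaningful.

```python
import numpy as np

def conv_relu(x, w, b, stride):
    """One layer: ReLU(W * x + b) with 'valid' padding and the given stride."""
    k = w.shape[0]
    h_out = (x.shape[0] - k) // stride + 1
    w_out = (x.shape[1] - k) // stride + 1
    out = np.empty((h_out, w_out, w.shape[3]))
    for i in range(h_out):
        for j in range(w_out):
            r, c = i * stride, j * stride
            out[i, j] = np.tensordot(x[r:r + k, c:c + k, :], w, axes=3) + b
    return np.maximum(out, 0.0)  # ReLU(x) = max(0, x)

# Hypothetical (kernel, stride, out_channels) for the four conv layers.
LAYERS = [(3, 2, 16), (3, 2, 32), (3, 2, 64), (7, 1, 512)]

def branch_forward(patch, layers=LAYERS, seed=0):
    """Run a 65x65 patch through one untrained branch; record each shape."""
    rng = np.random.default_rng(seed)
    x = patch[:, :, None]              # (65, 65) -> (65, 65, 1)
    shapes = []
    for kernel, stride, c_out in layers:
        w = rng.normal(0.0, 0.1, size=(kernel, kernel, x.shape[2], c_out))
        b = np.zeros(c_out)
        x = conv_relu(x, w, b, stride)
        shapes.append(x.shape)
    return x.reshape(-1), shapes       # 512-element feature vector

patch = np.random.default_rng(1).normal(size=(65, 65))
feature, shapes = branch_forward(patch)
print(shapes)
```

With these assumed layers the spatial size goes 65 → 32 → 15 → 7 → 1, showing how stride-2 convolutions can replace pooling for downsampling.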
$\mathrm{ReLU}(x) = \max(0, x)$ is the nonlinear activation function of the neuron. The output of L-FCN or H-FCN, $C_4(x_4)$, is a 512 × 1 vector that represents the features of the current input patch. The two feature vectors from L-FCN and H-FCN are merged into a joint vector. Then, a fully connected layer calculates the decomposed basis material coefficients from the joint vector as follows:

$$\hat{X} = W_c M + b_c, \tag{4}$$

where $\hat{X} = (\hat{x}_1, \hat{x}_2)$ is the predicted material coefficient vector, $W_c$ and $b_c$ are the trainable parameter matrices, and $M$ is the merged vector from L-FCN and H-FCN. The whole decomposed images are obtained by traversing all patches in the input images. Details of each layer of the proposed FCN are listed in Table 1.
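The following sketch shows, under stated assumptions, how the merged feature vector is mapped to $(\hat{x}_1, \hat{x}_2)$ by the fully connected layer and how a full decomposed image is produced by visiting every pixel with a 65 × 65 window. Several choices here are assumptions: reflect padding at the image border, no output activation, low-energy features placed first in the merged vector, and toy feature extractors. The hypothetical `feat_low` and `feat_high` stand in for the trained L-FCN and H-FCN. As a sanity check, if each "feature extractor" returns only the centre pixel and `W_c` is set to the inverse basis matrix (hypothetical attenuation values), the model reduces exactly to direct matrix inversion.

```python
import numpy as np

HALF = 32  # a 65 x 65 patch has 32 pixels on each side of its centre

def fully_connected(m, W_c, b_c):
    """X_hat = W_c M + b_c (no output activation assumed)."""
    return W_c @ m + b_c

def decompose_image(img_low, img_high, feat_low, feat_high, W_c, b_c):
    """Predict (x1, x2) at every pixel from the 65x65 patches centred on it."""
    pad_low = np.pad(img_low, HALF, mode="reflect")    # border handling: assumption
    pad_high = np.pad(img_high, HALF, mode="reflect")
    size = 2 * HALF + 1
    rows, cols = img_low.shape
    out = np.empty((rows, cols, 2))
    for i in range(rows):
        for j in range(cols):
            # Merge the two branch outputs into one joint vector M.
            m = np.concatenate([feat_low(pad_low[i:i + size, j:j + size]),
                                feat_high(pad_high[i:i + size, j:j + size])])
            out[i, j] = fully_connected(m, W_c, b_c)
    return out

def centre_pixel(patch):
    """Toy stand-in for a trained branch: returns only the centre pixel."""
    return patch[HALF:HALF + 1, HALF]

# Hypothetical attenuation values; W_c is chosen to reproduce equation (2)
# for M = (mu_L, mu_H).
MU1_L, MU1_H, MU2_L, MU2_H = 0.60, 0.40, 0.22, 0.18
delta = MU1_H * MU2_L - MU2_H * MU1_L
W_c = np.array([[-MU2_H, MU2_L], [MU1_H, -MU1_L]]) / delta
b_c = np.zeros(2)

rng = np.random.default_rng(0)
x1 = rng.uniform(size=(40, 40))
x2 = 1.0 - x1
img_high = MU1_H * x1 + MU2_H * x2
img_low = MU1_L * x1 + MU2_L * x2
pred = decompose_image(img_low, img_high, centre_pixel, centre_pixel, W_c, b_c)
```

A trained network replaces `centre_pixel` with feature extractors that see the whole 65 × 65 neighbourhood, which is what allows it to suppress noise rather than merely invert the basis matrix.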

Training Details.
The proposed FCN is implemented with the TensorFlow [30] framework on a computer with two Titan X GPUs (24 GB of video memory in total). The base learning rate of the model is $5 \times 10^{-3}$, and it decays exponentially with a decay rate of 0.9. Each batch contains 1200 training samples. The mean squared error (MSE) is used as the loss function:

$$L = \frac{1}{N}\sum_{i=1}^{N}\left\|\hat{X}_i - X_i\right\|^2, \tag{5}$$

where $X = (x_1, x_2)$ is the true value of the decomposed image and $N$ is the number of samples in a batch. We used Adam [31] to optimize the loss function. The entire model contains about 64k trainable parameters and was trained for 40 epochs in 37 hours. The training loss curve is plotted in Figure S1 in the Supplementary Materials.
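The optimization setup (MSE loss, Adam, an exponentially decaying learning rate, and 1200 samples per batch) can be illustrated on a toy problem. This NumPy sketch trains only a linear head $\hat{X} = W M + b$ on synthetic data with a hand-written Adam update. Several parts are assumptions rather than details from the text: the schedule form $5\times10^{-3}\cdot 0.9^{\,\text{step}/\text{DECAY\_STEPS}}$ (our reading of "decays by an exponential power of 0.9"), the value of `DECAY_STEPS`, the synthetic data, and full-batch training. The paper trains the full network in TensorFlow.

```python
import numpy as np

BASE_LR = 5e-3      # base learning rate stated in the text
DECAY_RATE = 0.9    # decay factor stated in the text
DECAY_STEPS = 100   # hypothetical: the decay interval is not given
BATCH = 1200        # samples per batch, as stated in the text

def learning_rate(step):
    """Exponentially decaying learning rate (our reading of the schedule)."""
    return BASE_LR * DECAY_RATE ** (step / DECAY_STEPS)

def mse(pred, target):
    """MSE loss: mean over samples of ||X_hat - X||^2."""
    return np.mean(np.sum((pred - target) ** 2, axis=1))

def train_linear_head(M, X, steps=500, beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit X_hat = M W^T + b with hand-written Adam updates (full batch)."""
    n, d = M.shape
    params = {"W": np.zeros((X.shape[1], d)), "b": np.zeros(X.shape[1])}
    first = {k: np.zeros_like(val) for k, val in params.items()}   # 1st moments
    second = {k: np.zeros_like(val) for k, val in params.items()}  # 2nd moments
    losses = []
    for t in range(1, steps + 1):
        pred = M @ params["W"].T + params["b"]
        losses.append(mse(pred, X))
        err = 2.0 * (pred - X) / n                 # d(loss)/d(pred)
        grads = {"W": err.T @ M, "b": err.sum(axis=0)}
        lr = learning_rate(t - 1)
        for k in params:
            first[k] = beta1 * first[k] + (1 - beta1) * grads[k]
            second[k] = beta2 * second[k] + (1 - beta2) * grads[k] ** 2
            m_hat = first[k] / (1 - beta1 ** t)    # bias-corrected moments
            v_hat = second[k] / (1 - beta2 ** t)
            params[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, losses

rng = np.random.default_rng(0)
M = rng.normal(size=(BATCH, 16))             # stand-in for merged feature vectors
W_true = rng.normal(0.0, 0.5, size=(2, 16))
X = M @ W_true.T + 0.1                       # stand-in targets (x1, x2)
params, losses = train_linear_head(M, X)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.5f}")
```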

Experimental Dataset.
The experimental data are derived from a real clinical dataset that contains 5987 pleural and cranial cavity images (512 × 512) from 12 patients. These raw data were obtained from a single-energy scan. The tissue and bone regions in the images were all manually delineated. Images from 10 patients were used to generate training samples, and images from the remaining 2 patients were used for testing. Each image is split into two partitions, one containing only bone regions and the other only tissue regions, which serve as the ground truth of the decomposed images. To generate dual-energy images, we processed the original raw data and simulated the imaging system. First, because the original image values are small and inconvenient to process, we amplified the raw data to a suitable range via a linear transform with parameters $\lambda_t$ and $\lambda_b$; the different settings of $\lambda_t$ and $\lambda_b$ give better visual contrast in the transformed images. Second, we applied the BM3D [32] algorithm to attenuate additive white Gaussian noise in the images. Third, we used the SpekCalc [33] software to generate 80 kVp and 140 kVp energy spectra, calculated the projections of a simulated dual-energy scan, and obtained the reconstructed images via FBP. Lastly, for each patient in the training set, we selected one slice out of every 10. Then, from each selected image, we extracted 65 × 65 patches (the same size as the input layer of the proposed FCN) with a sliding interval of 5 pixels, yielding 2,454,300 training patches in total.
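The slice selection and patch extraction steps can be sketched with NumPy's `sliding_window_view`. If patches are taken without padding, each 512 × 512 slice gives 90 × 90 = 8100 patches, so the reported total of 2,454,300 would correspond to 303 selected slices. This slice count is our arithmetic inference, not a figure stated in the text, and the helper names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

PATCH, STRIDE, SLICE_STEP = 65, 5, 10  # values stated in the text

def patches_per_axis(size, patch=PATCH, stride=STRIDE):
    """Number of window positions along one axis when no padding is used."""
    return (size - patch) // stride + 1

def extract_patches(image, patch=PATCH, stride=STRIDE):
    """Read-only view of shape (ny, nx, patch, patch); no data are copied."""
    return sliding_window_view(image, (patch, patch))[::stride, ::stride]

def select_slices(volume, step=SLICE_STEP):
    """Keep one slice out of every `step` along the first (slice) axis."""
    return volume[::step]

per_image = patches_per_axis(512) ** 2
print(per_image, 2_454_300 / per_image)
```

Returning a strided view keeps memory low: copying all 8100 patches of one slice as float64 would take roughly 270 MB.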

Evaluation Metrics.
The proposed FCN is compared with two other algorithms: direct decomposition (matrix inversion) and iterative decomposition [9]. We use the bias and standard deviation to evaluate the performance of these methods. Bias is the difference between the measured and expected values and measures the accuracy of the result. Standard deviation (SD) reflects the dispersion of the result. They are calculated as follows:

$$\mathrm{Bias} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{x}_i - x_i\right), \qquad \mathrm{SD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{x}_i - \mu\right)^2},$$

where $\hat{x}_i$ and $x_i$ are the predicted and true values at point $i$ of the image, respectively, $\mu$ is the mean value of the material, and $N$ is the number of points in the ROI.

Computational and Mathematical Methods in Medicine
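The two metrics can be computed inside a boolean ROI mask as sketched below. Two details are assumptions because the text does not pin them down: the bias is taken as the signed mean error, and $\mu$ is taken as the mean predicted value inside the ROI.

```python
import numpy as np

def bias_and_sd(pred, truth, roi):
    """Bias and SD of a decomposed image inside a boolean ROI mask.

    Assumptions: bias is the signed mean error, and mu is the mean
    predicted value inside the ROI.
    """
    p = pred[roi]
    t = truth[roi]
    bias = np.mean(p - t)                         # mean of (x_hat_i - x_i)
    sd = np.sqrt(np.mean((p - p.mean()) ** 2))    # spread around mu
    return bias, sd

# Toy example: a uniform true image and a prediction that is offset by 0.1
# and alternates by +/-0.05 in a checkerboard pattern.
truth = np.ones((4, 4))
checker = np.where(np.indices((4, 4)).sum(axis=0) % 2 == 0, 1.0, -1.0)
pred = truth + 0.1 + 0.05 * checker
roi = np.ones((4, 4), dtype=bool)
bias, sd = bias_and_sd(pred, truth, roi)
```

In this example the constant offset shows up only in the bias (0.1), and the checkerboard fluctuation shows up only in the SD (0.05), which is why the two metrics are reported together.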
To further investigate the robustness of the proposed FCN, photon noise is introduced into the dual-energy projections before FBP reconstruction. There are two major types of noise in X-ray projection images [34]. One type is caused by electrical and round-off errors; it is image independent and can be modeled as Gaussian noise. The other type is caused by statistical fluctuations of the X-ray photons; it is image dependent and can be modeled as Poisson noise. The first type is small and is omitted in this study. The second type is simulated as follows:

$$\tilde{p}_L = -\ln\frac{g\left(I_L e^{-p_L}\right)}{I_L}, \qquad \tilde{p}_H = -\ln\frac{g\left(I_H e^{-p_H}\right)}{I_H},$$

where $\tilde{p}_L$ and $\tilde{p}_H$ are the noise-corrupted low- and high-energy projections, $p_L$ and $p_H$ are the noise-free projections, $g(x)$ is a random process following a Poisson distribution with mean $x$, and $I_L$ and $I_H$ are the numbers of incident low- and high-energy X-ray photons. We set $I_L = 5 \times 10^5$ and $I_H = 1 \times 10^6$ in the experiments.
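The Poisson noise model can be sketched in NumPy as follows. Each line integral $p$ is converted to an expected photon count $I e^{-p}$, a Poisson sample is drawn, and the log transform is applied again. Clipping zero counts to 1 is our own safeguard against $\log 0$ and is not mentioned in the text. The projection values are hypothetical.

```python
import numpy as np

I_LOW, I_HIGH = 5e5, 1e6   # incident photon numbers stated in the text

def add_poisson_noise(p, n_photons, rng):
    """Poisson-corrupt line integrals p: -ln(Poisson(I * exp(-p)) / I)."""
    counts = rng.poisson(n_photons * np.exp(-p))
    counts = np.maximum(counts, 1)   # our safeguard against log(0)
    return -np.log(counts / n_photons)

rng = np.random.default_rng(0)
p_clean = np.full(10_000, 2.0)       # hypothetical projection values
p_low = add_poisson_noise(p_clean, I_LOW, rng)
p_high = add_poisson_noise(p_clean, I_HIGH, rng)
print(p_low.std(), p_high.std())
```

Because the relative fluctuation of a Poisson count scales as $1/\sqrt{I e^{-p}}$, doubling the incident photon number from $I_L$ to $I_H$ lowers the projection noise by roughly a factor of $\sqrt{2}$ for the same $p$.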

Results
We test our model on a cranial image and a pleural image that are excluded from the training dataset. Figure 2 shows the decomposition results of the three algorithms; the first column is the ground truth. Bone and tissue are chosen as the basis materials. Matrix inversion gives visually similar results to iterative decomposition. Both algorithms show a loss of detail and noticeable blocky artifacts in the tissue and bone images. Figure 3 shows zoomed-in views of the areas indicated in Figure 2 by dashed rectangles. Iterative decomposition delivers smooth images owing to the smoothness regularization term in its loss function. Notably, the proposed FCN suppresses most artifacts while preserving structural features better than the competing algorithms. However, it shows no distinct improvement in edge preservation. We suspect this is mainly caused by the convolution kernels in the model, since convolution of an image can be seen as a kind of filtering. For quantitative evaluation, the bias and SD are calculated inside each material's ROI on the images generated by the different algorithms and are summarized in Table 2. Generally, the estimate of bone is more accurate than that of tissue. The proposed FCN achieves the results closest to the ground truth, with about 60% smaller bias and 70% lower standard deviation than the competing algorithms, suggesting better material separation capability.
To evaluate the potential improvement offered by FCN, we investigate the effect of photon noise on the material decomposition algorithms. The reconstructed images are generated from noise-corrupted projections as described in Section 3.2. Figure 4 presents the decomposition results on the same test images. Direct matrix inversion amplifies the noise both in the ROI and in the background. Iterative decomposition also suffers from serious artifacts.
This indicates that both algorithms are sensitive to noise. The decomposed images produced by the proposed FCN show little noticeable change compared with the results in Figure 2.

Figure 5 shows the absolute difference between the images in Figures 2 and 4, giving a visual comparison of noise suppression performance. For matrix inversion, the noise is statistically independent and evenly distributed over the images because each pixel of the decomposed images is calculated from the corresponding pixels of the reconstructed images. For iterative decomposition, the noise shows a regional distribution: the tissue and background regions contain more noise than the bone regions. In contrast, the results produced by the proposed FCN show few obvious differences. Clearly, FCN outperforms the other two algorithms, suppressing image noise more effectively while keeping subtle structures. The quantitative results are listed in Table 3. With photon noise, the bias and SD of the competing algorithms increase to varying degrees, whereas FCN still agrees well with the true values, indicating better noise robustness.

Discussion
We have designed a cascaded neural network for the material decomposition problem. The reconstructed images are mapped pixel-wise to decomposed images via several convolutional layers and a fully connected layer. The input size of 65 × 65 is based on the hypothesis that the material coefficient of a pixel depends largely on its local region in the reconstructed images. The proposed FCN processes data end to end, without any need for precorrected images or other prior knowledge. The experimental results show its strong performance in capturing localized structural information and suppressing image noise. The decomposed images generated by matrix inversion and iterative decomposition contain relatively large amounts of artifacts. In the robustness test, the noise-corrupted inputs degrade the performance of the competing algorithms but have little effect on the FCN, which still achieves low bias and standard deviation. Data augmentation was used in the training process. It brought no performance gain but increased training time. We suspect the main reason is that material decomposition is a regression problem, in which the label takes values in a continuous space. Data augmentation assumes that nearby examples share the same class. This assumption is usually plausible for classification problems, in which the label is a discrete variable, but it does not necessarily hold for regression problems. The main drawback of our algorithm is that it requires a specific pair of basis materials. Tissue and bone are selected as the basis materials in the experiment, and the whole model must be retrained if either material is changed. We therefore expect the proposed algorithm to be useful in applications such as medical diagnosis, where the choice of materials is relatively fixed. The amount of training data is another main factor in the effectiveness of our model.
Normally, more data lead to better model performance, but it may be difficult to collect enough data in some situations.

Conclusions and Further Work
In this study, we present a deep learning approach for the image decomposition problem in DECT. The preliminary decomposition results demonstrate the feasibility of the proposed algorithm, which delivers images with about 60% smaller bias and 70% lower standard deviation than the competing algorithms. The deep learning paradigm promises to improve our ability to solve nonlinear problems in DECT.
We think two directions are worth further research. One is to extend our model to the three-material decomposition problem. The other is to use a deconvolutional network that outputs the whole decomposed image in a single forward-propagation pass rather than predicting it pixel by pixel.

Data Availability
The code and data used in this research can be obtained from https://github.com/XYF-GitHub/ImageDecomposition-DECT.

Supplementary Materials. Figure S1: the proposed model contains about 64k trainable parameters and is trained for 40 epochs in 37 hours. The training batch size is 1200 reconstructed images from the noise-corrupted low- and high-energy projections. Figures S2 and S3: more test results showing the superiority of the proposed method. All test images are reconstructed from the noise-corrupted low- and high-energy projections.