Deep embedding convolutional neural network for synthesizing CT image from T1-Weighted MR image
Introduction
Computed tomography (CT) and structural magnetic resonance (MR) images are both important and widely applied in the treatment planning of radiotherapy (Balter et al., 1998, Chen et al., 2004, Khoo et al., 1997, Schad et al., 1987). Recently, it has become desirable to synthesize a CT image from the corresponding MR scan. For example, quantitative positron emission tomography (PET) requires a CT image for attenuation correction (Carney et al., 2006, Kinahan et al., 1998, Pan et al., 2005). The standard approach to CT-based attenuation correction is to transform the CT image, which is expressed in Hounsfield units, into an estimate of the linear attenuation map. This map is then projected to obtain the attenuation correction factors for PET (Carney et al., 2006). However, unlike in the traditional PET/CT scanner, the MR signal in the cutting-edge PET/MR scanner is not directly correlated with tissue density and thus cannot be used for attenuation correction after a simple intensity transform (Wagenknecht et al., 2013). As a possible solution, one may first segment the MR images to identify different tissues. This is often challenging because some structures in MR images, e.g., bone and air-filled cavities, appear with similar intensities even though they have very different attenuation properties. With a CT image synthesized from the T1-weighted MR image, however, this challenge can be greatly alleviated.
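To make the intensity transform above concrete, the sketch below maps a CT image in Hounsfield units to a 511 keV linear attenuation map with a bilinear model: one slope for soft tissue and a shallower slope above a soft-tissue/bone break point. The break point and all coefficients here are illustrative assumptions only, not the calibrated values of Carney et al. (2006) or of any particular scanner.

```python
import numpy as np

def hu_to_mu_511kev(ct_hu, mu_water=0.096, break_hu=50.0, bone_slope=5.25e-5):
    """Map a CT image in Hounsfield units (HU) to a 511 keV linear
    attenuation map (cm^-1) using a bilinear model.

    All parameter values are illustrative placeholders, not the
    calibrated coefficients of Carney et al. (2006).
    """
    ct_hu = np.asarray(ct_hu, dtype=np.float64)
    # Soft tissue and below: linear scaling of the water attenuation,
    # so that HU = -1000 (air) maps to zero attenuation.
    mu = mu_water * (1.0 + ct_hu / 1000.0)
    # Bone: attenuation grows more slowly with HU at 511 keV,
    # hence a shallower slope above the break point.
    bone = ct_hu > break_hu
    mu_break = mu_water * (1.0 + break_hu / 1000.0)
    mu[bone] = mu_break + bone_slope * (ct_hu[bone] - break_hu)
    return np.clip(mu, 0.0, None)
```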
In this work, we aim to address the problem of synthesizing CT from T1-weighted MR. These two modalities differ greatly in image appearance, which makes the synthesis problem challenging. Examples of T1-weighted MR images and their corresponding CT images are shown in Fig. 1. These images are acquired from the same patients, i.e., (a) for the brain and (b) for the prostate, respectively. In the MR images, the intensity values of ‘air’ and ‘bone’, as pointed out by the blue and orange arrows, are both low. In the CT images, however, ‘air’ appears dark while ‘bone’ appears bright. In general, the intensity mapping between the MR and CT modalities is highly complex, encoding both spatial and contextual information in a non-linear mapping.
Several reports in the literature focus on inter-modality medical image synthesis, e.g., from MR to CT. These methods can be mainly categorized into the following three classes.
- (a)
Atlas-based methods. In the atlas-based methods (Arabi et al., 2016, Hofmann et al., 2008, Kops and Herzog, 2007), a set of atlases is prepared in advance, each consisting of both MR and CT acquisitions. Given a new subject with only an MR image, all atlases are first registered to the new subject by referring to their respective MR images. The resulting deformation fields are then applied to warp the corresponding CT images of the atlases to the new subject space, and the subject CT image is synthesized by fusing the aligned atlas CT images (Burgos et al., 2014). Clearly, the performance of these methods depends heavily on registration accuracy, and the quality of the synthesized CT image also relies on sophisticated strategies for fusing the warped CT images. Note that the atlas-based methods may also incur high computational cost, since all atlas images must be registered.
- (b)
Sparse-coding-based methods. These methods (Yang et al., 2012, Yang et al., 2008) usually involve several steps. First, overlapping patches are extracted from the new subject MR image. These subject MR patches are then encoded by an MR patch dictionary built from the linearly aligned MR atlases. The obtained sparse representation coefficients are transferred to the coupled CT patch dictionary (also built from the linearly aligned CT atlases) to fuse the respective CT atlas patches and finally synthesize the subject CT image (a simplified sketch of this per-location pipeline is given after this list). Roy et al. (2010) applied this framework to predict the FLAIR image from T1- and T2-weighted MR images. Similarly, Ye et al. (2013) estimated T2- and diffusion-weighted MR images from T1-weighted MR. One main drawback of these methods is that the estimation is computationally expensive (Dong et al., 2016a), since sparse coding must be optimized at every image location: a patch is extracted at each location and passed through the full pipeline to obtain its predicted patch. Moreover, a global dictionary must be large to ensure good prediction performance, which further increases the time needed to solve for the sparse representation coefficients (Dong et al., 2016a, Yang et al., 2008).
- (c)
Learning-based methods. These methods learn the complex mapping from the local detailed appearances of MR images to those of CT images of the same subjects (Huynh et al., 2016, Johansson et al., 2011, Roy et al., 2014). To address the expensive computation of sparsity-learning-based methods, Huynh et al. (2016) presented an approach to estimate the CT image from MR using a structured random forest and the auto-context model. Vemulapalli et al. (2015) proposed an unsupervised approach that maximizes both global mutual information and local spatial consistency for inter-modality image synthesis. However, such methods often have to first decompose the whole input MR image into overlapping patches and then map each MR patch to the corresponding CT patch. Moreover, assembling the overlapping CT patches into a single output image incurs additional computational cost.
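The sketch below makes the per-location cost of class (b) concrete. It assumes coupled MR/CT patch dictionaries `D_mr` and `D_ct` (hypothetical names) have already been built from linearly aligned atlases, and uses scikit-learn's orthogonal matching pursuit to solve for the sparse codes; the actual methods cited above may use different sparse solvers.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def synthesize_ct_patchwise(mr_patches, D_mr, D_ct, n_nonzero=5):
    """Sparse-coding-based CT synthesis, one patch at a time (a sketch).

    mr_patches : (n_patches, patch_dim) MR patches from the subject.
    D_mr, D_ct : (patch_dim, n_atoms) coupled MR/CT dictionaries built
                 from linearly aligned atlases (assumed given).
    Returns (n_patches, patch_dim) predicted CT patches, which still
    need to be assembled (e.g., averaged where they overlap) into a
    full CT image.
    """
    ct_patches = np.empty_like(mr_patches, dtype=np.float64)
    for i, p in enumerate(mr_patches):
        # Solve the sparse code of the MR patch w.r.t. the MR dictionary;
        # this per-location optimization is what makes the method slow.
        alpha = orthogonal_mp(D_mr, p, n_nonzero_coefs=n_nonzero)
        # Transfer the same coefficients to the coupled CT dictionary.
        ct_patches[i] = D_ct @ alpha
    return ct_patches
```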
Recently, the convolutional neural network (CNN) has gained tremendous popularity and shown strong performance in the computer vision and medical image computing fields (Liao et al., 2013, Ren et al., 2018, Xiang et al., 2017, Xu et al., 2016). A CNN is capable of modeling the non-linear mapping between different image spaces without hand-crafted features. Moreover, a CNN-based method can avoid the heavy computation of patch-based methods by taking the whole image as input and producing a whole-image prediction in a single forward pass at test time. Successful applications include reconstructing high-resolution images from low-resolution images (Dong et al., 2016a) and enhancing PET signals from the simultaneously acquired structural MR (Li et al., 2014). Han (2017) also proposed a deep convolutional neural network method for CT synthesis from MR images, which achieved reasonable performance compared to the atlas-based methods. However, that method can only process a single slice in each forward pass. To handle 3D MR-to-CT synthesis, it has to process multiple slices independently, which can cause discontinuity and artifacts in the synthesized CT images. Besides CNN-based networks, Van Nguyen et al. (2015) proposed the location-sensitive deep network (LSDN) for synthesizing images across domains by integrating intensity features from image voxels with their spatial information.
In this paper, we propose a deep embedding convolutional neural network (DECNN) to synthesize CT images from T1-weighted MR images. As the examples in Fig. 1 show, the mapping from MR to CT can be highly complex, since the appearances of the two modalities vary significantly across spatial locations (Wagenknecht et al., 2013). This large inter-modality appearance gap challenges the accurate learning of a CNN. To this end, we decompose the CNN model into two stages: 1) the transform stage and 2) the reconstruction stage. The transform stage is a collection of convolutional layers responsible for forwarding the feature maps, while the reconstruction stage synthesizes the CT image from the transformed feature maps. Besides, we propose a novel embedding block, which synthesizes a tentative CT image from the intermediate feature maps in the CNN. This tentative CT synthesis is then embedded with the feature maps, so that the newly embedded feature maps become more closely related to the CT modality and can be further refined by the subsequent layers of the CNN. More importantly, we insert multiple embedding blocks into the transform stage to derive our DECNN. Note that the embedding block is similar to deep supervision (Lee et al., 2015), which has been adopted in many computer vision tasks (Chen et al., 2016, Xie and Tu, 2015). The holistically-nested edge detection (HED) method (Xie and Tu, 2015), for example, leverages multi-scale and multi-level feature learning to perform image-to-image edge detection; it produces multiple outputs and fuses them at the end of the network. DCAN (Chen et al., 2016) benefits from auxiliary supervision by introducing multi-task regularization during training. Our embedding block goes beyond deep supervision, since the midway feature maps are further embedded into the subsequent layers of the network. The embedding block thus provides consistent supervision to facilitate the modality synthesis and improve the quality of the final results. This embedding strategy also resembles the auto-context model (Tu and Bai, 2010); however, the auto-context model generally requires independent learning for each stage, while our DECNN integrates all embedding blocks into a unified network for end-to-end training.
The advantage of the embedding block is that it greatly strengthens the inter-modality mapping capability of DECNN between MR and CT images. In particular, the tentatively synthesized CT images are embedded to generate better feature maps, which are then transformed forward to refine the synthesis of the CT image. Through experiments, we also find that the embedding block contributes to faster convergence when training the deep network with back-propagation. Moreover, DECNN allows us to process all test subjects in a very efficient end-to-end way.
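To make the embedding idea concrete, the following PyTorch sketch shows one plausible reading of the architecture: a stack of convolutional layers forwards the feature maps, and each embedding block regresses a tentative CT map from the current features and concatenates it back onto them. The layer count, channel width, kernel sizes, and 2D (rather than 3D) convolutions are illustrative assumptions, not the exact configuration of the paper.

```python
import torch
import torch.nn as nn

class EmbeddingBlock(nn.Module):
    """Synthesize a tentative CT map from the current feature maps and
    concatenate it back, so later layers see CT-like evidence."""
    def __init__(self, channels):
        super().__init__()
        self.to_ct = nn.Conv2d(channels, 1, kernel_size=3, padding=1)
        # Fuse (features + tentative CT) back to the original width.
        self.fuse = nn.Conv2d(channels + 1, channels, kernel_size=3, padding=1)

    def forward(self, feats):
        tentative_ct = self.to_ct(feats)            # midway CT estimate
        embedded = torch.cat([feats, tentative_ct], dim=1)
        return torch.relu(self.fuse(embedded)), tentative_ct

class DECNNSketch(nn.Module):
    """Transform stage (conv layers + embedding blocks) followed by a
    reconstruction stage; widths and depths are illustrative only."""
    def __init__(self, channels=64, n_blocks=3):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(1, channels, 3, padding=1), nn.ReLU())
        self.blocks = nn.ModuleList(EmbeddingBlock(channels) for _ in range(n_blocks))
        self.reconstruct = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, mr):
        feats = self.head(mr)
        tentatives = []                              # deep-supervision targets
        for block in self.blocks:
            feats, ct_k = block(feats)
            tentatives.append(ct_k)
        return self.reconstruct(feats), tentatives
```

During training, a regression loss can be applied to the final output and to each tentative CT map, which is one way the embedding blocks could ease gradient flow; the exact supervision and loss weighting of the paper are not reproduced here.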
Our main contributions can be summarized as follows.
- •
We propose a very deep network architecture for estimating CT images from MR images directly. The network consists of convolutional and concatenation operations only. It can thus learn an end-to-end mapping between different imaging modalities, without any patch-level pre- or post-processing.
- •
To better train the deep network and refine the CT synthesis, we propose a novel embedding strategy that embeds the tentatively synthesized CT image into the feature maps and further transforms these feature maps forward for a better estimate of the final CT image. This embedding strategy helps back-propagate the gradients in the network, and also makes the training of the end-to-end mapping from MR to CT much easier and more effective.
- •
We carry out experiments on two real datasets, i.e., human brain and prostate datasets. The experimental results show that our method can be flexibly adapted to different applications. Moreover, our method outperforms the state-of-the-art methods in terms of both the accuracy of the estimated CT images and the speed of the synthesis process.
The rest of this paper is organized as follows. In Section 2, we present the details of our proposed DECNN for estimating CT image from MR image. Then, in Section 3, we conduct extensive experiments, evaluated with multiple metrics, on both real brain and prostate datasets. Finally, we conclude this paper in Section 4.
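For reference, the sketch below computes two metrics commonly used to evaluate synthesized CT: mean absolute error (MAE) and peak signal-to-noise ratio (PSNR). Whether these match the exact metric definitions used in Section 3 is an assumption; the PSNR dynamic-range convention in particular should be aligned with the paper's.

```python
import numpy as np

def mae(ct_pred, ct_true):
    """Mean absolute error, in the same units as the CT image (e.g., HU)."""
    return float(np.mean(np.abs(ct_pred - ct_true)))

def psnr(ct_pred, ct_true, data_range=None):
    """Peak signal-to-noise ratio in dB. If data_range is None, the
    dynamic range of the ground-truth image is used (an assumed
    convention, not necessarily the paper's)."""
    mse = float(np.mean((ct_pred - ct_true) ** 2))
    if data_range is None:
        data_range = float(ct_true.max() - ct_true.min())
    return 10.0 * np.log10(data_range ** 2 / mse)
```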
Method
CNN is capable of learning the mapping between different image spaces. We adopt a CNN model similar to Dong et al. (2016a) for the task of MR-to-CT image synthesis, and then develop our DECNN accordingly. As mentioned above, we decompose the CNN model into two stages, i.e., (1) the transform stage and (2) the reconstruction stage, as illustrated in Fig. 2(a). The transform stage is used to forward the feature maps (i.e., derived from MR images), such that the CT image can be synthesized…
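A minimal training step consistent with this two-stage design and with the embedding blocks sketched in the Introduction might look as follows. The voxel-wise mean-squared-error loss, the auxiliary weight on the tentative CT maps, and the function name `training_step` are all assumptions for illustration, not necessarily the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, mr_batch, ct_batch, aux_weight=0.5):
    """One optimization step: supervise the final CT estimate and, with
    a smaller (assumed) weight, every tentative CT from the embedding
    blocks, as in the DECNNSketch model above."""
    optimizer.zero_grad()
    final_ct, tentatives = model(mr_batch)
    loss = F.mse_loss(final_ct, ct_batch)
    for ct_k in tentatives:
        loss = loss + aux_weight * F.mse_loss(ct_k, ct_batch)
    loss.backward()
    optimizer.step()
    return loss.item()
```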
Experimental results
In this section, we evaluate the performance of our method on two real CT datasets, i.e., (1) a brain dataset and (2) a prostate dataset, which are the same datasets used in (Huynh et al., 2016, Nie et al., 2016). We first describe the datasets used for training and testing our method. Next, the detailed training setup is given. Subsequently, we analyze the effect of the embedding blocks in our architecture. We also present both qualitative and quantitative comparisons between our DECNN model and…
Discussion
We have presented a novel MR-to-CT synthesis method and evaluated it on both brain and prostate datasets. Compared with traditional learning-based methods, our DECNN model not only achieves the best synthesis results, but also runs several times or even orders of magnitude faster in the testing stage. There are also some limitations to our method. First, training is time-consuming, generally taking 2–3 days to obtain a model, while traditional methods…
Conclusion
In this paper, we propose a novel DECNN model to synthesize the CT image from the T1-weighted MR image. Deep learning is well known for its capability of encoding the highly complex mapping between two different image spaces. The embedding block, which embeds the tentative CT estimate into the flow of the feature maps, is integrated with the CNN in our work. Thus, our derived DECNN can transform the embedded feature maps forward and reconstruct better CT synthesis results in the…
Acknowledgements
This work was supported by the National Key Research and Development Program of China (2017YFC0107600), the National Natural Science Foundation of China (61473190, 81471733, 61401271), and the Science and Technology Commission of Shanghai Municipality (16511101100, 16410722400). This work was also supported in part by NIH grants (EB006733, CA206100, AG053867).
References (51)
- et al., Improvement of CT-based treatment-planning models of abdominal targets using static exhale imaging, Int. J. Radiat. Oncol. Biol. Phys. (1998)
- et al., MRI-based treatment planning for radiotherapy: dosimetric verification for prostate IMRT, Int. J. Radiat. Oncol. Biol. Phys. (2004)
- et al., A global optimisation method for robust affine registration of brain images, Med. Image Anal. (2001)
- et al., Magnetic resonance imaging (MRI): considerations and applications in radiotherapy treatment planning, Radiother. Oncol. (1997)
- et al., Deep auto-context convolutional neural networks for standard-dose PET image estimation from low-dose PET/MRI, Neurocomputing (2017)
- et al., Atlas-guided generation of pseudo-CT images for MRI-only and hybrid PET–MRI-guided radiotherapy treatment planning, Phys. Med. Biol. (2016)
- et al., Attenuation correction synthesis for hybrid PET-MR scanners: application to brain studies, IEEE Trans. Med. Imag. (2014)
- et al., Method for transforming CT images for attenuation correction in PET/CT imaging, Med. Phys. (2006)
- Chen, H., Qi, X., Yu, L., Heng, P.-A., 2016. DCAN: deep contour-aware networks for accurate gland segmentation. arXiv...
- et al., Image super-resolution using deep convolutional networks, IEEE Trans. Pattern Anal. Mach. Intell. (2016)
- Accelerating the super-resolution convolutional neural network
- MR-based synthetic CT generation using a deep convolutional neural network method, Med. Phys.
- Delving deep into rectifiers: surpassing human-level performance on ImageNet classification
- MRI-based attenuation correction for PET/MRI: a novel approach combining pattern recognition and atlas registration, J. Nucl. Med.
- Depth map super-resolution by deep multi-scale guidance
- Estimating CT image from MRI data using structured random forest and auto-context model, IEEE Trans. Med. Imaging
- Is synthesizing MRI contrast useful for inter-modality analysis?
- Natural image denoising with convolutional networks
- Caffe: convolutional architecture for fast feature embedding
- CT substitute derived from MRI sequences with ultrashort echo time, Med. Phys.
- Attenuation correction for a combined 3D PET/CT scanner, Med. Phys.
- Elastix: a toolbox for intensity-based medical image registration, IEEE Trans. Med. Imaging
- Alternative methods for attenuation correction for PET images in MR-PET scanners
- Rapid multi-organ segmentation using context integration and discriminative models