Tumor Segmentation in Contrast-Enhanced Magnetic Resonance Imaging for Nasopharyngeal Carcinoma: Deep Learning with Convolutional Neural Network

Objectives To evaluate the application of a deep learning architecture, based on the convolutional neural network (CNN) technique, to perform automatic tumor segmentation of magnetic resonance imaging (MRI) for nasopharyngeal carcinoma (NPC). Materials and Methods In this prospective study, 87 MRI containing tumor regions were acquired from newly diagnosed NPC patients. These 87 MRI were augmented to >60,000 images. The proposed CNN network is composed of two phases: feature representation and scores map reconstruction. We designed a stepwise scheme to train our CNN network. To evaluate the performance of our method, we used case-by-case leave-one-out cross-validation (LOOCV). The ground truth of tumor contouring was acquired by the consensus of two experienced radiologists. Results The mean values of dice similarity coefficient, percent match, and their corresponding ratio with our method were 0.89±0.05, 0.90±0.04, and 0.84±0.06, respectively, all of which were better than reported values in the similar studies. Conclusions We successfully established a segmentation method for NPC based on deep learning in contrast-enhanced magnetic resonance imaging. Further clinical trials with dedicated algorithms are warranted.


Introduction
Head and neck cancer (HNC), especially nasopharyngeal carcinoma (NPC), is an aggressive cancer type with a high incidence rate in Southern China [1]. The cancer incidence data collected in Guangxi and Guangdong show that nasopharyngeal cancer is the fourth most common cancer for males [2]. External beam radiation therapy is the primary therapy for this cancer. The 3-year local control rate for NPC after therapy is higher than 80% and the 3-year overall survival rate is up to 90% [3]. Noninvasive medical imaging is of great importance in determining the tumor volume for successful radiation treatment planning [3,4].
Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI), a functional noninvasive imaging modality, plays a key role in the studies of cancer by providing information about physiological characteristics in tissues. Studies have concluded that DCE-MRI is useful in differentiating tumors from normal tissues in NPC [4]. Accurate segmentation of NPC tumors from DCE-MRI is important for the radiotherapy treatment planning and prognosis evaluation. However, the accuracy of tumor segmentation in DCE-MRI is affected by some imaging factors such as low spatial resolution, poor signal-to-noise ratio, partial volume effect, and the intensity changes during perfusion [5].
Many studies have been performed to automatically segment NPC tumors from medical images. Zhou et al. [6] performed NPC tumor segmentation in MR images by using Semi-Fuzzy C-means with percent match (PM) values close to 0.87. Zhou et al. [7] performed NPC tumor segmentation in MRI by using the two-class support vector machine (SVM) method with PM values close to 0.79. Huang et al. [8] performed semisupervised NPC lesion extraction in MR images by using a spectral clustering-based method with a positive predictive value up to 0.71. Huang et al. [9] performed NPC tumor segmentation by using Bayesian classifiers and the SVM method with an average specificity of 0.93.
The above-mentioned methods were all conventional machine learning techniques that require subjective feature extraction and selection. Deep learning (DL) techniques, such as the convolutional neural network (CNN), have recently emerged as a powerful tool for addressing these challenges. A CNN detects low-level features such as shape and texture information autonomously from small patches of the input images and then combines them into high-level features for image processing tasks such as classification, segmentation, and detection, without subjective feature extraction and selection [10,11]. Deep learning techniques also generalize better to new datasets [12].
To the best of our knowledge, DL with the CNN technique has only recently attracted research interest in tumor segmentation [13,14]. Wang et al. [15] performed NPC tumor segmentation in MR images by using deep convolutional neural networks; however, the average Jaccard similarity coefficient (JSC) value was less than 0.8. In the current study, we report an automatic and accurate segmentation method based on the CNN architecture with dynamic contrast-enhanced MRI.

CE-MRI and Preprocessing.
Twenty-nine newly diagnosed NPC patients from August 2010 to April 2013 were included from the First Affiliated Hospital, Sun Yat-Sen University. This study was approved by the local institutional review board of Sun Yat-Sen University. Written informed consent was obtained from each patient before the MRI scan. Partial volume effects (PVE) can severely affect the images whenever the tumor size is less than 3 times the full width at half maximum (FWHM) of the reconstructed image resolution [16]. Thus, following the advice of the radiologists, patients with lymph nodes or lesions smaller than 1 cm were excluded from the current study to avoid possible PVE. DCE-MRI was performed in the primary tumor region, including the retropharyngeal nodes with regional nodal metastasis, with a 3.0-T MRI system (Magnetom Trio, Siemens) with a field of view of 22 cm × 22 cm × 6 cm (AP×RL×FH), a flip angle of 15°, and a scanning time of 6 minutes and 47 seconds, resulting in 65 dynamic images. The contrast agent gadolinium-diethylenetriamine pentaacetic acid (Gd-DTPA) (Omniscan; Nycomed, Oslo, Norway) was injected intravenously as a bolus at around the 8th dynamic acquisition using a power injector system (Spectris Solaris, MedRad, USA), immediately followed by a 25 mL saline flush at a rate of 3.5 mL/s. The dose of Gd-DTPA was 0.1 mmol per kg of body weight. The matrix size of the reconstructed dynamic images was 144×144×20×65.
The ground truth was manually contoured in ImageJ (National Institutes of Health, Bethesda, MD) by consensus between two experienced radiologists (Dr. Yufeng Ye, 13 years' experience, and Dr. Dexiang Liu, 18 years' experience in radiology) who were blinded to this study. Since tumors were mostly enhanced at the 35th scan of our DCE-MRI, this scan from each patient was used for training and testing our DL model, and we selected only the slices containing the tumor area. In total, 87 CE-MRI slices were obtained from the 29 patients. To meet the need for a large amount of data in training the DL model, we augmented the 87 slices to more than 60,000 slices by using the following methods [17]: rotating each slice between -10 degrees and 10 degrees at an interval of 2 degrees to augment each slice to 11 slices; adjusting the image contrast automatically with the built-in Matlab function imadjust to produce 33 different slices from each single slice; and adding Gaussian noise with a power of 1×10⁻⁸ to produce 2 different slices from each slice. In total, each slice was augmented 11×33×2 = 726 times, giving 63,126 (87×726) slices. These augmented images were then normalized by Z-score translation [18], in which the intensity value of each voxel was normalized by the mean intensity of the image.

CNN Network.
The CNN network included two phases of feature representation and scores map reconstruction. In the feature representation phase, the network consisted of 2 Pool-Conv-ReLu blocks (P1-P2) and 4 Conv-ReLu blocks (C1-C4) (see Figure 1). A Pool-Conv-ReLu block included one pooling layer (Pool), one convolution layer (Conv), and one rectified linear units (ReLu) layer, while a Conv-ReLu block consisted of one convolution layer and one ReLu layer. The convolution layer detected local features from the input images and the ReLu layer accelerated the convergence. The pooling layer was designed for reducing the dimension of feature maps and network parameters. The input images with a matrix size of 144×144 were transformed into the feature maps of matrix size of 36×36 in the feature representation phase.
In the scores map reconstruction phase (D1-D2, Ct1-Ct2, C5-C6), the images were reconstructed from the 36×36 feature maps. Two deconvolution layers (D1-D2) were applied to reconstruct an output image with a matrix size of 144×144. Since some image details could be missing in this reconstruction from the 36×36 feature maps, the fine features obtained from the previous feature representation phase were combined with the scores map to allow the integration of local and global multilevel contextual information. A concatenate layer was then used for the information connection. Then a convolution layer was applied for information fusion and the final reconstruction. The detailed parameters of the CNN network are shown in Table 1.
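The change in feature-map size through the two phases can be traced with a short sketch. The kernel sizes, strides, and padding below are assumptions chosen so that the sizes match those stated in the text (144×144 to 36×36 and back to 144×144); the actual values are those in Table 1. The ordering of the blocks is also assumed, and the concatenate and fusion layers are omitted.

```python
def pool_out(size, kernel=2, stride=2):
    """Output size of a pooling layer without padding (assumed 2x2, stride 2)."""
    return (size - kernel) // stride + 1


def conv_out(size, kernel=3, stride=1, pad=1):
    """Output size of a convolution (assumed 3x3, stride 1, padding 1)."""
    return (size + 2 * pad - kernel) // stride + 1


def deconv_out(size, kernel=4, stride=2, pad=1):
    """Output size of a transposed convolution (assumed 4x4, stride 2, padding 1)."""
    return (size - 1) * stride - 2 * pad + kernel


def trace_shapes(size=144):
    """Return (block name, spatial size) pairs for the assumed block order."""
    shapes = [("input", size)]
    for name in ("P1", "P2"):              # Pool-Conv-ReLu blocks
        size = conv_out(pool_out(size))
        shapes.append((name, size))
    for name in ("C1", "C2", "C3", "C4"):  # Conv-ReLu blocks
        size = conv_out(size)
        shapes.append((name, size))
    for name in ("D1", "D2"):              # deconvolution layers
        size = deconv_out(size)
        shapes.append((name, size))
    return shapes


shapes = trace_shapes()
```

Under these assumptions each pooling layer halves the map and each deconvolution layer doubles it, which is why two of each are enough to go from 144 to 36 and back.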

Model Training and Model-Based Segmentation.
A stepwise training scheme was used to train the DL CNN network. Firstly, we trained the network in the feature representation phase.
[Figure 1: CNN network architecture. Labels: feature representation phase; scores map reconstruction phase; 144×144 output.]
In the training process, the weights were optimized in each iteration. The convolution kernel weights were initialized from a Gaussian distribution with a mean of 0 and a standard deviation of 1. The training parameters were as follows: base learning rate: 1×10⁻⁷, step size: 1×10⁵, gamma: 0.1, momentum: 0.9, weight decay: 5×10⁻⁴. A complete training procedure took 52 hours with an NVIDIA GeForce GTX 980 GPU on an Intel Core i7 3.5 GHz computer.
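The base learning rate, step size, and gamma listed above suggest a step-decay schedule, in which the learning rate is multiplied by gamma every step-size iterations. The sketch below assumes that policy; the text does not name the framework or the schedule, and `step_learning_rate` is a hypothetical helper.

```python
def step_learning_rate(iteration, base_lr=1e-7, step_size=100_000, gamma=0.1):
    """Assumed step decay: lr = base_lr * gamma ** floor(iteration / step_size)."""
    return base_lr * gamma ** (iteration // step_size)


# Learning rate at the start and after one and two decay steps.
lr_start = step_learning_rate(0)
lr_after_one_step = step_learning_rate(100_000)
lr_after_two_steps = step_learning_rate(250_000)
```

With the stated parameters, the rate would stay at 1×10⁻⁷ for the first 100,000 iterations and drop by a factor of 10 at each subsequent step.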
We used the trained model in the segmentation tasks of NPC tumor lesions in the testing dataset. The testing images were input into the trained model. A score map representing the tumor region of the NPC tumor was acquired for each input image.

Tumor Segmentation.
We ran forward propagation on the testing dataset and evaluated the segmentation performance of the trained model. Recall, precision, and the dice similarity coefficient (DSC) were given by

$$\mathrm{Recall} = \frac{TP}{TP + FN}, \quad \mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{DSC} = \frac{2\,TP}{2\,TP + FP + FN},$$

where true positive (TP) denotes the correctly identified tumor area, false positive (FP) denotes the area identified as tumor that is normal tissue in the ground truth, and false negative (FN) denotes the area identified as normal tissue that is tumor in the ground truth. These parameters were calculated for each patient.
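As an illustration of how these parameters follow from a predicted mask and a ground-truth mask, the sketch below counts TP, FP, and FN pixel-wise and applies the standard definitions of recall, precision, and DSC. The function names and the toy 4×4 masks are hypothetical, and the code assumes both masks contain at least one tumor pixel.

```python
import numpy as np


def confusion_counts(pred, truth):
    """Pixel-wise TP, FP, and FN for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.count_nonzero(pred & truth))   # tumor in both masks
    fp = int(np.count_nonzero(pred & ~truth))  # predicted tumor, normal in truth
    fn = int(np.count_nonzero(~pred & truth))  # predicted normal, tumor in truth
    return tp, fp, fn


def recall(tp, fp, fn):
    return tp / (tp + fn)


def precision(tp, fp, fn):
    return tp / (tp + fp)


def dice(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)


# Toy example: two 2x2 squares that overlap in one column.
truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 2:4] = True

counts = confusion_counts(pred, truth)
```

In this toy case the two masks share 2 of their 4 pixels each, so recall, precision, and DSC all equal 0.5.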
For comparison with other published results, the corresponding ratio (CR), percent match (PM) [7], and Jaccard similarity coefficient (JSC) [15] were also calculated as

$$\mathrm{CR} = \frac{TP - 0.5\,FP}{TP + FN}, \quad \mathrm{PM} = \frac{TP}{TP + FN}, \quad \mathrm{JSC} = \frac{TP}{TP + FP + FN}.$$

Leave-one-out cross-validation (LOOCV) was used for model validation: in each repetition, the images of 28 patients were used as the training dataset (which were then augmented to >60,000 images), and the images of the remaining patient were used as the testing dataset. After each patient's images among these 87 images had been tested, the mean and variance of DSC, recall, CR, PM, and JSC were calculated to evaluate the segmentation performance of our method.
Table 2 tabulates the tumor volumes as segmented by the radiologists (the gold standard) and by the proposed automatic segmentation method, together with DSC, CR, PM, recall, and JSC. These values were calculated for each patient, not for each lesion. Table 3 compares the segmentation performance in terms of DSC, CR, and PM between our current DL CNN network and published results using other models. The mean DSC with our method for 29 patients was 0.89±0.05, and the range was 0.80-0.95. The mean PM with our method for 29 patients was 0.90±0.04 with a range of 0.71-0.92, which was higher than the mean PM values of less than 0.9 reported in other studies. The mean CR was 0.84±0.06 and the range was 0.83-0.96, while the mean CR was 0.72 in similar studies using other algorithms [7,15,19].
Figure 2 shows a segmentation with high accuracy, in which the DSC, CR, and PM were 0.941, 0.915, and 0.950, respectively, showing good agreement between the segmentation results of our current DL CNN network and the ground truth. Figure 3 shows a less accurate segmentation result obtained by our current DL CNN model, with DSC, CR, and PM values of 0.797, 0.731, and 0.937, respectively, showing a slight difference between the segmentation results of our current DL CNN network and the ground truth.
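The comparison metrics and the leave-one-out protocol used in this evaluation can be sketched as follows. The definitions of PM and CR are assumed to follow those commonly used in the NPC segmentation literature cited above (PM = TP/(TP+FN), CR = (TP − 0.5·FP)/(TP+FN)), and JSC is the standard Jaccard index. The pixel counts are hypothetical, chosen so that PM and CR match the values reported for Figure 2; all function names are illustrative.

```python
def percent_match(tp, fp, fn):
    """Assumed definition: fraction of the ground-truth tumor that is detected."""
    return tp / (tp + fn)


def corresponding_ratio(tp, fp, fn):
    """Assumed definition: like PM, but penalizing false positives by half."""
    return (tp - 0.5 * fp) / (tp + fn)


def jaccard(tp, fp, fn):
    """Jaccard similarity coefficient: intersection over union."""
    return tp / (tp + fp + fn)


def leave_one_out(patient_ids):
    """Yield (training patients, held-out patient) for each LOOCV repetition."""
    for held_out in patient_ids:
        train = [p for p in patient_ids if p != held_out]
        yield train, held_out


# Hypothetical counts (per 100 ground-truth tumor pixels).
tp, fp, fn = 95, 7, 5
pm = percent_match(tp, fp, fn)        # 0.95
cr = corresponding_ratio(tp, fp, fn)  # 0.915
jsc = jaccard(tp, fp, fn)
dsc = 2 * tp / (2 * tp + fp + fn)     # 190 / 202, about 0.941

folds = list(leave_one_out(list(range(29))))  # 29 repetitions, 28 training patients each
```

Under these assumed definitions, the PM and CR reported for Figure 2 (0.950 and 0.915) imply a DSC of about 0.941, which agrees with the DSC reported for the same figure.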

Discussion
Based on the CNN technique, we achieved a supervised segmentation method for NPC tumors in CE-MRI with high accuracy, the mean DSC being 0.89. The performance was also robust, with a low standard deviation of 0.05 for DSC across the results of different images. For comparison with other studies, we calculated CR and PM. The mean values of CR and PM achieved with our method were 0.84 and 0.90, respectively. Compared with similar studies in the literature, in which the highest mean CR and PM were 0.72 and 0.90, respectively [7,15,19], our CR was higher and our PM matched the highest reported value. This may indicate that our method has improved the automatic segmentation accuracy.
Firstly, the improvement may lie in the application of CNN to extract the image features automatically and objectively. In our model, the low-level features were combined into high-level features with semantic information through convolutions (Figure 1). By iterations through the back propagation algorithm, we highlighted the characteristics associated with the targeted area and gradually suppressed irrelevant features [12]. In this way, our model can extract the most useful features and achieve better segmentation results.
Secondly, in our network architecture, we fused the feature maps from the feature representation phase and the scores map reconstruction phase for the final reconstruction. As shown in Figure 4(a), which was acquired in the reconstruction phase, the tumor location and shape are roughly visible but unclear. By fusing this feature map with the fine-feature map acquired in the feature representation phase, we may fix the problem of information loss in the reconstruction process. As shown in Figure 4(b), the reconstruction from the fused feature maps finally gave a better segmentation.
There is room to further improve the accuracy and effectiveness of our current model. As shown in Table 3, our method resulted in less accurate DSC results of 0.80 in some cases. For further improvement, we may include T2 weighted images, since T2 weighted images are widely used in the manual contouring of tumor regions. We would therefore expect better performance when DCE-MRI is combined with T2 weighted images. We applied the Z-score translation in preprocessing to normalize the DCE-MRI [18]. However, some information could be lost during this normalization. Therefore, we may investigate a more appropriate method of normalization to avoid the loss of intrinsic image features. Importantly, we may improve our network architecture, such as the depth of our network, to allow direct training on the 3D images and the incorporation of time domain information from the dynamic scans. In future studies, we expect to further improve our method and the segmentation results with these ideas.

Conclusion
A robust segmentation method for NPC tumor based on deep learning convolutional neural network and CE-MRI has been established. The tumors can be segmented successfully in seconds with high accuracy. This automatic segmentation method may be time-effective in tumor contouring for routine radiotherapy treatment planning. Future studies may aim to improve the segmentation accuracy and efficiency with more training data and optimized network structure, thus helping clinicians improve the segmentation results in the clinical practice of NPC.

Data Availability
The authors do not have permission to share data.

Conflicts of Interest
All authors of the manuscript declare that there are no conflicts of interest with regard to equipment, contrast, drug, and other materials described in the study.