Introduction

Nasopharyngeal carcinoma (NPC) is one of the most common malignancies in Southeast Asia and China1. In modern intensity-modulated radiation therapy (IMRT) and volumetric modulated arc therapy (VMAT), adequately delivering dose to the target volumes while sparing the critical organs at risk (OARs) is key to the success of radiation therapy for NPC2,3. Adaptive radiation therapy (ART) may be crucial for radiotherapy of NPC4. ART depends on accurate 3D volumetric images that track patient-specific variation over the course of radiotherapy. Repeated scans with fan-beam CT are potentially the most accurate form of image acquisition for ART. However, repeated scanning increases radiation exposure, may delay or prolong the treatment course, adds logistical burden for patients and increases the workload of the clinic. Recently, cone-beam CT (CBCT) acquired with the on-board imaging system of the treatment machine has been widely used for patient setup and for monitoring anatomical changes during treatment5. However, the inferior image quality compared with conventional CT and the uncalibrated Hounsfield units (HU) of CBCT have limited its use in ART, because they lead to poor target and OAR delineation and incorrect dose calculation6.

Methods for improving CBCT image quality and HU fidelity have been extensively studied7,8,9,10. A simple method is to calibrate the electron density (ED) values of the CBCT images using the HU-ED curve obtained from a commercial phantom7. More robust methods, such as histogram-matching based solutions11, voxel-to-voxel mapping using deformable image registration (DIR)8 and Monte Carlo (MC) based methods9, have also been proposed for CBCT correction. It was reported that the dosimetric accuracy of CBCT corrected by an automated patient-specific calibration method was comparable to recalculation on conventional CT data sets for head-and-neck patients10.

More recently, deep learning (DL) methods such as U-net CNN12,13,14 and GAN15,16 have been widely used to generate synthetic CT (sCT). In particular, the cycle-consistent adversarial network (CycleGAN) is one of the most commonly used methods for CBCT-to-CT transformation, as it does not require paired training data17,18,19,20,21. Liang et al. developed a CycleGAN network to synthesize CT images from CBCT images for head-and-neck cancer patients, and the sCT images were both visually and quantitatively similar to real CT images17. Kida et al.18 indicated that CycleGAN could produce high-quality CT-mimicking images from CBCT images while preserving anatomical structures for prostate cancer patients. Sun et al.19 showed that a 3D CycleGAN improved electron density and anatomical structure delineation accuracy compared with a 2D CycleGAN. DL-generated sCT images have been reported to be useful for dose calculation in ART applications22,23,24. Their accuracy in clinical dose calculations for NPC16 and prostate cancer radiation therapy22 has been preliminarily verified.

In this study, we report a hybrid use of both traditional and DL methods to correct CBCT images for application in ART. The HU of the CBCT images of NPC patients were first corrected using a commercial phantom. sCT images were then generated by a CycleGAN from both the original and the corrected CBCT images, and the image quality and dosimetric accuracy of the two sets of sCT images were evaluated.

Material and methods

Image acquisition and processing

Fifty-two NPC patients who received radiotherapy at Fujian Cancer Hospital from 2020 to 2021 were included in this study. The study was approved by the ethics committee of Fujian Cancer Hospital (ethics number: SQ2020-043-01), and all patients provided written informed consent prior to enrollment. All methods were performed in accordance with the Declaration of Helsinki as well as relevant guidelines and regulations. During simulation for treatment planning, CT images were obtained on a Brilliance CT Big Bore (Philips Medical Systems Inc., Cleveland, OH, USA) with a head-and-neck protocol (120 kVp, 225 mA). Each CT slice had a dimension of 512 × 512 pixels, with a voxel resolution of 1.14 × 1.14 × 3 mm3. All CBCT images were acquired before the patients’ first radiotherapy session on the XVI system of an Elekta Axesse accelerator, with a tube voltage of 120 kV and an exposure current of 25 mA. Each CBCT slice had a dimension of 410 × 410 pixels, with a resolution of 1 × 1 × 1 mm3.

The CT images were rigidly registered to the CBCT images, which served as the reference, using the open-source software 3D Slicer25. The axially aligned CT images were then resampled to the voxel size and dimensions of the CBCT images. The resampled CT, referred to as RCT, served as the reference standard for image evaluation. Binary masks were created by threshold segmentation and morphological processing to avoid the adverse impact of non-anatomical structures during training. The voxel values of the images were clipped to the range [−1000, 2000] HU, and the voxel values outside the masks were set to −1000 HU.

Before training the CycleGAN model, all RCT and CBCT images were cropped about the image center to a size of 256 × 256 and the CT values were normalized to [−1, 1]. Forty-one patients were randomly chosen for the training set and the remaining 11 patients were used for validation. A total of 264 slices were taken from each patient’s dataset, so the training and validation datasets consisted of 10,824 and 2904 CT and CBCT slices, respectively. Due to GPU memory limitations, a two-dimensional CycleGAN model was adopted in this study.
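
The per-slice preprocessing described above (clipping, masking, center cropping and normalization) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `preprocess_slice` is hypothetical, the binary mask is assumed to be supplied as an input, and the normalization is assumed to be a linear map of [−1000, 2000] HU onto [−1, 1], which the text does not state explicitly.

```python
import numpy as np

def preprocess_slice(hu_slice, body_mask, hu_min=-1000.0, hu_max=2000.0, crop=256):
    """Clip, mask, center-crop and normalize one axial slice given in HU."""
    img = np.clip(hu_slice.astype(np.float32), hu_min, hu_max)
    # Voxels outside the binary mask are set to -1000 HU.
    img = np.where(body_mask, img, np.float32(hu_min))
    rows, cols = img.shape
    r0, c0 = (rows - crop) // 2, (cols - crop) // 2
    img = img[r0:r0 + crop, c0:c0 + crop]  # crop about the image center
    # ASSUMPTION: linear map of [hu_min, hu_max] onto [-1, 1].
    return 2.0 * (img - hu_min) / (hu_max - hu_min) - 1.0
```

For a 410 × 410 CBCT slice this returns a 256 × 256 array in which −1000 HU maps to −1, 500 HU maps to 0 and 2000 HU maps to 1.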

Calibration of HU by phantom

The CIRS model 062 phantom (CIRS Tissue Simulation Technology, Norfolk, VA, USA) was scanned on the same Big Bore CT and on the same CBCT system of the linear accelerator, with the same acquisition parameters. For each scan, the average HU of each material insert (electron density relative to water of 1.00, 0.20, 0.50, 0.97, 0.99, 1.06, 1.07, 1.16 and 1.61) was read out in the central slice of the phantom. The average HU values from the CT scan and from the CBCT scan were then each plotted against the known electron density, giving two HU-ED curves. An in-house program used these two curves to correct the HU of the CBCT images, producing the corrected CBCT images (CBCT_cor).
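
One plausible way to apply two HU-ED curves is to convert each CBCT voxel to an electron density through the CBCT curve and then convert that density to a CT-equivalent HU through the CT curve. The sketch below illustrates this idea under stated assumptions: the in-house program is not described in detail, so the piecewise-linear interpolation is an assumption, the function name `correct_cbct_hu` is hypothetical, and the HU readings are placeholder numbers, not measured data. Only the electron densities come from the text.

```python
import numpy as np

# Relative electron densities of the phantom inserts, as listed in the text.
ED = np.array([1.00, 0.20, 0.50, 0.97, 0.99, 1.06, 1.07, 1.16, 1.61])
# HYPOTHETICAL mean HU readings per insert (placeholders, not measured data).
HU_CT = np.array([0.0, -800.0, -500.0, -65.0, -30.0, 45.0, 60.0, 230.0, 900.0])
HU_CBCT = np.array([-90.0, -700.0, -420.0, -150.0, -120.0, -40.0, -30.0, 100.0, 600.0])

def correct_cbct_hu(cbct_hu, ed=ED, hu_ct=HU_CT, hu_cbct=HU_CBCT):
    """Map CBCT HU -> electron density -> CT-equivalent HU (linear interpolation)."""
    by_hu = np.argsort(hu_cbct)  # np.interp needs increasing x-coordinates
    density = np.interp(cbct_hu, hu_cbct[by_hu], ed[by_hu])
    by_ed = np.argsort(ed)
    # np.interp clamps inputs outside the measured range to the end points,
    # so air or very dense material would need separate handling in practice.
    return np.interp(density, ed[by_ed], hu_ct[by_ed])
```

With these placeholder curves, a CBCT voxel reading −90 HU (the water insert) is mapped to 0 HU.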

CycleGAN method

As shown in Fig. 1, the CycleGAN model includes two generators and two discriminators. In the forward cycle, Generator-RCT (GRCT) generates sCT from CBCT, and then Generator-CBCT (GCBCT) generates cycle CBCT (CCBCT) from sCT. In the backward cycle, GCBCT generates synthesized CBCT (sCBCT) from RCT, and then GRCT generates cycle CT (CCT) from sCBCT. The two discriminators, DRCT and DCBCT, determine whether sCT and sCBCT, respectively, are real images. The loss function of CycleGAN consists of an adversarial loss and a cycle consistency loss. The adversarial losses for the two cycles are

$$ L_{CT} = E_{RCT} \left[ {\left( {1 - D_{RCT} (RCT)} \right)^{2} } \right] + E_{CBCT} \left[ {\left( {D_{RCT} \left( {G_{RCT} (CBCT)} \right)} \right)^{2} } \right] $$
(1)

and

$$ L_{CBCT} = E_{CBCT} \left[ {\left( {1 - D_{CBCT} (CBCT)} \right)^{2} } \right] + E_{RCT} \left[ {\left( {D_{CBCT} \left( {G_{CBCT} (RCT)} \right)} \right)^{2} } \right] $$
(2)

Figure 1. Illustration of the cycle-consistent generative adversarial network (CycleGAN).

The cycle consistency losses for the two cycles are

$$ L_{fw} = E_{CBCT} \left[ {\left\| {CBCT - G_{CBCT} (G_{RCT} (CBCT))} \right\|_{1} } \right] $$
(3)

and

$$ L_{bw} = E_{RCT} \left[ {\left\| {RCT - G_{RCT} \left( {G_{CBCT} (RCT)} \right)} \right\|_{1} } \right] $$
(4)

Thus, combining these two kinds of losses, the full objective is:

$$ L_{cyclegan} = L_{CT} + L_{CBCT} + \lambda \left( {L_{fw} + L_{bw} } \right) $$
(5)
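
To make Eqs. (1) to (5) concrete, the sketch below evaluates them numerically for one batch. It is an illustration only: the generators and discriminators are passed in as plain callables (hypothetical stand-ins for the trained networks), the L1 norm is averaged per pixel, which is a common implementation choice assumed here, and no optimization step is shown.

```python
import numpy as np

def cyclegan_losses(rct, cbct, g_rct, g_cbct, d_rct, d_cbct, lam=10.0):
    """Evaluate the loss terms of Eqs. (1)-(5) for one batch of images."""
    sct = g_rct(cbct)    # forward cycle: CBCT -> sCT
    scbct = g_cbct(rct)  # backward cycle: RCT -> sCBCT
    # Least-squares adversarial losses, Eqs. (1) and (2).
    l_ct = np.mean((1.0 - d_rct(rct)) ** 2) + np.mean(d_rct(sct) ** 2)
    l_cbct = np.mean((1.0 - d_cbct(cbct)) ** 2) + np.mean(d_cbct(scbct) ** 2)
    # Cycle consistency losses, Eqs. (3) and (4); L1 norm averaged per pixel.
    l_fw = np.mean(np.abs(cbct - g_cbct(sct)))
    l_bw = np.mean(np.abs(rct - g_rct(scbct)))
    total = l_ct + l_cbct + lam * (l_fw + l_bw)  # Eq. (5)
    return {"L_CT": l_ct, "L_CBCT": l_cbct, "L_fw": l_fw, "L_bw": l_bw, "total": total}
```

With identity generators the cycle consistency terms vanish, so the total reduces to the two adversarial terms.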

Network structure and parameters

The generator contains an encoding layer, a conversion layer and a decoding layer. The encoder reduces the spatial dimensions and extracts the features of the input image. The conversion layer, which consists of nine ResNet modules26, then transforms these feature vectors. The decoder restores the spatial dimensions and generates a synthesized image. The discriminator is a binary classification network with outputs in the range [0, 1]. The model is trained with the Adam optimizer27 in TensorFlow28. The learning rate has an initial value of 0.0002 and decays linearly after 20 epochs, while the momentum terms β1 and β2 are set to 0.5. The other parameters are set as follows: λ = 10, batch size = 2, epochs = 100. The original CBCT images and the CBCT_cor images are used to train the model separately. In the following text, SCT1 denotes the sCT generated by the CycleGAN model from the original CBCT images, while SCT2 denotes the sCT generated from CBCT_cor.
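
The learning-rate schedule described above can be written as a small function. This is a sketch under one assumption that the text does not state: the rate is taken to decay linearly to zero at epoch 100. The function name `learning_rate` and the 0-based epoch index are illustrative choices.

```python
def learning_rate(epoch, base_lr=2e-4, decay_start=20, total_epochs=100):
    """Learning rate for a 0-based epoch index.

    Constant at base_lr for the first `decay_start` epochs, then
    (ASSUMPTION) linear decay that reaches zero at `total_epochs`.
    """
    if epoch < decay_start:
        return base_lr
    remaining = (total_epochs - epoch) / (total_epochs - decay_start)
    return base_lr * max(0.0, remaining)
```

Under this assumption the rate is 0.0002 up to epoch 20 and 0.0001 at epoch 60.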

Evaluation

In this study, the patients in the validation set were used to evaluate the improvement in image quality. The mean absolute error (MAE) and mean error (ME) of CBCT, CBCT_cor, SCT1 and SCT2 versus RCT were each calculated within the binary masks. HU line profiles were also compared across these image types, and the images were compared visually side by side.
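
A minimal sketch of the masked MAE and ME calculation is given below. The function name `mae_me` is hypothetical, and the sign convention for ME (evaluated image minus RCT) is an assumption, since the text does not define it.

```python
import numpy as np

def mae_me(image_hu, rct_hu, mask):
    """Return (MAE, ME) in HU between an image and RCT inside a binary mask."""
    inside = np.asarray(mask, dtype=bool)
    # ASSUMPTION: the error is defined as evaluated image minus reference (RCT).
    diff = np.asarray(image_hu, dtype=np.float64)[inside] - np.asarray(rct_hu, dtype=np.float64)[inside]
    return float(np.mean(np.abs(diff))), float(np.mean(diff))
```

Voxels outside the mask do not contribute, so background air and couch structures cannot dilute the error.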

Volumetric modulated arc therapy (VMAT) plans for the patients in the validation set were generated on the RCT images. The prescribed doses were 69.96 Gy to the planning target volume of the primary nasopharyngeal tumor and definitive bilateral lymph nodes (PTV6996), 60.06 Gy to the high-risk region (PTV6006), and 56.1 Gy to the low-risk region and bilateral low-risk nodal regions (PTV5610), all in 33 fractions. The contours were copied from the RCT images to the CBCT, CBCT_cor, SCT1 and SCT2 images via rigid registration. Dose calculation was performed with Pinnacle3 (version 16.2, Philips Radiation Oncology Systems, Madison, WI).

Dose distributions were compared among the CBCT_cor, RCT, SCT1 and SCT2 images. Several dosimetric parameters were collected for quantitative comparison. For PTVs, D2 (the dose received by 2% of the volume), Dmean (the mean dose) and D98 (the dose received by 98% of the volume) were recorded. For OARs, Dmean or Dmax (the maximum dose) was compared. The global 3D gamma passing rates were also calculated with the radiotherapy module of 3D Slicer, using criteria of 3%/3 mm and 2%/2 mm with a 10% dose threshold.
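
The dose-volume parameters above can be read from a dose grid and a structure mask as percentiles of the voxel doses. The sketch below assumes equal voxel volumes, so that D2 is the 98th percentile and D98 the 2nd percentile of the dose inside the structure. The function names are hypothetical, and the treatment planning system may interpolate the dose-volume histogram differently.

```python
import numpy as np

def dvh_metrics(dose, structure_mask):
    """D2, D98, Dmean and Dmax of a structure, assuming equal voxel volumes."""
    d = np.asarray(dose, dtype=np.float64)[np.asarray(structure_mask, dtype=bool)]
    return {
        "D2": float(np.percentile(d, 98)),   # dose received by the hottest 2% of the volume
        "D98": float(np.percentile(d, 2)),   # dose received by 98% of the volume
        "Dmean": float(d.mean()),
        "Dmax": float(d.max()),
    }

def relative_difference(value, reference):
    """Relative difference in percent with respect to the reference (RCT) value."""
    return 100.0 * (value - reference) / reference
```

For example, `relative_difference(70.5, 70.0)` expresses a D2 of 70.5 Gy on a test image relative to 70.0 Gy on RCT as a percentage.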

The Wilcoxon signed-rank test was carried out (between SCT2 and CBCT, SCT2 and CBCT_cor, and SCT2 and SCT1) for the MAE, ME, gamma passing rate and dosimetric parameters described above. Statistical Package for the Social Sciences (SPSS 21.0; SPSS Inc., Chicago, IL, USA) was used to perform these tests, and P < 0.05 was considered statistically significant.
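
For readers unfamiliar with the test, the sketch below computes a two-sided Wilcoxon signed-rank test for paired samples by exact enumeration, which is feasible for a small sample such as the 11 validation patients. It illustrates the technique and is not the SPSS computation, whose default output may rely on a normal approximation. Zero differences are dropped and tied absolute differences receive average ranks, both of which are choices assumed here.

```python
from itertools import product

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def wilcoxon_signed_rank_exact(x, y):
    """Return (W+, two-sided exact p-value) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2  # mean of W+ under the null hypothesis
    observed = abs(w_plus - center)
    extreme = 0
    # Enumerate every sign assignment, each equally likely under the null.
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** len(ranks)
```

When one image type has the larger value for all 11 patients, the exact two-sided p-value is 2/2^11, which is about 0.001.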

Results

Side-by-side comparison

Figure 2 shows the CBCT, CBCT_cor, RCT, SCT1 and SCT2 images of one patient in the validation set. The image quality of SCT1 and SCT2 was markedly better than that of CBCT and CBCT_cor. As shown in Fig. 2A, both the SCT1 and SCT2 images generated by the CycleGAN model removed the scatter artifacts of the CBCT images. The HU line profiles through different areas were also plotted for this patient. In the profile of line a, which passes through soft tissue, bone and cavity areas, the HU values of the sCT images, especially SCT2, were well corrected toward the HU values of RCT. In the profile of line b, which passes through the brain tissue area, the SCT1 and SCT2 HU values were smooth and also well corrected toward the RCT HU values. In contrast, the CBCT and CBCT_cor HU values were obviously noisy.

Figure 2. (A) Side-by-side comparison of CBCT, CBCT_cor, RCT, SCT1 and SCT2 for a validation patient; (B) the HU profile along line a; (C) the HU profile along line b.

The MAE and ME evaluation

The MAE and ME of the CBCT, CBCT_cor, SCT1 and SCT2 images against the RCT images for all 11 validation cases are given in Table 1 and Fig. 3. For both MAE and ME, SCT2 gave the smallest values, followed by SCT1. In addition, as shown in Fig. 3, the MAE between SCT2 and RCT was lower than that between SCT1 and RCT for every patient. The range of MAE improved from (79, 143) HU for SCT1 to (74, 97) HU for SCT2, which suggests that corrected CBCT images can help improve the training results.

Table 1. MAE and ME results for the four kinds of images against RCT images from all validation cases.

Figure 3. MAE and ME comparisons for each patient in the validation set. (A) MAE, (B) ME.

Dose distribution comparison

Figure 4 shows the dose distributions calculated on the CBCT_cor, RCT, SCT1 and SCT2 images for three validation patients. The distribution of isodose lines on SCT2 was closest to that on RCT, followed by SCT1. Moreover, the isodose lines on CBCT_cor, such as the 7350 cGy line, differed markedly from those on RCT.

Figure 4. Dose distributions for three validation patients on CBCT_cor, RCT, SCT1 and SCT2.

The average relative dosimetric differences of CBCT, CBCT_cor, SCT1 and SCT2 compared with RCT for all validation patients are listed in Table 2. The average dosimetric difference was considerable (6.5% ± 8.7%) when calculated on the uncorrected CBCT images. For most targets and OARs, the relative dosimetric differences from RCT were smallest for SCT2 (0.6% ± 0.6%). The average differences for CBCT_cor and SCT1 were 2.7% ± 1.4% and 1.2% ± 1.0%, respectively.

Table 2 The average relative dosimetric difference for the CBCT, CBCT_cor, SCT1 and SCT2 compared to RCT of all validation patients (mean ± SD).

3D gamma analysis

The results of the 3D gamma analysis are shown in Table 3. Under both the 3%/3 mm and 2%/2 mm criteria, the gamma passing rate of SCT2 compared with RCT (98.7% and 97.1%, respectively) was marginally higher than that of SCT1 (97.7% and 95.7%, respectively). Moreover, both SCT1 and SCT2 had significantly higher passing rates than CBCT_cor.

Table 3 3D gamma analysis of the CBCT, CBCT_cor, SCT1 and SCT2 compared with RCT (mean ± SD).

Discussion

In this study, we developed a hybrid approach that combines a conventional phantom correction method with a deep learning method to generate sCT images from CBCT images acquired for NPC patients during their treatment. The HU values of the CBCT images were first corrected using the HU-ED curves, and the corrected images were then used to train the CycleGAN model. For comparison, the original CBCT images were also used for model training with the same parameter settings. Image quality and dose distribution were evaluated on the CBCT images corrected by the HU-ED curves and on the sCT images generated by the CycleGAN model, with the RCT images as the ground truth.

The sCT images generated by the CycleGAN model removed most scatter artifacts present on the CBCT images. The image quality of SCT1 and SCT2 was visually comparable to that of RCT, as Zhang et al.29 and Chen et al.30 reported. Moreover, the HU profiles of the sCT images, especially SCT2, were closer to those of the RCT images in most regions. This indicates that the HU values of the SCT1 and SCT2 images were adequately corrected toward those of the RCT images, consistent with previous studies19,31. The MAE was significantly lower for SCT2 than for SCT1 (83.51 vs. 105.62, P < 0.05), indicating that training with HU-corrected CBCT images improved the result. The MAE was greater than previously reported because it was calculated only over the structures inside the patient external contour, which we believe is a better evaluation than the whole-image calculation used in previous studies.

The eventual goal of this study is to improve the image quality and dosimetric fidelity of the readily available CBCT obtained during treatment, so that ART can be implemented for NPC patients whose anatomy changes. Visual inspection showed that the 3D dose distributions calculated on the sCT images were much closer to those calculated on the RCT images than were the distributions calculated on the uncorrected CBCT or on the HU-corrected-only CBCT_cor images. SCT2 was superior to SCT1. As shown in Fig. 4, the prescription dose level isodose line (7350 cGy) on SCT2 was closer to that on RCT, while the high-dose area was obviously larger on the SCT1 images. The average relative dosimetric difference of SCT2 was significantly lower than that of SCT1 (0.6% vs. 1.2%). In the 3D gamma analysis, the dose distribution of SCT2 was more robust than that of SCT1, with a smaller standard deviation (0.5 vs. 1.9) under the 2%/2 mm criterion.

Our results show that the sCT images generated from the HU-corrected CBCT images are superior to those generated from the uncorrected CBCT images with the same CycleGAN model, in both image quality and dose calculation accuracy. The sCT images generated by the hybrid method of deep learning and phantom correction could achieve adequate accuracy in ART dose calculations for NPC patients. This study has some limitations. First, because of limited hardware computing power, the current approach could only implement a 2D CycleGAN. Even better results can reasonably be expected if a three-dimensional model is adopted32,33,34. Second, because a rigid registration algorithm was used for pre-processing, some differences remain between the RCT images and the CBCT images. Deformable image registration may improve the performance of our approach, which remains to be validated in future research35.

Conclusion

The image quality and dose calculation accuracy on synthetic CT generated from the deep learning CycleGAN with HU corrected CBCT images were examined and evaluated. This method efficiently provided a 3D volumetric imaging dataset with improved quality and adequate dose calculation accuracy for the application in ART for NPC patients.