Deep-learning-based segmentation using individual patient data for prostate cancer radiation therapy

Purpose Organ-at-risk segmentation is essential in adaptive radiotherapy (ART). Learning-based automatic segmentation can reduce committed labor and accelerate the ART process. In this study, an auto-segmentation model was developed by employing individual patient datasets and a deep-learning-based augmentation method for tailoring radiation therapy according to the changes in the target and organ of interest in patients with prostate cancer. Methods Two computed tomography (CT) datasets with well-defined labels, including the contoured prostate, bladder, and rectum, were obtained from each of 18 patients. The labels of the CT images captured during radiation therapy (CT2nd) were predicted using the CT images scanned before radiation therapy (CT1st). Using the VoxelMorph method, 10 deformable vector fields (DVFs) were extracted for each pair in which the CT1st image was deformed and registered to one of the modified CT or CT2nd fixed images. Augmented images were acquired by utilizing the 110 extracted DVFs to spatially transform the CT1st images and labels. An nnU-net auto-segmentation network was trained by using the augmented images, and the CT2nd label was predicted. A patient-specific model was created for each of the 18 patients, and the performances of the individual models were evaluated. The results were evaluated by employing the Dice similarity coefficient (DSC), average Hausdorff distance, and mean surface distance. The accuracy of the proposed model was compared with those of models trained with large datasets. Results Patient-specific models were developed successfully. For the proposed method, the DSC values between the actual and predicted labels for the bladder, prostate, and rectum were 0.94 ± 0.03, 0.84 ± 0.07, and 0.83 ± 0.04, respectively. Conclusion We demonstrated the feasibility of automatic segmentation employing individual patient datasets and image augmentation techniques.
The proposed method has potential for clinical application in automatic prostate segmentation for ART.


Introduction
Diseases related to the prostate are common in adult males, and prostate cancer is the second leading cause of cancer-related deaths in the United States (the first is lung cancer) [1]. In addition, the incidence of prostate cancer is increasing in Asia and Europe and is expected to increase globally, along with mortality rates [2]. Prostatectomy and radiation therapy are the common treatment options for clinically localized prostate cancer [3]. Radiation therapy is a noninvasive treatment that destroys cancer cells via irradiation with high-energy radiation [4,5]. In radiation therapy, the target and the organ at risk (OAR) are accurately delineated on computed tomography (CT) images; through the established treatment plan, the highest possible dose is delivered to the target while the OAR is irradiated with the minimum dose. As radiation therapy is conducted daily for four to eight weeks, the volume and shape of the target and OAR can change [6] owing to changes in the patient's physiology, such as weight loss, movement of body fluids, and tumor volume reduction [7]. Woodford et al. [8] reported that the gross tumor volume over 30 fractions was reduced by an average of 38%, ranging from 12% to 87%. Romero et al. [9] demonstrated a maximum dose loss of 21.1% for the tumor volume when treatment was continued with the initially established treatment plan. Failure to adapt the treatment to changes in the target or OAR can result in treatment failure or radiation overdose of up to 36.8% in the OAR, which can cause unwanted acute or late toxicity [10]. Therefore, radiation treatment strategies considering these anatomical changes are required.
Adaptive radiation therapy (ART) is a radiation treatment strategy in which detected anatomical changes in a patient are considered and a modified treatment plan is established during the treatment course [11]. ART enables the accurate treatment of cancer. However, reestablishing treatment plans and recontouring the target and OAR for each modification require additional resources. In particular, OAR recontouring is a time-consuming process that requires approximately 180 min [12]. In addition, the contouring technique employed by dosimetrists has a considerable impact on the consistency and accuracy of OAR segmentation [13].
Automated segmentation techniques have been proposed to reduce committed labor and accelerate the ART process. Schulz et al. [14] developed a semiautomatic contouring method that demonstrated a 30% reduction in the time required for OAR segmentation compared with manual contouring. Doshi et al. [15] proved the usability of semiautomatic contouring, wherein the average Dice similarity coefficient (DSC) between semiautomatic and manually segmented structures was 87% in tongue tumors. Although the contouring time can be reduced by employing semiautomatic segmentation, the segmented OAR must be manually modified owing to the incompleteness of automation [16]. Recently, automatic segmentation techniques that do not require additional manual modifications have been developed to enhance segmentation accuracy.
Developing a high-performance network requires large amounts of data for network training. However, acquiring many medical images is difficult because of patient privacy and security issues [31]. Even if the medical images are acquired, creating a manually segmented OAR, termed a "label," is time-consuming and expensive [32]. Employing pretrained deep-learning models can reduce the amount of image data required for training; however, a model's architecture can require different modalities and image sizes, limiting the usability of the models and reducing the prediction performance [33,34]. To overcome these restrictions, image augmentation methods that modify the original image data have been proposed to increase the number of datasets [35]. Image augmentation methods can be categorized into rigid and nonrigid methods. The rigid method produces geometric variations of the original image but has limitations in creating human body images of various shapes, whereas the nonrigid method can create human body images of various shapes but can occasionally create unrealistic images [36]. Recently, augmentation employing a deep-learning method was proposed to create modified images similar to actual patient images [37]. Here, we investigated the feasibility of an automatic segmentation network that utilizes only individual patient datasets. As the internal structure of each patient has its own characteristics and the deviation or change from the initial shape and position has a limited range, we assumed that an individual model utilizing each patient's own data may achieve high performance. Thus, this study proposes an automatic segmentation model that employs an individual patient dataset and utilizes an image augmentation technique for tailoring radiation therapy according to the changes in the target and organ of interest in patients with prostate cancer.
For radiation therapy, a CT simulation must be performed (CT1st) and a label must be generated in the process of creating a treatment plan. After the segmentation model was trained by using the CT1st image with its label and other augmented CT1st images, automatic segmentation was performed on the second CT image (CT2nd) during ART. Nonrigid augmentation utilizing deep learning was employed to create the training datasets. Our assumption was validated by quantifying the model performance against the CT2nd label, which was manually contoured.

Materials and methods
The CT1st image and label and the CT2nd images were used to develop and validate the model. Insufficient training datasets do not ensure a sufficiently accurate model [38]. Thus, we generated augmented datasets by using VoxelMorph's deformable vector field (DVF) extraction method [39]. VoxelMorph is a deep-learning-based image transformation method that learns the DVF between two images such that a moving image can be changed into a fixed image. Before the DVF method was applied, various augmented images were created from the CT1st and CT2nd images utilizing conventional rigid augmentation, including image rotation, zoom-in, zoom-out, and flipping. For each image-set pair comprising CT1st and another augmented image, VoxelMorph was applied, and 10 DVF sets were extracted per pair of images. Finally, the extracted DVFs were applied to the CT1st image to create augmented images. The segmentation model was coded by adopting a U-net-based model (nnU-net) and trained by using the augmented images. Label segmentation of the CT2nd image was performed, and the accuracy of the model was evaluated.

Image acquisition and preprocessing
The study protocol was approved by the Institutional Review Board of Samsung Medical Center (IRB number 2019-09-119-002). We explained the research to adult male patients aged ≥18 years who underwent radiation treatment for prostate cancer between November 18, 2019, and September 24, 2021, and obtained signed consent forms from patients who expressed willingness to participate in additional CT scans. All patient CT data collected for research purposes were anonymized. Specifically, two sets of CT images and manually contoured structures (labels) (CT1st image and label; CT2nd image and label) were collected from the 18 patients. CT images were acquired with two types of CT scanners: 27 image sets with the Discovery™ CT590 RT scanner (GE Healthcare, Waukesha, WI, USA) and 9 with the LightSpeed RT16 scanner (GE Healthcare, Waukesha, WI, USA). The voxel spacing of the CT images obtained with the Discovery™ CT590 RT was 0.9766 × 0.9766 × 2.5 mm³, and the pixel dimensions were 512 × 512 × 64. The pixel dimensions of the CT images acquired with the LightSpeed RT16 were also 512 × 512 × 64; however, the voxel spacing was 1.2695 × 1.2695 × 2.5 mm³; thus, the voxel spacing was matched to that of the Discovery™ CT590 RT scanner through preprocessing. The VoxelMorph augmentation method learns the DVFs in the network in three dimensions (3D); hence, images with pixel dimensions of 512 × 512 × 64 cause memory problems. Therefore, the CT images were resized to 256 × 256 × 64 pixels, and the final voxel spacing was changed to 2.5380 × 2.5380 × 2.5 mm³. The segmented structures comprised the bladder, rectum, and planning target volume. The labels were contoured by a dosimetrist and a physician at Samsung Medical Center.
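The resizing step above can be sketched as follows. This is a minimal illustration using NumPy and SciPy rather than the authors' actual preprocessing code; the function name and the choice of trilinear interpolation are assumptions. Because the physical extent of the volume is preserved, the voxel spacing scales inversely with the resize factor.

```python
import numpy as np
from scipy.ndimage import zoom

def resample_ct(volume, spacing, target_shape=(256, 256, 64)):
    """Resize a 3D CT volume to target_shape and return the updated voxel spacing.

    volume: 3D numpy array; spacing: (x, y, z) voxel size in mm.
    """
    factors = [t / s for t, s in zip(target_shape, volume.shape)]
    resized = zoom(volume, factors, order=1)  # trilinear interpolation
    # physical extent is preserved, so spacing scales inversely with the resize
    new_spacing = tuple(sp * s / t
                        for sp, s, t in zip(spacing, volume.shape, target_shape))
    return resized, new_spacing
```

For example, halving the in-plane dimensions from 512 × 512 doubles the in-plane spacing while leaving the 2.5 mm slice thickness unchanged.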

Augmentation
VoxelMorph is a deep-learning-based framework adopted for the deformable medical image registration of two images, wherein a moving image is deformed and registered to a fixed image through an iterative learning process that minimizes a loss function. The input image pair comprised a moving CT image (CTmoving) and a fixed CT image (CTfixed), and each voxel of CTmoving was deformed to resemble CTfixed via an iterative registration process in VoxelMorph. Subsequently, multiple DVFs were created. In the learning process, DVFs were extracted by employing U-net, and a moved CT image (CTmoved) was created via spatial transformation of CTmoving by applying the extracted DVF. The similarity between CTmoved and CTfixed was scored as a loss to transform CTmoving to be similar to CTfixed through iterative training. The loss function is expressed as follows:

L(CTfixed, CTmoving; ϕ) = L_MSE(CTfixed, CTmoving ∘ ϕ) + λ L_smooth(ϕ),

where ϕ is the DVF, CTmoving ∘ ϕ is the moved image obtained through the spatial transform, L_smooth penalizes the local spatial variations in ϕ, and λ is a regularization parameter. L_MSE(CTfixed, CTmoving ∘ ϕ) is the mean squared voxel difference, applicable when CTfixed and CTmoved (= CTmoving ∘ ϕ) have similar image intensity distributions and local contrast, and is expressed as follows:

L_MSE(CTfixed, CTmoving ∘ ϕ) = (1/|Ω|) Σ_{p∈Ω} [CTfixed(p) − (CTmoving ∘ ϕ)(p)]²,

where Ω is the image domain. Repeated model training generated a DVF at each iteration until CTmoving was equal to CTfixed. The CT1st image with label information was utilized as CTmoving. As illustrated in Fig 1, to generate various types of augmentation data, the CT2nd image and 10 modified CT images generated from CT1st and CT2nd by using the following parameters were utilized as CTfixed: right rotation 3°, left rotation 3°, flipping, zoom in 5%, and zoom out 5%. When the loss exceeded 1000, the organs in the generated CTmoved overlapped, resulting in images that did not resemble human anatomy. Therefore, we began saving the DVFs once the loss decreased below 1000. As the iterations continued, nine more DVFs were saved such that 10 DVFs were extracted for each image pair. Each epoch consisted of 250 iterations, and the training was terminated when 10 sets of DVFs with a loss below 1000 had been saved. Approximately 3-10 min was required to save 10 sets of DVFs. In total, 110 DVFs were generated from 11 pairs of image sets (11 CTfixed and one CTmoving), and 110 augmented datasets were generated via spatial transformation of the CTmoving image and label with the 110 DVFs (Fig 2A). Thus, 110 training sets were created utilizing the CT1st image with labels and the CT2nd image, which were employed as training data for the segmentation network.
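The spatial transform and the registration loss described above can be sketched in a few lines. This is an illustrative NumPy/SciPy implementation of the VoxelMorph-style objective, not the authors' code: the function names are hypothetical, the DVF is represented as per-voxel displacements of shape (3, D, H, W), and the smoothness term is approximated by the mean squared spatial gradient of the field.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp(moving, dvf):
    """Spatially transform a 3D volume with a dense displacement field.

    dvf has shape (3, D, H, W): per-voxel displacement along each axis.
    """
    grid = np.indices(moving.shape).astype(np.float32)
    return map_coordinates(moving, grid + dvf, order=1, mode='nearest')

def registration_loss(fixed, moving, dvf, lam=0.01):
    """L = L_MSE(fixed, moving o phi) + lam * L_smooth(phi), VoxelMorph-style."""
    moved = warp(moving, dvf)
    l_mse = np.mean((fixed - moved) ** 2)
    # smoothness term: mean squared spatial gradient of the displacement field
    l_smooth = np.mean([np.mean(g ** 2)
                        for c in range(3) for g in np.gradient(dvf[c])])
    return l_mse + lam * l_smooth
```

With a zero displacement field, `warp` returns the moving image unchanged and the loss reduces to the plain MSE between the two images; the same `warp` routine, applied to the CT1st image and its label with a saved DVF, produces one augmented training pair.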

Segmentation model
We constructed a segmentation network by adopting nnU-net [40]. nnU-net is a U-net-based segmentation model whose parameters were optimized by analyzing 23 public datasets utilized in international biomedical segmentation competitions. By employing the 110 augmented datasets generated through VoxelMorph, we trained the network and generated segmented labels on CT2nd (Fig 2). The network utilized for training was a 3D nnU-net; 80% of the data were used for training and the remaining 20% for validation. In addition, general augmentation methods, such as gamma correction, mirroring, cropping, and rotation, were randomly applied to the data during segmentation training. The input patch size was 224 × 224 × 54, and the batch size was two. Five downsampling operations were performed, and the size of the feature map was 4 × 4 × 3 at the bottleneck. The optimizer was stochastic gradient descent, and the loss was computed using the Dice and cross-entropy scores. The training was run for 50 epochs, with each epoch equal to 250 iterations. The learning rate was steadily reduced by using a polynomial schedule. The training was performed by applying VoxelMorph in PyTorch 1.5.1 and nnU-net in TensorFlow 2.0, employing an NVIDIA GeForce 2080Ti graphics processing unit. The model was saved at each of the 50 epochs. Among the 50 models, the model with the lowest Dice and cross-entropy loss on the validation data was selected as the final model for the patient; thus, 18 segmentation models were produced.
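The polynomial learning-rate decay mentioned above can be written compactly. This is a generic sketch of a "poly" schedule matched to the 50-epoch setting of this study; the initial learning rate and exponent are assumptions, not values reported by the authors.

```python
def poly_lr(epoch, max_epochs=50, initial_lr=1e-2, exponent=0.9):
    """Polynomial learning-rate decay: the rate falls smoothly to zero at max_epochs."""
    return initial_lr * (1.0 - epoch / max_epochs) ** exponent
```

The schedule starts at `initial_lr`, decreases monotonically every epoch, and reaches zero at the final epoch, which encourages large updates early in training and fine adjustments near the end.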

Evaluation
We conducted segmentation using a limited individual dataset and VoxelMorph augmentation, which is referred to as the individual model. To assess the impact of the number of datasets and the use of VoxelMorph on model performance, we compared the segmentation performance obtained using individual datasets against that obtained using the total dataset (data from 18 patients). The model trained with all datasets is referred to as the total model. Additionally, we evaluated the effect of incorporating VoxelMorph on model performance. We also tested the dependency of the model performance on the network type by comparing a Basic U-net and nnU-net. For this purpose, a U-net-based model was developed and trained using a process identical to that adopted for the nnU-net-based model. The predicted segmentation was quantitatively evaluated utilizing the DSC, Hausdorff distance (HD), and mean surface distance (MSD). The DSC evaluates the overlap between the true and predicted volumes:

DSC = 2|True ∩ Pred| / (|True| + |Pred|),

where True and Pred denote the actual and predicted label volumes, respectively. Meanwhile, the HD is the largest distance from a point on the true label to the nearest point on the predicted label and is expressed as follows:

HD = max{h_HD(True, Pred), h_HD(Pred, True)},

where h_HD(True, Pred) = max_{t∈True} min_{p∈Pred} ‖t − p‖ is the directed Hausdorff distance based on the Euclidean distance between two points, one in the true label volume and the other in the predicted label volume. The MSD measures the average surface distance between the predicted and true label volumes and is expressed as follows:

MSD = (1/2) [ (1/|S_True|) Σ_{t∈S_True} min_{p∈S_Pred} ‖t − p‖ + (1/|S_Pred|) Σ_{p∈S_Pred} min_{t∈S_True} ‖p − t‖ ],

where S_True and S_Pred denote the surfaces of the true and predicted labels.

Table 1 presents a comparison of the VoxelMorph performance for different types of datasets. With nnU-net, the highest DSC values of 0.957 and 0.874 were obtained for the bladder and rectum, respectively, with the total dataset with VoxelMorph, whereas for the prostate, the highest value of 0.847 was realized with the individual dataset without VoxelMorph. The performance of the Basic U-net was lower than that of nnU-net; however, the best-performing combination for each organ remained the same. In addition, the DSC values for the bladder, rectum, and prostate were computed on CT2nd by adopting the labels generated on CT1st (CT1st-to-CT2nd segmentation). The rectum exhibited the lowest value (0.624) of the three labels. The DSC, HD, and MSD scores for the three organs automatically segmented using the nnU-net network with VoxelMorph were computed for 18 patients in both the individual and total models. The performances of the individual and total models are summarized in Tables 2 and 3, respectively. In the DSC evaluations of the two models, the bladder had average DSC values of 0.944 and 0.957, the highest among the segmented organs. In terms of the average HD, the individual model had the lowest values in the order of prostate, bladder, and rectum, while the total model had the lowest values in the order of bladder, prostate, and rectum. The average MSDs were 0.658 and 0.467 mm for the bladder (lowest), followed by 1.015 and 0.697 mm for the rectum, and 1.200 and 1.210 mm for the prostate. Although a direct comparison is not possible, Table 4 summarizes recently published studies that evaluated model performance using DSC values, for comparison of the quantities of data utilized. The dataset number refers to the number of CT images with labels. This study employed a single labeled dataset and an unlabeled CT image set to train the segmentation model. Among the studies evaluated, Sultana et al. [42] achieved the highest DSC value of 0.90 ± 0.05 for the prostate. Kiljunen et al. [43] employed the largest number of datasets, 876, for training the segmentation model, whereas Kawula et al. [44] utilized the smallest number of datasets, 47, excluding our study. The performance of the model developed using one dataset ranked eighth among the 12 models for the prostate, eighth among the 10 models for the rectum, and fourth among the 10 models for the bladder. The rankings of the model developed by utilizing the total dataset were the same for the prostate and bladder as those obtained utilizing the individual datasets; however, the ranking for the rectum increased from eighth to fourth among the 10 models.
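The three metrics defined above can be sketched directly from their formulas. This is an illustrative NumPy/SciPy implementation, not the authors' evaluation code; the function names are hypothetical, and surface points are assumed to be supplied as coordinate arrays already scaled to millimeters.

```python
import numpy as np
from scipy.spatial.distance import cdist

def dsc(true_mask, pred_mask):
    """Dice similarity coefficient between two binary volumes."""
    inter = np.logical_and(true_mask, pred_mask).sum()
    return 2.0 * inter / (true_mask.sum() + pred_mask.sum())

def hd_msd(true_pts, pred_pts):
    """Symmetric Hausdorff distance and mean surface distance.

    true_pts, pred_pts: (N, 3) arrays of surface point coordinates (in mm).
    """
    d = cdist(true_pts, pred_pts)  # pairwise Euclidean distances
    d_tp = d.min(axis=1)           # each true point -> nearest predicted point
    d_pt = d.min(axis=0)           # each predicted point -> nearest true point
    hd = max(d_tp.max(), d_pt.max())
    msd = (d_tp.mean() + d_pt.mean()) / 2.0
    return hd, msd
```

Identical masks yield a DSC of 1 and zero HD and MSD; the HD reports the single worst surface disagreement, whereas the MSD averages the disagreement over both surfaces.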

Results
To utilize the images generated by using VoxelMorph, it must first be determined whether the images resemble an actual human body.

Discussion
ART can improve therapeutic efficacy because a treatment plan can be adapted by continuously updating the shapes and positions of the patient's OAR and target in the treatment plan. Accordingly, we created and evaluated a patient-specific automatic segmentation model utilizing labeled and additional unlabeled CT images. Performing accurate automatic segmentation is technically challenging because the volumes of the prostate and rectum are small and their Hounsfield unit values on CT images are similar to those of the surrounding soft tissues. Nevertheless, employing one individual dataset, the proposed model achieved good performance; the DSC values were more than 0.80 for the prostate and rectum and 0.94 for the bladder. These results demonstrate the feasibility of using a patient-specific segmentation model with an individual patient dataset.
We compared the performance of the employed model, nnU-net, against a Basic U-net with varying numbers of datasets, with and without VoxelMorph (Table 1). For the individual datasets, the nnU-net network using VoxelMorph performed better than that without VoxelMorph. Conversely, the Basic U-net model performed better without VoxelMorph. However, for the total dataset, VoxelMorph yielded the best performance for both network models. Thus, VoxelMorph augmentation increases segmentation performance with a more advanced network model, particularly when models are trained with larger datasets (total model).
As shown in Table 1, the segmented labels of CT2nd adopting the CT1st label exhibited the lowest DSC, in the order of the rectum, bladder, and prostate. Compared with the segmentation results utilizing deep learning, the DSC values for the rectum and bladder increased by more than 0.20. This suggests that the shape changes of the organs over the course of radiation treatment are minimal in the prostate and notable in the bladder and rectum. The average DSC values obtained in this study were compared with those reported in other studies (Table 4). The average DSC values for the bladder and prostate in both the individual and total models closely match those reported in previous studies. However, for the rectum, the individual model exhibited a lower performance, while the total model exhibited a higher performance. For the bladder, our total model exhibited a DSC value 0.02 higher than the average reported in the referenced studies, in which a DSC value of 0.90 or more was observed. Conversely, the individual model's rectum DSC was 0.03 lower than the reported average, falling below the values in all referenced studies except the DSC of 0.8 in the study conducted by Xu et al. [49]; the total model's rectum DSC, however, was 0.01 higher than the reported average. Both the individual and total models for the prostate showed DSC values lower than the average reported in the referenced studies. In Case 13, the model failed to accurately distinguish the boundary between the prostate and rectum, lowering the observed DSC values (Fig 3F).
The DSC values reported by Kiljunen et al. [43], who employed 800 datasets, were similar to those obtained for the individual models, and the DSC values reported by Elmahdy et al. [46], who employed 259 datasets, were higher than those obtained for all organs in the individual models. Hence, the performance of the aforementioned models does not depend linearly on the amount of data used. Considering that the studies cited in Table 4 did not employ the same dataset, we found that the data quantity, quality, network structure, and other parameters could affect the accuracy of a model. Nevertheless, quantitative measures of the performance of the developed individual models proved the feasibility of automatic segmentation by adopting only one patient dataset.
The HD is the maximum distance from a point on the true label to the nearest point on the predicted label; the HD values of the segmented organs thus indicate the presence of outliers. In the individual model, the average HD was < 10 mm for all organs, while the maximum HDs in the bladder and rectum exceeded 20 mm, with a standard deviation of 5 mm. Conversely, in the total model, the maximum HDs of the bladder and rectum were within 20 mm, with a standard deviation of 3 mm. The bladder and rectum exhibited fewer outliers as the number of datasets increased. Compared with the study conducted by Liu et al. [45], in which the reported average and maximum HDs for the prostate were 7.0 and 18.4 mm, respectively, our individual and total models achieved average and maximum HDs for the prostate of 5.19 and 8.29 mm, and 5.16 and 8.29 mm, respectively. This suggests that the performance of the proposed model is comparable to that of Liu's model trained with 774 datasets, indicating a minimal impact of the dataset number on prostate segmentation. A small HD indicates a small outlier error, demonstrating that our segmentation method produced predictions that aligned closely with the actual organ.
The MSD represents the mean distance error between the actual and predicted labels. The average MSDs of the bladder and rectum in both models were comparable: the rectum exhibited a lower MSD in the total model, with an average difference of 0.318 mm. A performance increase was realized for the rectum using the augmented and enlarged dataset, whereas for the bladder and prostate, performances similar to those of the total model were achieved even with one dataset. Moreover, in more than 50% of the cases in the individual model, specifically 9 of 18 cases for the prostate and 11 of 18 cases for the rectum, the MSD was less than 1 mm. Notably, the treatment accuracy recommended by Task Group 142 of the American Association of Physicists in Medicine is 1 mm [53]. Although the criteria for treatment accuracy differ from those for the MSD, more than 50% of the individual models meet the treatment accuracy criterion. Segmentation using individual models thus demonstrates potential for actual treatment use.
Our study has several limitations. We utilized simulated CT data and an additional unlabeled CT image taken during treatment. Additional CT scans are difficult to perform during treatment for patients undergoing short-term treatments such as stereotactic body radiotherapy. If the cone-beam CT images taken for setup verification can be improved to a quality equivalent to that of the simulation CT images, additional CT scans could be eliminated. In Case 01, as illustrated in Fig 4, the model failed to predict the air area of the rectum. The training CT1st for the rectum lacked an air area and air label, resulting in the model's inability to predict the air area on CT2nd. Therefore, predicting the air area on CT2nd is feasible only if at least one dataset contains air in the rectal label. Nevertheless, our model demonstrated a performance comparable to that of the models proposed in studies utilizing many datasets. Further studies are necessary to enhance accuracy by refining the segmentation model and diversifying the DVF-based augmentation method. The prostate, rectum, and bladder considered in this study showed fewer morphological changes over time than other organs. Our augmentation method learns these shape changes to generate images; therefore, minor morphological changes may make producing diverse images difficult. We plan to recruit additional patients and focus on other organs to compare and validate our augmentation-based segmentation.

Conclusion
We developed a personalized automatic segmentation method utilizing individual patient datasets and image augmentation techniques to customize radiation therapy based on variations in the target and critical organs of patients with prostate cancer. In contrast to previous studies in which large datasets were utilized, in this study, we demonstrated the feasibility of automatic segmentation by adopting only two datasets per patient. Thus, this model is a promising tool for ART in patients with prostate cancer.

Fig 1 .
Fig 1. Generation of DVFs utilizing the VoxelMorph method. (a) Generation of 10 modified CT images from the 2 CT images of a patient to be employed as CTfixed images. (b) 10 DVFs extracted for each CTfixed via the VoxelMorph method. Utilizing the 10 modified CT images and the CT2nd image as fixed images, 110 DVFs were extracted. https://doi.org/10.1371/journal.pone.0308181.g001

Fig 3.
Fig 3 presents examples of the predicted CT2nd labels obtained using the individual models and the actual CT2nd labels of the prostate, rectum, and bladder, presented as 3D surfaces of the organs utilizing a 3D slice viewer [41]. Six cases show the highest or lowest DSC for each organ.

Fig 4 .
Fig 4. Comparison of segmentation by overlapping CT and segmentation. Case 01 contains air in the rectum; Case 07 contains no air in the rectum. https://doi.org/10.1371/journal.pone.0308181.g004

Fig 5 illustrates 12 of the 110 augmented images generated by using the VoxelMorph model in Case 04, all from the same slice number. These 12 images depict various shapes of the human body.

As depicted in Fig 3, the shape of the rectum varied in each patient. VoxelMorph augmentation in the total model demonstrated superior performance compared with the individual model, attributable to its capacity to learn from diverse rectal shapes, which enhanced the rectal segmentation performance.

Fig 5 .
Fig 5. Augmented CT images and labels generated in Case 04 utilizing VoxelMorph. To verify whether the generated images exhibit a structure similar to the actual image, the images generated using VoxelMorph are presented for Case 04. https://doi.org/10.1371/journal.pone.0308181.g005

Table 1. Comparison of DSC values with different networks, dataset numbers, and VoxelMorph usage.
'Individual' refers to a model utilizing an individual dataset, with or without the use of VoxelMorph, and 'total' refers to a model employing the entire dataset, with or without the use of VoxelMorph. DSC, Dice similarity coefficient. * One labeled CT image set and one unlabeled CT image set. Bold text indicates the highest DSC.