Boosting radiotherapy dose calculation accuracy with deep learning

Abstract
In radiotherapy, a trade‐off exists between computational workload/speed and dose calculation accuracy. Calculation methods such as pencil‐beam convolution can be much faster than Monte Carlo methods, but are less accurate. The dose difference, caused mostly by inhomogeneities and electronic disequilibrium, is highly correlated with the dose distribution and the underlying anatomical tissue density. We hypothesize that a conversion scheme can be established to boost low‐accuracy doses to high accuracy, using intensity information obtained from computed tomography (CT) images. A deep learning‐driven framework was developed to test this hypothesis by converting between two commercially available dose calculation methods: the anisotropic analytical algorithm (AAA) and Acuros XB (AXB). A hierarchically dense U‐Net model was developed to boost the accuracy of AAA doses toward the AXB level. The network contained multiple layers of varying feature sizes to learn the dose differences between the two algorithms, in relation to CT, both locally and globally. AAA and AXB doses were calculated in pairs for 120 lung radiotherapy plans covering various treatment techniques, beam energies, tumor locations, and dose levels. For each case, the CT and the AAA dose were used as the input and the AXB dose as the “ground‐truth” output to train and test the model. The mean squared errors (MSEs) and gamma passing rates (2 mm/2% and 1 mm/1%) were calculated between the boosted AAA doses and the “ground‐truth” AXB doses. The boosted AAA doses matched the “ground‐truth” AXB doses substantially better, with an average (± s.d.) gamma passing rate (1 mm/1%) of 97.6% (±2.4%), compared to 87.8% (±9.0%) for the original AAA doses. The corresponding average MSE was 0.11 (±0.05) vs 0.31 (±0.21). Deep learning is able to capture the differences between dose calculation algorithms and thereby boost the low‐accuracy algorithms.
By combining a less accurate dose calculation algorithm with a trained deep learning model, dose calculation can potentially achieve both high accuracy and efficiency.


1 | INTRODUCTION
In radiation therapy, the radiation doses to the tumor and surrounding normal tissues directly determine treatment efficacy and safety. 1,2 It is pivotal for the radiation therapy treatment planning systems (TPSs) to accurately calculate the dose distributions to aid physician decisions.
Accurate dose calculation is also key to building reliable and reproducible models that relate dose distributions to clinical outcomes and guide future treatments. 3 Dose calculation algorithms have gone through several generations of development. The earliest algorithms, usually referred to as correction-based methods, [4][5][6][7] are only loosely grounded in physical principles. Their accuracy is unreliable in heterogeneous regions (for instance, lung tissue surrounding a tumor), where electronic equilibrium is lost. 8 To better model the physics in these regions, model-based techniques were developed. [9][10][11][12] These techniques model radiation energy transport via dose kernels and convolutions. To account for tissue heterogeneity, the dose kernels are scaled according to the equivalent electron-density path lengths traversed by the radiation beams, leading to the superposition-convolution type of algorithm that is widely used in today's clinics. The anisotropic analytical algorithm (AAA), one such algorithm, is commercially implemented in the Eclipse TPS 12 (Varian Medical Systems, Palo Alto, CA). However, discrepancies of more than 5% from measurements can still be observed for AAA calculations in inhomogeneous regions, 13 which can be clinically significant. 14,15 These discrepancies arise because kernel scaling still does not explicitly and realistically model energy transport through physical interactions. Monte Carlo algorithms represent a third type of dose calculation algorithm. They model the transport and energy deposition of each particle (photons, electrons, etc.) using explicit physical principles, modeled with measured data or established formulas, and provide the highest accuracy. 16,17 However, because Monte Carlo must simulate the transport of each particle individually, it requires substantial computational power and may significantly prolong dose calculation.
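The density scaling described above can be illustrated with the radiological (water-equivalent) depth along a single ray, where each geometric step is weighted by the local relative electron density. The sketch below is a minimal 1D illustration under that assumption; the function name `radiological_depth` and the example ray are hypothetical, and AAA's actual kernel-scaling scheme is more elaborate and is not reproduced here.

```python
def radiological_depth(rel_densities, step_mm):
    """Cumulative water-equivalent depth along one ray.

    rel_densities: relative electron densities (water = 1.0), one per step,
                   starting at the patient surface.
    step_mm: geometric length of each step, in mm.
    Returns the radiological depth (mm) reached at the end of each step.
    """
    depths = []
    total = 0.0
    for rho in rel_densities:
        total += rho * step_mm  # low-density lung adds less depth per mm
        depths.append(total)
    return depths


# Hypothetical ray: 20 mm soft tissue, 40 mm lung, 20 mm tumor, in 10 mm steps.
ray = [1.0, 1.0, 0.25, 0.25, 0.25, 0.25, 1.0, 1.0]
print(radiological_depth(ray, 10.0))  # [10.0, 20.0, 22.5, 25.0, 27.5, 30.0, 40.0, 50.0]
```

The ray is 80 mm long geometrically but only 50 mm deep radiologically. A path-length correction of this kind rescales the kernel along the ray, but it does not explicitly model electron transport, so it cannot fully capture the loss of electronic equilibrium near lung-tumor interfaces.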
Besides Monte Carlo, a newer dose calculation technique based on the linear Boltzmann transport equation has been implemented as the Acuros XB (AXB) algorithm in Eclipse. 18,19 It has been shown that AXB theoretically converges to the same solution as the Monte Carlo algorithm. 20 The accuracy of AXB has been extensively validated. 17,21,22 The relative efficiency of AXB is plan dependent: it improves as the number of beams increases and favors volumetric-modulated arc therapy plans. In some scenarios, however, AXB can be 10 times slower than AAA. 23 In general, dose calculation involves a trade-off: more accurate calculation requires more computational power and is generally more time consuming and resource demanding. Because of this trade-off, current radiotherapy TPSs may have to use less accurate dose calculation methods to improve efficiency, especially during plan optimization. 24 This compromise is not ideal, since low-accuracy dose calculations used during optimization may trap the optimizer in a local optimum and yield a suboptimal final plan. As the radiotherapy community pursues more precise, individually tailored treatments, the use of low-accuracy dose calculation for plan optimization may fail to produce high-quality plans within a tight time frame, especially for online adaptive radiotherapy. 25 A technique enabling dose calculation that is both accurate and efficient is thus highly desirable for plan optimization.
In addition, recent advances in real-time imaging also call for such a technique, 26 which would make possible on-the-fly dose monitoring and intervention through real-time plan re-optimization and adaptation.
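For reference, the linear Boltzmann transport equation mentioned above can be written in its generic steady-state form for the angular fluence Ψ at position r, energy E, and direction Ω̂. This is the textbook form, not a description of AXB's specific discretization; the continuous-slowing-down treatment of charged particles in the commercial solver is omitted here.

$$
\hat{\Omega}\cdot\nabla\Psi(\vec{r},E,\hat{\Omega}) + \sigma_t(\vec{r},E)\,\Psi(\vec{r},E,\hat{\Omega})
= \int_{0}^{\infty}\!\int_{4\pi} \sigma_s(\vec{r},E'\rightarrow E,\hat{\Omega}'\cdot\hat{\Omega})\,\Psi(\vec{r},E',\hat{\Omega}')\,d\hat{\Omega}'\,dE' + q^{\mathrm{ex}}(\vec{r},E,\hat{\Omega})
$$

Here σ_t and σ_s are the macroscopic total and differential scattering cross sections, and q^ex is the external (beam) source. Solving this equation deterministically on spatial, energy, and angular grids, rather than sampling individual particle histories, is what distinguishes AXB from Monte Carlo.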
As mentioned, the differences between low-accuracy and high-accuracy doses lie mostly within inhomogeneous regions. The inhomogeneity, however, is fully captured in the CT images used for dose calculation. From this observation, we hypothesize that the differences between dose algorithms can be learned and correlated with the dose distribution and the CT intensity information. With this learned correlation, we can then quickly boost low-accuracy doses to high accuracy, overcoming the trade-off between dose calculation accuracy and efficiency. Recently, the development and application of artificial intelligence (AI) in radiation therapy have grown tremendously. [27][28][29][30][31][32] Sophisticated convolutional neural networks can handle intensive tasks including medical image de-noising, segmentation, treatment plan optimization/evaluation, and clinical outcome prediction. Some networks, including the U-Net, 33 can perform voxel-wise prediction and mapping, which allows a voxel-to-voxel dose map conversion to boost the accuracy of low-accuracy doses. Additionally, the U-Net can extract both global and local features from dose distributions and CT images, which can be directly correlated with dose differences between algorithms, since energy transport is essentially determined by both long-range (global) photon transport and short-range (local) electron transport. With dedicated graphics processing units (GPUs), U-Net inference can also be executed within seconds, meeting the efficiency requirement. Driven by our hypothesis, in this study we introduced an AI-based framework to achieve rapid, direct 3D dose map conversion from low-accuracy to high-accuracy doses. We trained and evaluated the framework on AAA ("low-accuracy") and AXB ("high-accuracy") dose maps to demonstrate its effectiveness, since these two algorithms are well studied, widely available, and can be easily evaluated by other groups.
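The hypothesis can be stated compactly. Writing D_AAA and D_AXB for the two dose maps and I_CT for the CT volume on a common voxel grid V, a network f_θ is sought whose output approximates D_AXB voxel by voxel. The voxel-wise mean-squared-error training objective shown here is an assumption for illustration; MSE is reported in this work as an evaluation metric, but this section does not state the training loss.

$$
D_{\mathrm{boost}} = f_{\theta}\!\left(D_{\mathrm{AAA}},\, I_{\mathrm{CT}}\right) \approx D_{\mathrm{AXB}},
\qquad
\theta^{*} = \arg\min_{\theta} \sum_{n=1}^{N} \frac{1}{|V_n|} \sum_{\mathbf{x}\in V_n} \Big[ f_{\theta}\big(D^{(n)}_{\mathrm{AAA}}, I^{(n)}_{\mathrm{CT}}\big)(\mathbf{x}) - D^{(n)}_{\mathrm{AXB}}(\mathbf{x}) \Big]^2
$$

where n indexes the N training cases and V_n is the voxel grid of case n.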
Note that "low-accuracy" and "high-accuracy" are defined relatively here: in other contexts, AAA can be considered high accuracy (for instance, when compared with pencil-beam convolution), and AXB may be considered low accuracy when compared with a full-fledged Monte Carlo package. "High" and "low" are thus determined only relative to the two algorithms under study. We derived and tested the dose conversion model using a large lung cancer patient database, aiming to improve dose calculation for lung cancer treatment, whose accuracy is the most susceptible to tissue inhomogeneity. 28

2 | MATERIALS AND METHODS

2.A | Data preparation
In this study, we retrospectively collected a total of 120 lung cancer patient cases treated at our institution between 06/2017 and 03/2018. The retrospective study was approved under an institutional review board umbrella protocol. All patients were planned in Eclipse V15.5, with techniques ranging from 3D non-coplanar conformal static beams, intensity-modulated static beams, and 3D conformal arcs to volumetric-modulated arcs. The total prescription doses ranged from 24 to 60 Gy, covering both conventional and stereotactic body radiation therapy treatments. The tumors were distributed across both central and peripheral lung regions. Both primary lung tumors and metastatic tumors from breast, liver, kidney, and prostate were included. The treatment plans used beam energies of 6 MV, 10 MV, 6 MV FFF, and 10 MV FFF. All treatments were designed for and successfully delivered on an Elekta VersaHD LINAC with a 160-leaf Agility multi-leaf collimator head. 34 All cases were planned and treated using AAA as the dose calculation engine, with heterogeneity correction turned on. The dose grid resolution was 2.5 mm × 2.5 mm × 2.0 mm.
For each AAA dose distribution, we calculated the corresponding AXB dose distribution using exactly the same plan. The AXB doses were reported as dose to medium to account for the elemental composition of different tissues. In Eclipse, tissue designation is based on the densities determined from CT Hounsfield units. The corresponding elemental composition of each tissue is then determined on the basis of International Commission on Radiological Protection Report 23. 35 The AXB dose grid resolution was 2.5 mm × 2.5 mm × 2.0 mm, the same as for AAA. For each patient, we also exported the planning CT volume; these volumes had varying voxel resolutions and dimensions across patients. We exported the CT and dose files as DICOM-RT files from Eclipse, aligned them using their DICOM coordinates, and converted them into numeric arrays for training, validation, and testing. Before feeding them into the neural network, we rescaled and interpolated the AAA and AXB doses, as well as the patient-specific CT volumes, to a uniform resolution of 1.37 mm × 1.37 mm × 2.00 mm.
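Resampling onto a common grid can be done with trilinear interpolation, which on a regular grid factors into three 1D linear passes. The sketch below assumes pure-Python nested lists indexed vol[z][y][x], grids that share the same origin, and no extrapolation beyond the last input sample; the function names are hypothetical, and the interpolation scheme actually used in the study is not specified in the text.

```python
def resample_linear(values, old_spacing, new_spacing):
    """Linearly resample equally spaced samples onto a new spacing (mm).

    Input sample i sits at i * old_spacing; output samples sit at
    j * new_spacing and stop at the last input position (no extrapolation).
    """
    if len(values) < 2:
        return list(values)
    extent = (len(values) - 1) * old_spacing
    n_out = int(extent // new_spacing) + 1
    out = []
    for j in range(n_out):
        pos = j * new_spacing / old_spacing    # position in input-index units
        i = min(int(pos), len(values) - 2)     # left neighbour index
        t = pos - i                            # fractional distance from it
        out.append((1.0 - t) * values[i] + t * values[i + 1])
    return out


def resample_volume(vol, old_spacing, new_spacing):
    """Trilinear resampling of vol[z][y][x]; spacings are (dz, dy, dx) in mm."""
    oz, oy, ox = old_spacing
    nz, ny, nx = new_spacing
    # Pass 1: along x, row by row.
    vol = [[resample_linear(row, ox, nx) for row in plane] for plane in vol]
    # Pass 2: along y, column by column within each z-plane.
    tmp = []
    for plane in vol:
        cols = [resample_linear([row[k] for row in plane], oy, ny)
                for k in range(len(plane[0]))]
        tmp.append([[col[j] for col in cols] for j in range(len(cols[0]))])
    vol = tmp
    # Pass 3: along z, one (y, x) line at a time.
    n_y, n_x = len(vol[0]), len(vol[0][0])
    lines = [[resample_linear([p[j][k] for p in vol], oz, nz) for k in range(n_x)]
             for j in range(n_y)]
    n_z = len(lines[0][0])
    return [[[lines[j][k][i] for k in range(n_x)] for j in range(n_y)]
            for i in range(n_z)]
```

For the dose grids described here, a call would look like `resample_volume(dose, (2.0, 2.5, 2.5), (2.0, 1.37, 1.37))`, with spacings ordered (dz, dy, dx). A production pipeline would more likely use an array library and account for each grid's DICOM image position rather than assuming a shared origin.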

2.B | Network structure selection
For efficient and accurate dose boosting, we employed a Hierarchically Dense U-Net (HD U-Net) structure. 30 HD U-Net combines U-Net and DenseNet. 36 Compared with U-Net, HD U-Net uses densely connected layers within each hierarchical level, which improves feature propagation and reuse and mitigates the vanishing gradient problem. Compared with DenseNet, HD U-Net retains the pooling and upsampling procedures of U-Net, which capture global features from the input. Once the HD U-Net structure is set, it is trained in the same way as U-Net. Quantitative comparisons among the three types of networks have been reported in detail in a previous publication, 30 showing the advantage of HD U-Net. As reported, HD U-Net achieved high accuracy with far fewer network parameters than U-Net, which reduced the chance of overfitting. In contrast, DenseNet gave the worst overall results because of its limited ability to capture global features. For supervised training, the input channels of the HD U-Net are the Eclipse-calculated AAA dose distribution and the CT volume, and the "ground-truth" output is the Eclipse-calculated AXB dose distribution.
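The dense connectivity used within each HD U-Net level can be sketched without a deep learning library: every layer receives the channel-wise concatenation of the block input and all earlier layer outputs, so the channel count grows linearly with depth. In this minimal sketch, feature maps are lists of flattened channels, and `mean_layer` is a hypothetical stand-in for a real convolution layer; neither reflects the actual HD U-Net layer configuration.

```python
from typing import Callable, List

Channel = List[float]        # one flattened feature map
FeatureMap = List[Channel]   # a stack of channels
Layer = Callable[[FeatureMap], FeatureMap]


def dense_block(x: FeatureMap, layers: List[Layer]) -> FeatureMap:
    """DenseNet-style block: each layer sees the concatenation of the block
    input and every earlier layer's output; the block returns all of them."""
    features: FeatureMap = list(x)
    for layer in layers:
        new_maps = layer(features)       # input = all channels produced so far
        features = features + new_maps   # channel-wise concatenation
    return features


def mean_layer(fm: FeatureMap) -> FeatureMap:
    """Toy layer with growth rate 1: per-voxel mean over its input channels."""
    return [[sum(vals) / len(vals) for vals in zip(*fm)]]


def dense_channel_counts(in_channels: int, growth_rate: int, n_layers: int):
    """Input channels seen by each layer, and channels leaving the block."""
    seen = [in_channels + l * growth_rate for l in range(n_layers)]
    return seen, in_channels + n_layers * growth_rate


# Two input channels (e.g. a dose channel and a CT channel), two voxels each.
out = dense_block([[1.0, 2.0], [3.0, 4.0]], [mean_layer, mean_layer])
print(len(out))                         # 4
print(dense_channel_counts(16, 16, 4))  # ([16, 32, 48, 64], 80)
```

The linear channel growth is why dense blocks are usually kept short or followed by a channel-reducing layer.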
For testing, the output is the boosted dose. We used patch-based training 37 to balance the size of the training data against the available computational resources. We separated the full dose volume (512 × 512 × 128) into patches (patch size: 512 × 512 × 16) and fed each of them individually into the network for training/testing.
We then merged the output dose maps into a single volume as the final output. The overall training and testing framework is illustrated in Fig. 1.
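The non-overlapping split-and-merge step can be sketched by treating a volume as a nested list vol[z][y][x] and cutting it along z into slabs of 16 slices (a 512 × 512 × 128 grid gives 8 patches). The function names and the identity stand-in for the trained network are hypothetical; in practice each slab would carry both the AAA dose and CT channels. Overlapping patches with blending are a common alternative for suppressing seam artifacts, but that is not what the text describes.

```python
def split_into_slabs(volume, slab_depth):
    """Cut volume[z][y][x] into consecutive, non-overlapping slabs along z."""
    if len(volume) % slab_depth != 0:
        raise ValueError("slice count must be a multiple of slab_depth")
    return [volume[z:z + slab_depth] for z in range(0, len(volume), slab_depth)]


def merge_slabs(slabs):
    """Re-stack processed slabs along z into a single volume."""
    merged = []
    for slab in slabs:
        merged.extend(slab)
    return merged


def boost_volume(volume, model, slab_depth=16):
    """Run `model` (any slab -> slab callable) on each slab and merge the results."""
    return merge_slabs([model(slab) for slab in split_into_slabs(volume, slab_depth)])


# Small stand-in for a 512 x 512 x 128 grid: 128 slices of 2 x 2 voxels.
volume = [[[float(z)] * 2 for _ in range(2)] for z in range(128)]
slabs = split_into_slabs(volume, 16)
print(len(slabs))                                         # 8
print(boost_volume(volume, lambda slab: slab) == volume)  # True
```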

2.C | Training and testing
Of the 120 paired, patient-specific AAA-AXB dose maps, we ran-

To assess the accuracy of the boosted AAA dose map, we visually evaluated its difference from the AXB dose map calculated directly in Eclipse. MSEs were computed between the boosted AAA and the AXB dose maps to quantify their differences. 3D gamma analysis 39 with both 1%/1 mm and 2%/2 mm criteria was also performed to quantitatively assess the agreement between the boosted AAA and the AXB dose distributions. We also compared the dose volume histograms (DVHs) of the planning target volume (PTV) and lungs between the boosted AAA and the AXB doses to evaluate the accuracy of dose conversion. Quantitative dosimetric endpoints, including the D95 and V100 of the PTV and the V20Gy and Dmean of the lungs, were also assessed. The corresponding results between the original AAA doses (prior to boosting) and the AXB doses were also computed for comparison.
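The two agreement metrics can be made concrete with a simplified sketch. MSE is the voxel-wise mean of squared dose differences; the gamma index combines a dose-difference criterion ΔD and a distance-to-agreement criterion Δd as γ(r) = min over r' of sqrt(|r' - r|²/Δd² + (D_ev(r') - D_ref(r))²/ΔD²), and a point passes when γ ≤ 1. The sketch is 1D, normalizes ΔD globally to the maximum reference dose, and searches only evaluated grid points without interpolation; these are assumptions, and the study's 3D gamma analysis (ref. 39) may differ in normalization, low-dose threshold, and search details. The AXB dose would play the role of `ref` and the boosted AAA dose that of `ev`.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized dose arrays."""
    if len(a) != len(b):
        raise ValueError("dose arrays must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def gamma_1d(ref, ev, spacing_mm, dd_percent, dta_mm, threshold=0.0):
    """Global 1D gamma index of an evaluated profile against a reference.

    dd_percent is relative to max(ref); only evaluated grid points are searched.
    Reference points below threshold * max(ref) are skipped (returned as None).
    """
    d_max = max(ref)
    dd = dd_percent / 100.0 * d_max       # absolute dose-difference criterion
    gammas = []
    for i, d_ref in enumerate(ref):
        if d_ref < threshold * d_max:
            gammas.append(None)
            continue
        best = math.inf
        for j, d_ev in enumerate(ev):
            dist = (j - i) * spacing_mm
            best = min(best, (dist / dta_mm) ** 2 + ((d_ev - d_ref) / dd) ** 2)
        gammas.append(math.sqrt(best))
    return gammas


def passing_rate(gammas):
    """Percentage of evaluated (non-None) points with gamma <= 1."""
    vals = [g for g in gammas if g is not None]
    return 100.0 * sum(1 for g in vals if g <= 1.0) / len(vals)


ref = [100.0, 100.0]   # e.g. AXB profile (Gy), 1 mm spacing
ev = [102.0, 100.0]    # e.g. boosted AAA profile (Gy)
print(mse(ref, ev))                                    # 2.0
print(gamma_1d(ref, ev, 1.0, 2.0, 2.0))                # [0.5, 0.0]
print(passing_rate(gamma_1d(ref, ev, 1.0, 2.0, 2.0)))  # 100.0
```

The first point fails the pure dose-difference test (2 Gy against a 2 Gy tolerance sits exactly on the limit) but passes comfortably via its neighbour 1 mm away, which is the distance-to-agreement behaviour that makes gamma tolerant of steep gradients.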
In addition to the proposed network with both the original AAA dose and CT as input, we also evaluated a second network using only the original AAA dose as input. The second network was evaluated to assess the potential of directly learning intensity, texture, and

XING ET AL.

Fig. 3 shows the gamma index maps between the original AAA and the AXB dose distributions, and between the boosted AAA and the AXB dose distributions, for a 3D non-coplanar static beam plan [Fig. 3(a)] and a volumetric-modulated arc plan [Fig. 3(b)], respectively. A stringent criterion (1%/1 mm) was used to fully reveal the differences between dose maps. Red regions in the maps indicate a failed gamma index (>1). Large dose discrepancies can be observed in the original AAA gamma index maps, especially around the tumor region. By comparison, these discrepancies are largely removed in the gamma index maps of the boosted AAA dose.

FIG. 1. General training and testing processes, in which the patient-specific computed tomography (CT) and low-accuracy anisotropic-analytical-algorithm (AAA) doses serve as the input to the HD U-Net structure, and the high-accuracy Acuros XB (AXB) doses serve as the "ground-truth" output for supervised training/validation. Using the trained framework, a new patient-specific CT and low-accuracy AAA dose can be input to obtain a high-accuracy, boosted AAA dose as the output, with accuracy matching the AXB dose level.

3 | RESULTS
of the AXB doses, while substantial discrepancy could be observed between DVH curves of the original AAA and the AXB doses.
In Table 1, the boosted AAA doses matched the AXB doses substantially better, with an average (± s.d.) gamma passing rate (1 mm/1%) of 97.6% ± 2.4%, compared to 87.8% ± 9.0% for the original AAA doses. A less strict criterion (2 mm/2%) yielded 99.8% ± 0.4% for the boosted AAA doses, compared to 98.4% ± 1.5% for the original AAA doses. The corresponding average MSE was 0.11 ± 0.05 between the boosted AAA and the AXB doses, compared to 0.31 ± 0.21 between the original AAA and the AXB doses. The boosted AAA doses obtained without CT as input were intermediate in accuracy between the original AAA doses and the boosted AAA doses obtained with CT as input.

FIG. 2. (a) The "ground-truth" Acuros XB (AXB) dose maps and relative differences between (b) the original anisotropic-analytical-algorithm (AAA) and the AXB dose maps and (c) the boosted AAA and the AXB dose maps, for a three-dimensional non-coplanar static beam plan. (d) The "ground-truth" AXB dose maps and relative differences between (e) the original AAA and the AXB dose maps and (f) the boosted AAA and the AXB dose maps, for a volumetric-modulated arc plan. The "ground-truth" AXB doses are shown in absolute quantities (Gy). The dose differences are normalized to the plan prescription dose (%).
Note that since this study focuses on developing a network that uses CT as one of its inputs, unless otherwise specified, "boosted AAA doses" refers to the doses obtained from this network.
In Table 2  terms of seconds/minutes, AXB can be ~30 s to more than 2 min slower than AAA. In general, AXB tends to be much less efficient for plans with static gantry beams, which are frequently used in our clinic for lung treatments. Our dose boosting scheme, which can potentially be executed within 1 s, could substantially improve dose calculation efficiency.
In this study, we trained, validated, and tested a voxel-wise dose boosting model on 120 in-house lung cancer patient cases using a Hierarchically Dense U-Net. Visual comparisons of dose differences showed major improvements in the boosted AAA doses, both within tissues and along tissue interfaces (Fig. 2). In comparison, the original AAA lacks accuracy in multiple regions where rapidly changing electron densities invalidate the kernel scaling approach it applies to account for tissue inhomogeneity. The gamma index maps in Fig. 3 also confirmed the improved accuracy of the boosted AAA doses. The structure-specific DVH curves and dosimetric endpoints also demonstrated that the boosted AAA doses matched the AXB doses well (Fig. 4; Table 2) and provided more accurate target coverage and OAR sparing information. Overall, the conversion model yielded a ~98% gamma passing rate between the boosted AAA and the AXB doses for 1%/1 mm gamma analysis and ~100% for 2%/2 mm gamma analysis, an almost perfect match. In comparison, the corresponding results were only ~88% and ~98% for the original AAA doses (Table 1).
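The dosimetric endpoints compared above (D95 and V100 for the PTV, V20Gy and Dmean for the lungs) can be computed directly from the voxel doses inside a structure. This minimal sketch assumes equal voxel volumes and a simple discrete definition without DVH interpolation or partial-voxel handling, so clinical TPS values may differ slightly; the function names and all example doses are hypothetical.

```python
import math


def d_at_volume(doses, percent):
    """D_x: minimum dose received by the hottest `percent`% of the voxels."""
    ordered = sorted(doses, reverse=True)
    k = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[k - 1]


def v_at_dose(doses, threshold):
    """V_D: percentage of voxels receiving at least `threshold` (same unit as doses)."""
    return 100.0 * sum(1 for d in doses if d >= threshold) / len(doses)


def mean_dose(doses):
    """D_mean: average voxel dose (equal voxel volumes assumed)."""
    return sum(doses) / len(doses)


rx = 60.0                        # hypothetical prescription (Gy)
ptv = [60.5] * 19 + [55.0]       # hypothetical PTV voxel doses (Gy)
lung = [5.0, 10.0, 25.0, 30.0]   # hypothetical lung voxel doses (Gy)
print(d_at_volume(ptv, 95))      # 60.5  (D95)
print(v_at_dose(ptv, rx))        # 95.0  (V100, % of PTV receiving >= Rx)
print(v_at_dose(lung, 20.0))     # 50.0  (V20Gy)
print(mean_dose(lung))           # 17.5  (Dmean)
```

Applying the same functions to the boosted AAA, original AAA, and AXB doses within each structure gives the endpoint-by-endpoint comparison described in the text.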
In our network, we used both CT images and a low-accuracy dose map as input to derive a high-accuracy dose map. We also evaluated the feasibility of directly correlating the low-accuracy dose map (w/o CT) with the high-accuracy dose map for dose boosting (Table 1). The network without CT as input also boosts the AAA dose so that it matches the AXB dose better than the original AAA dose does. However, the boosted AAA dose with CT as input clearly matches the AXB dose best. With the CT images providing electron density information, the HD U-Net is better informed of potential inaccuracies in the original AAA dose maps and can further improve the accuracy of dose boosting.
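The two network variants differ only in their input channels. The sketch below builds the input as either [AAA dose] or [AAA dose, CT]; the function name, the dose normalization by a reference dose, and the HU clipping window of -1000 to 2000 HU mapped to [0, 1] are hypothetical preprocessing choices for illustration, not the study's documented settings.

```python
def build_input(aaa_dose, ct_hu, use_ct=True, dose_ref=1.0,
                hu_min=-1000.0, hu_max=2000.0):
    """Stack network input channels from co-registered, flattened arrays.

    Channel 0: AAA dose divided by a reference dose (hypothetical scaling).
    Channel 1 (optional): CT clipped to [hu_min, hu_max] and mapped to [0, 1].
    """
    if len(aaa_dose) != len(ct_hu):
        raise ValueError("dose and CT must be on the same voxel grid")
    channels = [[d / dose_ref for d in aaa_dose]]
    if use_ct:
        span = hu_max - hu_min
        channels.append([(min(max(h, hu_min), hu_max) - hu_min) / span for h in ct_hu])
    return channels


dose = [30.0, 60.0]      # hypothetical AAA voxel doses (Gy)
ct = [-1000.0, 500.0]    # an air voxel and a bone-like voxel (HU)
print(build_input(dose, ct, dose_ref=60.0))       # [[0.5, 1.0], [0.0, 0.5]]
print(len(build_input(dose, ct, use_ct=False)))   # 1
```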
In current clinical practice, convolution/superposition algorithms like AAA remain the main dose calculation engines in clinical TPSs. To improve dose calculation accuracy, AXB has been introduced into clinics as an alternative to full Monte Carlo simulation, but its clinical use is still limited. Potential hurdles include the acquisition

TABLE 1. Quantitative comparisons between the original anisotropic-analytical-algorithm (AAA) and the Acuros XB (AXB) doses, and between the boosted AAA (w/ and w/o computed tomography (CT) as input) and the AXB doses. MSE: mean squared error. RX: prescription. The results of the 30 testing patient cases were included in the analysis.

Another study (Ref. [44]) also converts between low-accuracy and high-accuracy Monte Carlo doses, for de-noising; that approach may not be readily applicable to the non-Monte-Carlo dose calculation algorithms that dominate current clinical TPSs.
In summary, this study shows that, with the power of deep learning, a mapping scheme between low-accuracy and high-accuracy dose maps can be uncovered, using patient anatomical structure maps and intensity distributions from CT as guidance. "Low" and "high" are defined only relative to the two algorithms under study and should not be interpreted in an absolute sense. We tested this hypothesis by developing and evaluating a dose boosting framework between AAA and AXB dose maps.
This framework can be readily extended to other pairs of dose maps whose relative accuracy differences may be more pronounced. The relative accuracy levels of different algorithms can be determined using the four types of algorithms defined in AAPM Report No. 85, 1 or the "a" to "c" categories stratified by Ref. [45]. It would be of interest to test the framework in boosting pencil-beam convolution doses to Monte Carlo doses, which might benefit treatment plan optimization, since less accurate pencil-beam convolution-type calculations are usually used within optimization to promote efficiency. It also remains to be investigated how robust the current framework will be to boost a

CONFLICT OF INTEREST
No conflict of interest.