Robust treatment planning with 4D intensity modulated carbon ion therapy for multiple targets in stage IV non-small cell lung cancer

Intensity modulated particle therapy (IMPT) with carbon ions can generate highly conformal treatment plans; however, IMPT is limited in robustness against range and positioning uncertainty. This is particularly true for moving targets, even though all motion states of a 4DCT are considered in 4D-IMPT. Here, we expand 4D-IMPT to include robust non-linear RBE-weighted optimization to explore its potential in improving plan robustness and sparing critical organs. In this study, robust 4D-optimization—based on worst-case optimization on 9 scenarios—was compared to conventional 4D-optimization with PTV margins using 4D dose calculation and robustness analysis for 21 uncertainty scenarios. Slice-by-slice rescanning was used for motion mitigation. Both 4D-optimization strategies were tested on a cohort of 8 multi-lesion lung cancer patients with the goal of prioritizing OAR sparing in a hypofractionated treatment plan. Planning objectives were to keep the OAR volume doses below corresponding limits while simultaneously achieve CTV coverage with D95% ≥ 95 %. For the conventional plans, average D95% was at 98.7% which fulfilled the target objective in 83.2% of scenarios. For the robust plans, average D95% was reduced to 97.6% which still fulfilled the target objective in 80.7% of cases, but led to significantly improved overall OAR sparing: Volume doses were below the limits in 96.2% of cases for the conventional and 99.5% for the robust plans. When considering the particularly critical smaller airways only, fulfillment rates could be increased from 76.2% to 96% for the robust plans. This study has shown that plan robustness of 4D-IMPT could be improved by using robust 4D-optimization, offering greater control over uncertainties in the actual delivered dose. In some cases, this required sacrificing target coverage for the benefit of better OAR sparing.


Introduction
Advanced stage lung cancer treatment remains a challenge in spite of the relative success of stereotactic hypofractionated radiotherapy for non-small cell lung cancer (NSCLC) (Baumann et al 2001, Kong et al 2005, Rosenzweig et al 2005, Fakiris et al 2009, Grutters et al 2010, Greco et al 2011 . In a recent simulation study (Anderle et al 2018), we have shown that scanned carbon ion therapy can maintain or increase the dose to a tumor while significantly reducing the radiation exposure of critical organs. In this previous study, while different motion patterns were assessed, only nominal target conditions were analyzed without considering further uncertainties. The robustness of ion beam therapy with its steep dose gradients was shown to impede a clinical realization, especially for moving targets.
Several studies on scanned proton therapy have shown the importance of combining intensity modulated particle therapy (IMPT) with robust optimization (Stuschke et al 2012, Liu et al 2014, Cubillos-Mesias et al 2017. In addition, moving targets can be particularly critical for scanned particle therapy, as the interference between target motion and scanned particles can lead to degradation of the target coverage in form of range changes and interplay patterns. With 4D-optimization, target motion and motion-induced range changes can be incorporated directly into the treatment plan (Graeff 2014). An extension to 4D-robust optimization to also include uncertainties in motion patterns was realized by Liu et al (2016). Here, we introduce robust 4D-optimization of carbon ion treatment plans with RBE-weighted (relative biological effectiveness) doses and test it on a cohort of complex patients with stage IV NSCLC and multiple lesions in both lung lobes. This was assessed by performing a robustness analysis (RA) on a variety of uncertainty scenarios in order to test the following hypotheses: 1. Robust 4D-optimization decreases the variation of both target coverage and OAR exposure over multiple uncertainty scenarios, including those not contained within the optimization. 2. Robust 4D-optimization achieves dose constraints more reliably than conventional optimization, with a significant increase in OAR sparing under uncertainty conditions.

Robust 4D-optimization
This study uses the GSI in-house treatment planning system (TPS) TRiP4D (Richter 2012), based on TRiP98 (Krämer et al 2000a, Krämer andScholz 2000b). It is capable of 4D dose calculation and several 4D-optimization modalities (Graeff et al 2013, Graeff 2014. Up to now, TRiP4D has undergone several extensions including the capability to handle multiple targets (Anderle 2016) and an implementation of iterative objective optimization to handle constraints such as dose volume constraints via an adaptive repetition of the treatment plan optimization (Anderle et al 2018). A detailed description of the cost function for 4D-ITV-based IMPT can be found in Anderle et al (2018). TRiP4D's increased demand for computing power due to 4D-optimization is accounted for by using multi-threaded optimization and dose calculation based on the OpenMP library (Hild et al 2014). The 4D-optimization considered here operates on all motion states of a 4DCT to generate a single treatment plan which is valid for all motions states including motion-induced range changes (Graeff 2014). This implicitly defines a range-considering ITV, which can be considerably larger than the union of all motion states (geometrical ITV), especially in the lung (Graeff et al 2012). Slice-by-slice rescanning is available to mitigate potential interplay due to respiratory motion.
A robust version of 4D-optimization was realized by an implementation of a voxel-wise worst-case scenario method proposed by several publications (Unkelbach et al 2007, Pflugfelder et al 2008, Liu et al 2012 into the existing 4D dose calculation algorithm of the plan optimization. In each step during treatment plan optimization, nonlinear RBE weighted doses are calculated for various uncertainty scenarios, while the worst-case is always chosen for the structure in consideration. This is done voxel-wise for all voxels in target and OAR structures. Specifically, choosing the worst-case means selecting the scenario with the lowest dose for target voxels and highest dose for OAR voxels. For this study, S is the set of 9 different scenarios taken into account during treatment plan optimization: The nominal scenario, two scenarios representing range uncertainties by changing CT values and isotropic shifts of the patient's isocenter in the six cardinal directions representing patient setup uncertainties.
The RBE-weighted dose of a voxel i is defined as where D i phys ( ⃗ N) is the physical dose and RBE i ( ⃗ N) the relative biological effectiveness, calculated in the low dose approximation (Krämer and Scholz 2006). The vector ⃗ N contains the particle numbers of all pencil beams from all treatment fields. For the non-robust implementation of 4D-optimization (Graeff 2014, Anderle et al 2018, the dose contributions from each pencil beam spot from the beam spot grid are calculated and stored in the dose matrices for all target and OAR voxels from all motion states from a 4DCT. As typically 10 motion states are used, this results in 10 dose matrices for the non-robust implementation. For the robust implementation, a dose matrix is calculated at each motion state for each of the nine uncertainty scenarios, which results in a total of 90 dose matrices.
The cost function used for the robust 4D-optimization is as follows is the actual RBE-weighted dose of voxel i in motion state m dependent of the uncertainty scenario s and pencil beam weights vector ⃗ N. The voxel specific penalty factors are denoted as w T and w OAR . The prescribed target dose is denoted as D presc. and the dose limit for surrounding critical structures considered during treatment planning is stated as D limit . The D limit values used in this study depend on organ type and fractionation scheme and can be found in Benedict et al (2010). The Heaviside function H (D act. − D limit ) is used to neglect underdosage of OARs: it is unity if D act. > D limit and zero if D act. ⩽ D limit . The set of all motion states of a 4DCT is denoted as M. For patients with multiple tumors, T represents the set of all targets included in the optimization. The additional subdivision of all CTV voxels in specific targets T is necessary as the treatment fields are assigned to their specific CTVs, so that the optimizer adapts particle numbers only for the corresponding fields but still accounts for the influence from the remaining targets. A detailed description of the implementation of the robust 4D approach can be found in Wolf (2018).
This cost function also contains a maximal dose term (second line in equation (2)), used for target voxels exclusively. It has been proposed as an optional term by Liu et al (2012) to reduce hot spots in the robust optimization of proton treatment plans. For carbon ion treatment plan optimization, the maximal dose term has been determined to be mandatory in order to avoid an unrestricted overdose to the target (Wolf 2018).

Patient data
The robust 4D algorithm was tested on a set of 8 patients with 2-5 tumor lesions in the lung. In total, there were 24 targets. This set of patients represents a subset of the most complex cases of a patient cohort presented in Anderle et al (2016). This patient cohort was originally treated with photon SBRT at the Champalimaud Center for the Unknown in Lisbon, Portugal. To maintain consistency, the patient numbering from Anderle et al (2018) is preserved in this study. For the majority of tumor lesions (17 of 24), the fractionation scheme was 1 × 24 Gy. For the remaining 7 tumor lesions, due to OAR constraints, either single fractions with reduced doses such as 1 × 20 Gy and 1 × 22 Gy were used or fractionation schemes with 3 × 9 Gy and 5 × 7 Gy. A detailed overview of the patient cohort can be found in Anderle et al (2018).
The average tumor volume was 28.8 cm 3 , with a range of 0.4 cm 3 to 139 cm 3 for the reference state. The average peak-to-peak motion from the 4DCTs was estimated to be 6.2 mm, with a range of 0.5 mm to 17 mm.
For each patient, 4DCTs were obtained with 10 motion states (0-9), where the end inhale phase is defined as the reference state (0). Additionally, a deformable image registration (DIR) was performed for each 4DCT in order to transform CTVs and OARs, which were delineated by a physician in the reference state, to the remaining motion states.

Treatment planning
The treatment planning objective was to achieve a target coverage of D95% ≥ 100% while simultaneously meet the OAR constraints stated in Benedict et al (2010). When a sufficient compromise could not be achieved, OAR sparing was favored over target coverage.
In the presented study, two optimization strategies were compared: 1. For conventional 4D-optimization, the treatment planning were calculated on the CTVs from all motion states. Each CTV was extended by adding 3 mm isotropic PTV margins. The magnitude of 3 mm was derived from the original photon SBRT treatment plans. 2. For robust 4D-optimization, the 4D-optimization was performed on the CTVs from all motion states without any additional safety margins. This was possible as the robust optimization algorithm implicitly generates a margin of safety by considering several uncertainty scenarios during optimization. For robust optimization, the final extension of those safety margins is determined by the two parameters which describe patient positioning uncertainties and range uncertainties. In order to maintain comparability with the 3 mm PTV margins used in the previous study, the setup uncertainty was chosen to be ± 3 mm and the range uncertainty parameter was set to ± 3.5% for the robust optimization algorithm.
The relevant OARs were accounted for in both optimization strategies. In cases where an overlap between target volumes and critical structures was found, the overlap was subtracted from the corresponding CTV in all motion states.
As described above, uncertainties in patient positioning can be modeled by shifting the patient's isocenter during treatment planning. Since treatment plan optimization is performed in the beam's eye view (BEV) coordinate system, the effective shift is obtained by transforming the initial position shift from the CT coordinate system to the BEV coordinate system. As the BEV coordinate system itself, the effective shifts in the BEV system also depend on couch and gantry angle; however, in the BEV coordinate system, only planes orthogonal to the BEV axis are affected by the shifts in the CT coordinate system. Hence, only the lateral projections in x and y of the BEV coordinate system are of relevance. The particle ranges are only marginally affected by adding or subtracting an additional 3 mm of air while traversing patient tissue in BEV; however, altering the CT density by ± 3.5% creates significant particle range changes which meets the requirement for longitudinal safety margins.
The treatment plan optimization was performed separately for each of the lung lobes since the entry channels did not overlap and the effect of overlapping fragmentation dose tails was not relevant for plan optimization; however, the fragmentation dose tails were accounted for during the analysis of OAR exposure and target coverage, as the dose was calculated for all fields simultaneously. As in the previous study, due to the low density of lung tissue, an 80 mm PMMA bolus was used to account for the short particle ranges for targets in close proximity to the chest wall. In order to reduce the size of the calculation problem, large OARs, such as the heart or the lung, were cropped to only include the portions which intersected with the treatment fields. In this study, the same OARs as in Anderle et al (2018) were used for the robust optimization.
In order to maintain comparability to the previous study, treatment planning parameters for the robust 4D-optimization such as number of fields, couch and gantry angles, target points, spacing of beam spot grid, and size of ripple filter were taken from the treatment plans created with conventional 4D-optimization. The selected number of fields per target depended on the presence of adjacent OARs: (i) two opposing fields were used in absence of an OAR, (ii) two or more opposing or oblique fields were used in presence of critical structures in close proximity.
The RBE was calculated by applying LEM IV (Elsässer et al 2010, Friedrich et al 2014, using the low dose approximation presented in Krämer and Scholz (2006). As in the previous study, alpha-beta ratios of 6 and 2 were selected for target and normal tissue, respectively.
The penalty factor for the maximal dose term shown in equation (2) was set to 1.0 for all patients except for Patient 1, where the value was set to 1.1. This was necessary in order to account for a higher normal tissue dose, as this patient was optimized without considering OARs.
For target volumes, the robust optimization typically generates a steep and narrow dose fall-off which is centered on the D50% value. In order to improve fulfilling the target coverage objective, it was necessary to manually increase the planned dose between a range of 1.8% to 12.8% of the planned fractions (median value: 2.1 %).
Iterative objective optimization, as described in Anderle et al (2018), was only necessary for Patient 6. Here, a hard constraint was chosen for the smaller airways of the lung in order to reduce the maximum point dose which initially prevented fulfilling the dose volume limit defined in Benedict et al (2010).
Although, the focus of this study is on OAR sparing, treatment plans with low OAR penalty factors (LOW) were created for Patient 6, which resulted in a prioritization of the target coverage. Patient 6 was chosen, as the patient showed a pronounced conflict between the two treatment planning objectives for OAR sparing and target coverage. This comparison was done to further asses the capabilities of the conventional and robust 4D-optimization under varying objectives.

Motion mitigation
In addition to the consideration of motion-induced range changes by using 4D-optimization, residual target motion is mitigated by using slice-by-slice rescanning. In order to maintain comparability with the (Anderle et al 2018) study, up to 20 rescans were also performed during 4D dynamic dose calculation; however, the number of rescans for each iso-energy slices (IES) was limited by the minimal particle number necessary for detection of one monitor unit during treatment. For treatment plan optimization, a minimum particle number limit of 2 · 10 4 was used.

2.5 4D dynamic dose calculation
4D dynamic dose distributions were calculated for the robust and the conventional treatment plan presented in the previous study. The 4D dynamic dose calculations were performed with TRiP4D using a patient specific DIR, to propagate all partial state doses to the reference state (0). Temporal correlations between respiratory motion and the pencil beam deliveries were simulated with a dose delivery simulation tool (Richter 2012). Machine parameters, including average spill duration and spill structure, were taken from the synchrotron accelerator at the Heidelberg Ion-Beam Therapy Center (HIT).
Breathing motion was modeled with a periodical sinusoidal function using 3.5 s and 5 s periods, each combined with two possible starting phases of 0 • and 90 • , resulting in 4 respiratory motion patterns, as proposed by Lujan et al (1999). For each of these respiratory motion patterns, 4D dynamic dose distributions were calculated for 21 uncertainty scenarios, which combine three range scenarios (nominal range, range over-and undershoot) and patient position shifts in 6 cardinal anatomical directions. In total, this results in 84 different 4D dose distributions per treatment plan.

Analysis
In order to evaluate the quality of the conventional and robust treatments plans, a RA was performed. For the RA, dose volume histograms (DVHs) were calculated for each of the 84 4D dynamic dose distributions. From this set of DVHs, average values for DVH metrics such as D95%, D50% for the CTVs and D 0.5cc , D 15cc , D 4cc , D 5cc , D 10cc , D 0.35cc and D 1500cc were calculated for smaller airways, heart, trachea, esophagus, aorta, spinal cord and lung without CTVs, respectively. As for each DVH metric there were 84 values, the median value and average value with standard deviation in parenthesis will be stated.
As described in Benedict et al (2010), dose-volume constraints for a given OAR were evaluated as the dose in a defined critical volume, D vol , which is not allowed to exceed a threshold dose, D tresh , in order to be fulfilled. Taking the smaller airways as an example, the dose-volume constraint is fulfilled, if D 0.5cc ≤ 12.4 Gy. Further values of the threshold doses were used as stated in Table III in Benedict et al (2010), depending on the fractionation schemes as defined in section 2.2. In order to summarize several OARs, all D vol values were normalized to their corresponding threshold doses; therefore, if D vol exceeded 100 %, the constraint was counted as violated for a given scenario.
In order to estimate the uncertainty of the target coverage over the set S of all 84 scenarios per a patient and plan, the target uncertainty distribution (TUD) was computed as the deviation of D50% values for the nominal scenarios from the remaining scenarios and is defined as: As the narrowness of the target uncertainty distribution is more relevant than its median or mean values, the narrowness is evaluated with the uncertainty span, which is defined as the interval between 5th and 95th percentiles of the TUD. A narrow distribution of TUD and, hence, a small uncertainty span are associated with a decreased susceptibility to uncertainty scenarios.
Since the distributions of DVH metrics are of non-Gaussian shape, statistical significance was calculated by performing a signed-rank Wilcoxon test on the distribution of paired differences between the conventional and the robust plans. As null hypothesis, it was assumed that the mean value of the difference distribution is zero.

Hardware
All treatment plan optimizations were performed on a rack-mounted Linux machine equipped with two Intel® Xeon® CPUs (Modell E5-2690 v3 at 2.60 GHz) which provide 48 threads in total, and 256 Gb of working memory in total.
The 4D dose calculations for the RA were performed on the above mentioned Linux machine and a rack-mounted AIX machine equipped with 20 IBM® Power8® CPUs (at 3.425 GHz) which provide 96 threads in total, and 128 Gb of working memory in total.
In order to provide the execution of several TRiP4D calculations in parallel, a limit of 16 threads per process was used.

Example patient
To demonstrate the performance of the robust 4D-optimization in a more comprehensive manner, Patient 6 is presented here in more detail. Although Patient 6 has only one tumor per lung lobe, this case posed a challenge for the conventional and the robust optimizer, due to the close vicinity of the CTV to the smaller airways in each lung lobe. At some locations, the distance between targets and smaller airways was less than 10 mm, which requires a compromise between providing sufficient target coverage and adequate OAR sparing. In the Anderle et al study, both the original stereotactic body radiation therapy (SBRT) treatment plans and the conventional 4D carbon ion plans tried to find a compromise between these two opposing objectives. When a sufficient compromise could not be achieved, OAR sparing was favored over target coverage. For the conventional treatment plan from Anderle et al, target coverage of the CTV in the left lung lobe was sacrificed to spare the smaller airways adequately. As the conventional 4D-optimization only considers the nominal dose scenario, it was not possible to account for additional range or setup uncertainties. The axial view of the resulting dose distribution can be seen in figures 1(a) and 1(c), where for both the conventional and robust treatment plan, the dose distributions showed a cold spot in target coverage for the medial area of the left CTV, which will further be referred to as 'low dose area' . For the robust plan, the low dose region between the left CTV and smaller airways was already larger than in the conventional plan, and in the right lung lobe, the low dose area in the CTV is even more pronounced than for the conventional plan. For the robust treatment plan, this means an additional sacrifice of target coverage in order to provide improved adequate sparing of the smaller airways (SA) in all scenarios.
Figures 1(b) and 1(d) show axial views of the dose distributions of robust and conventional 4D treatment plans with the CT density increased by 3.5 %. This high-density scenario (hd) for instance demonstrates the performance of the robust optimization algorithm under the presence of uncertainties. Since increasing CT  density will result in undershooting ranges of the planned treatment fields, the impact on the dose distribution can be substantial, as seen in the increased number of hot and cold spots. When comparing the nominal and the hd dose distributions for the conventional plan, both CTVs already contain hot spots; however, in case of increased CT density the left target sustains extensive cold spots up to several Gray dose reduction. Apart from the inherently created cold spots close to the smaller airways, the robust dose distribution is visibly more uniform and contains only small hot spots, as seen in the dose distributions in figures 1(c) and 1(d).
The major difference between the robust LOW plan and the robust plan can be seen around the boundary areas of the target volume, proximal to the smaller airways. Since the penalty factors for the target voxels were set to be larger than the penalty factors for the OAR voxels for the robust LOW plan, the cold spots in the target were no longer present. When compared to the conventional and the robust plan in the case of increased CT density, the dose distribution of the robust LOW plan in figure 1(f) presents almost no deteriorations. Figure 2 shows the DVHs of the already mentioned three plans and the conventional LOW plan for Patient 6. For the given two objectives, an ideal treatment plan would generate a DVH, where the CTV dose curves would converge to the ideal step-shape for all considered uncertainty scenarios, and where the CTV dose curves would provide a steep dose fall-off shortly after 100% of the target dose is reached. This would result in narrow uncertainty bands around the nominal DVH curves, and that the CTV objective of D95% ≥ 95% would be fulfilled in the majority of cases. At the same time, the ideal plan would meet the DVH constraints for both, the volume dose and the maximum delivered dose for each OAR in all uncertainty  Table 2 in the annex states the resulting target coverage and dose exposure of the smaller airways from the RA for the four treatment plans shown in figure 2.
For the 4D conventional plan with priority on OAR sparing in figure 2(a), there was an uncertainty span of about 13.5% and 2.6% for the left and right CTV, respectively. The pronounced uncertainty span of the left CTV is approximately a factor 20 larger than for the robust plan. For the 4D robust plan in figure 2(b), the uncertainty span (see equation (3)) was reduced to 0.5% and 1.7% for the left and right CTV, respectively. Also, the dose exposure of the smaller airways was considerably reduced to be below the constraint in the majority of uncertainty cases; however, at the cost of reduced target coverage (see DVH metrics in table 2 in the annex).
In figures 2(c) and (d), show the DVHs for the conventional and robust 4D plans with priority on target coverage (LOW). In both cases, the DVH shape was closer to the ideal step-shape, while dose exposure to the smaller airways was considerably increased. Compared to the robust LOW plan, the uncertainty span increased from 0.3% to 0.9% and 0.7% to 3.2% for left and right CTV, respectively.
In order to highlight the impact of individual patients, figure 3(a) shows a paired scatter plot of D95% values versus volume doses of the smaller airways of Patient 6 for robust 4D and conventional 4D plans. Both objectives, for target coverage and OAR sparing are depicted as dashed lines. To fulfill both objectives, the data points would ideally be situated in the top left quadrant with D95% ≥ 95% and D 0.5cc ≤ 12.4 Gy (normalized to 100 %). In figure 3(b) the same plot is shown for the conventional and robust plans with low OAR penalty factors (LOW). As in figure 3(a), the data points for the robust plan are more clustered than for the conventional plan and the cluster is shifted towards the beneficial upper left corner where both objectives for 100% target coverage and 100% OAR sparing would be fulfilled simultaneously. For the LOW plans, the planned dose was increased to 25 Gy. As a result, the D95% objective could be achieved in all the scenarios for both LOW plans.

Entire patient cohort 3.2.1. CTV dose coverage
figure 4(a) shows the D95% target coverage for all 24 tumors, comparing the resulting robust treatment plans with the conventionally optimized plans for all eight patients. For each patient, 21 uncertainty scenarios and 4 respiratory motion patterns are considered, which results in a total of 24 · 21 · 4 = 2016 D95% values. Violin plots are used to account for the multimodal character of D95% distributions which is due to variability between patients. Median values, first and third quartiles of the distributions, as well as minima and maxima excluding outliers, are shown in the overlaid boxplots. In order to eliminate the patient variability, the distribution of the D95% values of robust plans subtracted from D95% values of the conventional plans were also plotted. As the distributions are not Gaussian, statistical significance was calculated using the Wilcoxon signed-rank test.
The D95% median values were at 100.2% for the robust and 100.7% for conventional plans, respectively. The average values (standard deviation) were at 97.6% (8.0 %) for the robust plans and 98.7% (6.5 %) for the conventional plans, respectively. The mean value of the differences of both plan types was 1.1 %, which was considered as not statistically significant (p-value at 0.1). This is also reflected in the target coverage  objective (D95% ≥ 95 %), which the conventional plans could fulfill in 83.1% of cases, whereas the robust plans fulfilled it in 80.7% of cases.
In order to illustrate the increase in plan robustness, figure 4(b) shows the TUD from the RA as defined in section 2.6. For the robust plans, the median value is 0.25 %, with an uncertainty span of 3.36 %, which is a factor 2 smaller than for the conventional plans, with an uncertainty span of 6.38% and a median value of 0.15 %. Figure 5 shows the summarized values of D vol for all relevant OARs (smaller airways, heart, trachea, esophagus, aorta, spinal cord, lung without CTVs), D 0.5cc for the smaller airways only and D 15cc for heart Figure 5. OAR dose exposure for all 8 patients shown as violin plots (with overlaid boxplots) for the summarized volume doses D vol of all relevant OARs (smaller airways, heart, trachea, esophagus, aorta, spinal cord, lung without CTVs), D0.5cc for the smaller airways and D15cc for the heart, comparing the conventional 4D plans with the robust 4D plans, in red and blue, respectively. The purple violin plots show the distribution of the paired difference values of both plans. In all three cases, OAR dose exposure for the robust plans was significantly smaller than for the conventional plans (p-values ≪ 10 -4 ).

OAR exposure
only. The violin plots show the data for all 8 patients and are normalized to the corresponding threshold doses.
Summarized for all relevant OARs, the D vol median value is 2.3% for the conventional plans and 0.0% for the robust plans. The mean value is 17.6% (30.3 %) and 15.6% (26.5 %) for the conventional and robust plans, respectively. The mean value of the paired difference distribution is 2.1% which is significantly different from 0.0% with a p-value much less than 10 -4 . The median value of the differences is 0.0% and the standard deviation is 7.7 %. The volume dose limits are fulfilled in 96.2% of cases for the conventional plans, and in 99.5% of cases for the robust plans.
The D 0.5cc of the smaller airways are shown for 7 of the 8 patients only, as for Patient 5 no contours of the smaller airways were available. The D 0.5cc median value was found to be 83.2% for the conventional plans and 79.4% for the robust plans. The mean value was 66.5% (42.4 %) conventional plans and 58.3% (38.4 %) for the robust plans. The mean value of the pair-wise difference distribution was 8.2% which was significantly different from zero with a p-value much less than 10 -4 . The median value of the differences was 3.9% and the standard deviation was 13.8 %. The volume dose limit for the smaller airways was fulfilled in 76.2% of cases for the conventional plans, and in 96.0% of cases for the robust plans.
For the heart, the D 15cc median value was found to be 13.5% for both the conventional and the robust plans. The mean value was 24.1% (30.8 %) conventional plans and 18.9% (20.6 %) for the robust plans. The mean value of the pair-wise difference distribution was 5.3% which was significantly different from zero with a p-value much less than 10 -4 . The median value of the differences was 0.3% and the standard deviation was 11.0 %. The volume dose limit for the heart was fulfilled in 100.0% of cases for the robust plans, and in 94.0% of cases for the conventional plans, where the constraint was only violated for Patient 8.

Average treatment delivery times
The delivery times were calculated as described in section 2.5 and are shown in table 1. The average delivery time per patient was approximately 40 min. The longest delivery time was found for Patient 2 at 1 h and 46 min; however, this patient also has 5 tumors and was planned with 13 fields in total.

Average calculating times and memory usage
The calculating times needed for the robust 4D treatment plan optimization was on average 71.5 min; however, the variance of calculating times between single patients was very pronounced: The standard

Discussion
The results of this study have shown that treatment planning with robust 4D-optimization permits tailoring dose distributions to specific conditions which were more reliable in presence of uncertainties than conventionally 4D-optimized plans. In addition, the study has shown that a robustness analysis is necessary to identify patients with a high risk of dose degradation, beyond a dynamic dose calculation under different motion conditions. By combining the calculation of 4D dose distributions for 4 respiratory motions, as performed in the previous study by Anderle et al, with a RA on 21 uncertainty scenarios, a comprehensive overview of potential dose degradations was obtained. For example, interplay was visibly stronger in Patient 6 for the left CTV when the CT density was increased, as shown in figure 1.

Target coverage
As hypothesized, robust 4D-optimization reduces the variation of target coverage, which is discernible in the smaller DVH uncertainty bands for the robust plans, as shown in figure 2; however, due to the prioritization of OAR sparing during treatment planning, the robust optimization leads to a 1.1% loss of the average D95% target coverage when compared to the conventional plans. This loss of average target coverage is primarily caused by individuals like Patient 6, where the critical organs are particularly close to the target volume. If the distance between a CTV and an OAR is on the same order of magnitude as the shift cause by position or range uncertainty, the robust optimizer is confronted with scenarios where the dose cloud is shifted into the critical organ. Due to the given IMPT penalty factors for the considered VOIs, the CTV dose is consequently lowered in regions close to the OAR to fulfill the corresponding constraints. In comparison, the conventional 4D-optimization only considers the undisturbed nominal scenario and consistently only accounts for motion-induced shifts to the dose cloud. Therefore, the increased degree of complexity of any given patient case only presents itself when trying to perform a robust 4D-optimization.
In this study, the optimizer uses the same penalty factors for all voxels from the same VOI, not considering any dependency on the given scenarios, the pencil beams or the motion states. As described by (Bokrantz and Fredriksson 2017), the usage of fixed penalty factors in combination with worst-case optimization can cause unnecessary negligence of fulfilling a given objective in 'easy' scenarios, although the optimizer only failed to satisfy the objective in one single scenario. Using the proposed scenario-based margins method might help to further improve the dose coverage.
Compared to the conventional plan for Patient 6, the remaining parts of the dose distribution of robust plans-apart from the inherently generated low dose regions-stayed more homogeneous, as illustrated in the transversal dose views in figure 1. Even under uncertainty conditions such as lowering particle ranges, the robust plans result in less hot and cold spots. This can be explained by the reduced presence of internal dose gradients, which reduces the susceptibility to interplay effects of the robust dose distributions. In case of the conventional plan in figure 1, Patient 6 presents more hot and cold spots due to increased interplay.
An increase in robustness of the target coverage could also be seen in the reduced spread between the nominal D50% value and the remaining scenarios: i. For the robust plans, the target uncertainty as shown in figure 4(b) was considerably narrower than for the conventional plans, with the uncertainty span being a factor 2 smaller for the robust plans. ii. For the robust plans, the difference between nominal D50% and remaining D50% values was positively skewed, which means that overdosage was effectively limited by the maximal dose term of the robust objective function. In cases where an ideal target coverage could not be achieved due to OAR constraints, the robust 4D-optimization did nonetheless lead to an increased plan robustness for doses larger than 100 %, as shown in figure 2(b).
The large uncertainty band in the DVH for the left CTV of Patient 6, which is primarily due to the range uncertainty scenarios in the RA, exemplifies the situation of the conventional optimization failing to provide adequate certainty for delivering the planned dose distribution under the presence of uncertainties.
The DVH for the robust plan with low OAR penalty factors in figure 2(d) show an improvement of the target coverage while increasing OAR doses, in this case the dose to the smaller airways. Both, the uncertainty band and the average homogeneity index were improved. While target coverage could be achieved in all scenarios for robust and conventional optimization, robust optimization resulted in consistently lower OAR doses as well as uncertainty in OAR doses.
Considering the scatterplots of D95% versus D 0.5cc of the smaller airways in figure 3(a), the data points of the robust plan are more clustered than for the conventional plan and the dose volume constraint is fulfilled for the majority of uncertainty cases; however, this results in a considerable reduction of target coverage. For the conventional plan, the data points come closer to fulfilling the D95% objective, but at the same time, the constraint for the smaller airways is violated more often. It is noticeable that for the conventional plan, there are separated clusters for the left and right lung lobe while for the robust plan the clusters do visibly overlap. As mentioned above, the target coverage of the left CTV is heavily affected by the range uncertainty scenarios, which are accounted for during robust optimization.
Considering the LOW plans, the D95% is always larger than 95% of the prescribed target dose; however, only for the robust plans scenarios were found, where the dose volume constraint was fulfilled at the same time.

OAR exposure
When comparing the robust and conventional 4D-optimization with priority on OAR sparing, OARs on average receive less dose with robust 4D-optimization. However, the differences in the OAR dose exposure between the robust and the conventional plans were not as pronounced as between SBRT and carbon ion therapy (Anderle et al , 2018. As an example, for the original SBRT plans of the same patient cohort, the average D 15cc of the heart (also normalized to the constraint limit) was at 74% with a range of 50% to 98% even without performing RA. For the conventional plans in this study the average heart dose was considerably lower at 24.1 %, but with a range from 0% to 120.5% also scenarios occurred which violated the heart constraint (solely caused by Patient 8). In contrast, the average D 15cc of the heart for the robust plans was even lower at 18.9% (0 %-82.7 %). Hence, robust 4D-optimzation was able to push the heart volume dose below the constraint limit for all considered uncertainty scenarios.
The same holds true for the remaining considered critical structures, resulting in a significant reduction of the average volume dose by 2.1 %, comparing conventional plans to robust plans. For the robust plans, only 0.5% of the scenarios violated the dose volume constraints slightly by at maximum 12.6 %, where the conventional plans violated the constraint in 3.8% of cases with by at maximum 51.9 %.
Initially, all patients were optimized with the basic robust 4D-optimization. As those plans already yielded sufficient OAR sparing, the use of iterative objective optimization, introduced into TRiP4D by Anderle et al (2018), was only necessary for the robust optimization of Patient 6 to generate an adequate nominal plan. This is notable, as for treatment planning with conventional 4D-optimizationin the previous study, the use of iterative objective optimization was required for 4 of the 8 patients in order to create treatment plans with sufficient OAR sparing. As robust 4D-optimization required iterative objective optimization less often, less computational effort was necessary to meet the required OAR constraints, and the robust optimizer generally performed well in sparing critical structures.
OAR sparing was enhanced when they were located further from the CTVs (distances larger than 10 mm to 15 mm). These OARs were also spared by robust plans, although they were not explicitly considered in the optimization. This behavior may result from the more uniform and compact dose distributions to the target which avoided the use of unnecessary margins. This effect was especially prominent for the heart, though it was only considered during optimization for 2 of 8 patients. As described above, especially Patient 8 benefits from the improved sparing of the heart, as for this patient only with robust 4D-optimization, the dose volume constraint could be fulfilled in all uncertainty scenarios.
For critical structures in close vicinity to the target volume, robust 4D-optimization can show the most pronounced impact. For the presented lung tumor cases, the relevant structures were the smaller airways. When the OAR penalty factors were chosen to be larger than the CTV penalty factors for the robust optimization, this led to a priority on OAR sparing, and, in effect, generated low dose regions between the target and the critical organ. Although the exposure of the smaller airways was significantly reduced with this method, the superior OAR sparing caused a reduction to the target coverage. For the entire patient cohort, the robust optimization led to an average decrease of the volume dose to the smaller airways of 8 %, while the average decrease of D95% target coverage was only 1.1 %. This enabled an increase of the fulfillment rate of the smaller airways constraint by 19.8% for the robust plans, reaching 96% pass rate, which indicates that the increase in smaller airways sparing could outweigh the loss in target coverage.
Considering the importance of OAR sparing for hyperfractionated 4D treatments, robust 4D-optimization yields more predictable and reliable planning results than the conventional plans. Despite the partial loss in target coverage, the variation of the remaining homogeneous robust dose distribution is considerably less. The increased predictability of the robust treatment plans is also supported by the more clustered scatter plots for Patient 6 found in figure 3. Additionally, if situations occur where the objectives for target coverage and OAR sparing cannot be fulfilled simultaneously, the physician could still opt for a robust LOW plan with priority on target coverage as shown for Patient 6. This would result in significant increases in dose exposure to the critical organ, but the additional normal tissue dose would be applied in an intentional and controlled manner, even in the presence of uncertainties. This would be not possible with conventional 4D-optimization, were the OAR dose could be still higher and the target dose would be less certain than for the robust 4D-optimization, as presented for Patient 6 in figure 3.
Even further normal tissue dose reductions could be achieved with a robust implementation of conformal 4D-optimization, as proposed by Graeff (2014), where a treatment plan is optimized for each motion state of a 4DCT. As the worst-case method is a very conservative approach of robust optimization which can generate an unphysical superposition of uncertainty scenarios, methods like the minimax approach (Fredriksson et al 2011) could further improve the OAR sparing, as it considers the uncertainty scenarios on a plan-wise and not voxel-wise manner.

Average treatment delivery times
At first glance, an average treatment delivery time of 40 min seams to be rather long, but for high dose fractions and for patients exhibiting two or more tumors, the delivery times are rather reasonable.
The delivery times could be further optimized by either decreasing the number of desired rescans or by increasing the minimum particle limit during optimization. We recently implemented a new function into our optimizer which allows for increased minimum particle numbers (up to values of 10 6 ), while still accomplishing a satisfying convergence of the cost function value, which will be the topic of future research.

Average calculating times and memory usage
Compared to the times needed for treatment plan optimization, in the range of hours, the average 4D dose calculation times for RA were in the order of more than a day. This had a clear impact on the workflow, as not for every change in the optimization, it was possible to perform the comprehensive RA. Typically, the optimized treatment plan was tested for the nominal dose scenario, and if satisfying, was tested on all 84 dose scenarios. Therefore, it would be beneficial to have an even faster 4D dose calculation in order to perform more comprehensive tweaking of the treatment plan optimization.
As the focus of this study was not on absolute measurements of optimization durations, the calculations were performed indeed on powerful hardware, but still on a single blade insert or a single motherboard. In addition, the hardware was never reserved exclusively for our calculations. Hence, the time measurements are not entirely conclusive.

Comparison to similar optimization approaches
The robust 4D-optimization presented in Liu et al (2016) is based on 4D-optimization as presented in Graeff (2014). Compared to the method by Liu et al, the main difference of the robust 4D-optimization presented here is the capability to optimize RBE-weighted dose distributions with a variable RBE. In treatment planning for protons, the constant RBE value of 1.1 allows Liu et al to set up motion-state specific dose matrices and to later consider the average dose distribution over all usually 10 motion states. Only afterwards, Liu et al accounts for range changes and patient position shifts.
Due to the motion-state dependent variable RBE values, this is not possible for 4D-optimization of carbon ion. For the presented approach, the dose matrices for all 9 uncertainty scenarios are set up for all 10 motion states, resulting in 90 dose matrices which are persistent in the working memory throughout the entire robust 4D-optimizaion.
Apart from the technical differences, the presented robust 4D-optimization also reduces the susceptibility to interplay effects, even when compared to conventionally 4D-optimized treatment plans (as shown for Patient 6 in figure 1). For both robust 4D-optimization methods, the reduced interplay effect is presumably due the smoothed internal dose gradients.
One step further in reducing interplay would be the explicit consideration of different treatment delivery patterns already during optimization as proposed by Engwall et al (2018); however, such a method is not yet implemented in TRiP4D.

Study limitations
All 4D dose calculations presented in this study were based on regular respiratory motion patterns, which cannot be expected in clinical reality. In order to account for realistic respiratory patient motion, an adapted 4D dose calculation was recently implemented in TRiP4D; however, this algorithm needs series of 4DCTs in order to represent the temporal changes of patient's anatomy. So far it is still unclear how to reasonably generate those extended 4DCT data sets. Currently, the 4D XCAT digital phantom is used to create those data sets (Segars et al 2010). In order to create more patient specific extended 4DCTs, 4D MRI images registered on a CT image could be a potential solution (Boye et al 2013). So far, the presented results take into account only regular motion, and will likely show further dose deterioration in the presence of motion uncertainty. Though not explicitly shown in the results, the influence of variations in simulated motion period and starting phase was small compared to the impact of range changes and position shifts. For a clinical realization of the presented strategy-where daily changes in respiratory motion patterns are to be expected-likely larger safety margins have to be added, especially while gathering experience with such a novel strategy. In order to further minimize deviations, patients should be trained in regular breathing and be assisted by respiratory feedback monitoring.
In addition, the treatment plan optimization and RA with its 4D dose calculation were performed on the same 4DCT data sets, although changes in patient anatomy are to be expected between the day of image acquisition and the day of the treatment. The better solution would be to have consecutive sets of 4DCTs for the same patients (Graeff 2017), which were unfortunately not available for the presented patient cohort.

Conclusion
With the introduced robust 4D-optimization, it was possible to increase the robustness of the OAR sparing for 8 complex lung cancer patients compared to conventional 4D-optimization; however, the reduction in dose exposure of critical organs was found to potentially cause a decrease of target coverage in overlapping regions. This local loss in target coverage was especially pronounced in cases where the proximity of CTVs to critical organs was on the order of the given uncertainty parameters for the robust optimization. As robust 4D-optimization can consider uncertainty scenarios beyond motion induced range changes, the robust 4D-optimization will enhance local dose reductions typical of IMPT when prioritizing OAR sparing during treatment planning. However, the remaining homogenous regions of the robust dose distribution experienced less variation in the presence of uncertainty scenarios.
In patient cases where the objectives for target coverage and OAR sparing could not be fulfilled simultaneously due to the anatomical configuration of CTV and critical organs, robust 4D-optimization could also provide a plan with robust target coverage by prioritizing target voxels over OAR voxels. Although this would increase the OAR exposure, the dose given to the critical structures would then still experience less variation than for the conventional plan.