A study on the effect of detector resolution on gamma index passing rate for VMAT and IMRT QA

Abstract The main objectives of this study are to (1) analyze the sensitivity of various gamma index passing rates using different types of detectors having different resolutions and (2) investigate the sensitivity of various gamma criteria in intensity‐modulated radiation therapy (IMRT) and volumetrically modulated arc therapy (VMAT) quality assurance (QA) for the detection of systematic multileaf collimator (MLC) errors using an electronic portal imaging device (EPID) and planar (MapCheck2) and cylindrical (ArcCheck) diode arrays. We also evaluated whether the correlation between the gamma passing rate (%GP) and the percentage dose error (%DE) of the dose–volume histogram (DVH) metrics was affected by the finite spatial resolution of the array detectors. We deliberately simulated systematic MLC errors of 0.25 mm, 0.50 mm, 0.75 mm, and 1 mm in five clinical nasopharyngeal carcinoma cases, thus creating 40 plans with systematic MLC errors. All measurements were analyzed field by field using gamma criteria of 3%/3 mm, 3%/2 mm, 3%/1 mm, and 2%/2 mm, with a passing rate of 90% applied as the action level. Our results showed that 3%/1 mm is the most sensitive criterion for the detection of systematic MLC errors when using EPID, with the steepest slope from the best‐fit line and an area under the receiver operating characteristic (ROC) curve >0.95. With respect to the 3%/1 mm criterion, a strong correlation between %GP and %DE of the DVH metrics was observed only when using the EPID. However, with respect to the same criteria, a 0.75 mm systematic MLC error can go undetected when using MapCheck2 and ArcCheck, with an area under the ROC curve <0.75. Furthermore, a lack of correlation between %GP and %DE of the DVH metrics was observed in MapCheck2 and ArcCheck. In conclusion, low‐spatial resolution detectors can affect the results of a per‐field gamma analysis and render the analysis unable to accurately separate erroneous and non‐erroneous plans. 
Meeting these new sensitive criteria is expected to ensure clinically acceptable dose errors.


1 | INTRODUCTION
Patient-specific quality assurance (QA) for intensity-modulated radiation therapy (IMRT) and volumetrically modulated arc therapy (VMAT) is extremely important in ensuring quality care for cancer patients in radiation therapy. Various methods, including the use of an ion chamber, 1 two-dimensional (2D) array detectors, 2,3 and an electronic portal imaging device (EPID), 4,5 have been employed during patient-specific QA in pretreatment verification to detect possible errors between the dose calculated by the treatment planning system (TPS) and the measured dose. Due to the increasing complexity of modulated treatment plans and delivery, point dose measurements using an ion chamber alone may not be sufficient to verify dosimetric accuracy because a modulated plan can generate a steep dose slope near the organs at risk.
A common tool for evaluating the agreement between the calculated dose and the measured dose is the quantitative comparison of the planar dose distributions using the gamma index. 6 The report of Task Group (TG) 119 of the American Association of Physicists in Medicine (AAPM) described the following acceptance criteria for a per-field analysis: a 3% dose difference (%DD) with a global normalization method and a 3-mm distance-to-agreement (DTA). In addition, an action level of a 90% gamma passing rate (%GP) is applied with a dose threshold of 10% to remove background noise. 7 However, many studies [8][9][10][11] have suggested that a lack of correlation exists between %GP and dosimetric accuracy even when more stringent gamma acceptance criteria are used.
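The gamma comparison described above can be illustrated with a minimal one-dimensional sketch. It assumes two profiles sampled at the same positions, global normalization to the maximum measured dose, a 10% low-dose threshold, and a brute-force search without interpolation. The function name `gamma_passing_rate` and the example profiles are hypothetical and are not taken from the QA software used in this study.

```python
import math


def gamma_passing_rate(positions_mm, measured, calculated,
                       dd_percent=3.0, dta_mm=3.0, threshold_percent=10.0):
    """Return %GP for two 1D dose profiles sampled at the same positions.

    Global normalization: the dose-difference tolerance is dd_percent of the
    maximum measured dose. Points below threshold_percent of that maximum
    are excluded from the analysis.
    """
    max_dose = max(measured)
    if max_dose <= 0:
        raise ValueError("measured profile has no positive dose")
    dd_tolerance = dd_percent / 100.0 * max_dose
    cutoff = threshold_percent / 100.0 * max_dose

    evaluated = 0
    passed = 0
    for x_m, d_m in zip(positions_mm, measured):
        if d_m < cutoff:
            continue  # low-dose point, treated as background
        evaluated += 1
        # Gamma is the minimum combined distance/dose metric over all
        # calculated points (no interpolation in this sketch).
        gamma = min(
            math.sqrt(((x_c - x_m) / dta_mm) ** 2
                      + ((d_c - d_m) / dd_tolerance) ** 2)
            for x_c, d_c in zip(positions_mm, calculated)
        )
        if gamma <= 1.0:
            passed += 1
    if evaluated == 0:
        raise ValueError("no points above the dose threshold")
    return 100.0 * passed / evaluated


if __name__ == "__main__":
    # Hypothetical profiles: a flat field and a copy scaled by +2%.
    x = [float(i) for i in range(11)]
    measured = [100.0] * 11
    calculated = [102.0] * 11
    print(gamma_passing_rate(x, measured, calculated, 3.0, 3.0))
    print(gamma_passing_rate(x, measured, calculated, 1.0, 1.0))
```

Clinical implementations work on 2D or 3D dose grids and usually interpolate the calculated dose, so this sketch only conveys how the %DD and DTA tolerances combine into a single pass or fail decision per point.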
Previous studies [8][9][10][11][12] suggesting the insensitivity of gamma analysis have been based on similar approaches, such as (1) a per-field analysis by reducing the acceptance criteria %DD and DTA simultaneously, for example, 3%/3 mm, 2%/2 mm, 1%/1 mm; (2) measurements made with commercial QA devices with a detector spacing of at least 7 mm; and (3) a correlation of the %GP with the percentage dose error (%DE) from a dose-volume histogram (DVH) model. The last approach uses a poor-resolution detector on a homogeneous phantom and applies the data to a patient CT dataset to derive DVH. In addition, Bailey et al. 13 reported that undersampling by low-spatial resolution array detectors may potentially affect the responses of a gamma index analysis. Moreover, a recent study showed that not all induced errors can be captured by the 3DVH software 14 and that a large discrepancy in %DE is found on certain DVH metrics, ranging from an average value of −67.88% to 15.26% between the TPS and a COMPASS reconstructed dose, 15 in addition to large DDs observed between the TPS and 3DVH. 12 Furthermore, Nelms et al. 16 showed that a major contributor to the insensitivity of gamma analysis is the DTA threshold due to modern linear accelerators that can maintain an accuracy of 1 mm using a multileaf collimator (MLC). This finding raises concern about whether the lack of a correlation between %GP and %DE will occur only on QA devices with low-spatial resolution and a stringent acceptance criterion of only 2%/2 mm and 1%/1 mm. Although an acceptance criterion of 3%/3 mm has been reported by many authors [8][9][10][11][12]16 to be a poor predictor of dosimetric accuracy, new standardized gamma acceptance criteria for IMRT and VMAT QA have yet to be established.
Our main objective is to study the effect of detector resolution on the gamma index passing rate. This goal was achieved by investigating (1) the sensitivity of various gamma acceptance criteria by simulated MLC systematic errors in IMRT and VMAT plans; (2) the correlation between patient DVH errors reconstructed using trajectory log files and %GP; (3) the consistency, sensitivity, and performance across EPID, planar, and cylindrical diode arrays; and (4) whether the same action level and gamma criteria applied in IMRT QA can be applied in VMAT QA.

2.A | Patient selection and treatment planning
Five head and neck patients diagnosed with nasopharyngeal carcinoma (NPC) were selected from our database for this study. All five cases were generated with the Eclipse™ planning system (version A two-arc VMAT and a nine-field IMRT plans were generated using 6 MV photon beams with a 600 MU min⁻¹ dose rate and the following prescription: 70 Gy (2 Gy/fraction) to the planning target volume (PTV) containing a primary gross tumor and gross positive lymph nodes, 63 Gy (1.8 Gy/fraction) to the PTV with high-risk nodes, and 56 Gy (1.6 Gy/fraction) to the PTV with low-risk nodes. When planning a risk volume, a 5-mm margin was added around critical organs such as the spinal cord and brainstem to account for the geometric uncertainties of an organ and thereby achieve maximum doses of <45 Gy and <54 Gy, respectively. Many other normal structures, such as the parotid glands (left-L, right-R), the mandible and temporomandibular joints, the optic chiasm, and the optic nerves, were included in the optimization process; however, only the parotids, spinal cord, brainstem, and the PTV receiving 70 Gy (PTV 70 ) were analyzed in this study. For all NPC plans, at least 98% of each PTV had to receive 95% of the prescription dose, and the dose was not to exceed 107% of the prescription dose.

WOON ET AL. | 231

2.C | Dose evaluation in DVH-based metrics
To evaluate the DD in each DVH metric, all of the modified plans were compared with the original plan, and the %DE was subsequently calculated using the following equation:
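The comparison described above can be sketched in code. The sketch assumes the conventional relative-difference definition, %DE = 100 × (D_modified − D_original) / D_original; the sign convention, the function name `percent_dose_error`, and the example values are assumptions made for illustration and are not taken from the study.

```python
def percent_dose_error(metric_modified, metric_original):
    """Relative change (%) of a DVH metric in a modified plan vs. the original.

    Assumed convention: positive values mean the modified plan delivers a
    higher value of the metric than the original plan.
    """
    if metric_original == 0:
        raise ValueError("original DVH metric must be non-zero")
    return 100.0 * (metric_modified - metric_original) / metric_original


if __name__ == "__main__":
    # Hypothetical mean doses (Gy) to a structure in the original plan and
    # in a plan with a simulated systematic MLC error.
    original_mean_dose = 70.0
    modified_mean_dose = 71.4
    print(percent_dose_error(modified_mean_dose, original_mean_dose))
```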

2.D | Detectors and software for dose evaluation
All IMRT plans were delivered for pretreatment verification and mea-

2.G | True errors and true error positions
Forward IMRT planning using a single field was generated such that 20% of the prescription dose was delivered to a field size of 10 × 8 cm² while simultaneously boosting a 0. Furthermore, gamma analysis with the same acceptance criteria previously described was also used to assess whether the %GP could correctly include these simulated errors when different detectors were compared.

MLC error for both the IMRT and the VMAT plans. However, verification of VMAT plans using absolute gamma comparison with 3%/1 mm failed to achieve a passing rate of 90%. In contrast, verification of the VMAT plans using a relative gamma comparison with 3%/1 mm was less sensitive, as indicated by a lower negative slope than that for the absolute gamma method. Moreover, the passing rate was much higher than 90%, even when a 1-mm systematic MLC error was considered. When a 95% passing rate was applied as a new action level for 3%/1 mm using the relative gamma method, a 0.25-mm systematic MLC error could be detected.
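The slope of the best-fit line used above as a sensitivity measure can be obtained from an ordinary least-squares fit of %GP against the simulated MLC error. The following sketch assumes a straight-line model and uses hypothetical passing rates; the function names and the numbers are illustrative only and do not reproduce the measured data.

```python
def best_fit_line(mlc_error_mm, gamma_pass_percent):
    """Least-squares slope (%GP per mm) and intercept (%GP at zero error)."""
    n = len(mlc_error_mm)
    if n < 2 or n != len(gamma_pass_percent):
        raise ValueError("need two or more paired observations")
    mean_x = sum(mlc_error_mm) / n
    mean_y = sum(gamma_pass_percent) / n
    sxx = sum((x - mean_x) ** 2 for x in mlc_error_mm)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(mlc_error_mm, gamma_pass_percent))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def error_at_action_level(slope, intercept, action_level=90.0):
    """MLC error (mm) at which the fitted line crosses the action level."""
    if slope == 0:
        raise ValueError("a flat line never crosses the action level")
    return (action_level - intercept) / slope


if __name__ == "__main__":
    # Hypothetical %GP values at 0, 0.25, 0.5, 0.75 and 1 mm of MLC error.
    errors = [0.0, 0.25, 0.5, 0.75, 1.0]
    passing = [99.0, 95.5, 91.0, 86.0, 80.5]
    slope, intercept = best_fit_line(errors, passing)
    print(slope, intercept, error_at_action_level(slope, intercept))
```

A steeper negative slope means that %GP drops faster per millimetre of MLC error, so the fitted line crosses a given action level at a smaller error.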

3.C | Sensitivity and performance of various gamma criteria based on ROC analysis
Further analysis of the sensitivity and performance of the various acceptance criteria for each QA device with an ROC is shown in

3.D | Changes in the %DE with respect to the MLC error
Table 1 shows the relative %DE values for the original plan and the modified plan edited with log files referred to as "Random." Table 1 and Fig. 6 also show the relative %DE between the original plan and

3.E | Statistical correlation between %GP and %DE
The statistical correlations (R 2 and r) between %DE and %GP with their respective P-values are shown in Fig. 7 and Table 4. The most sensitive acceptance criterion of 3%/1 mm for the pretreatment verification using the EPID shows a better correlation between the %GP and the relative %DE with respect to each structure than the other acceptance criteria. However, the correlation between the %GP and the relative %DE with respect to each DVH metric from the ArcCheck and MapCheck2 was better with 2%/2 mm, indicating that the sensitivity of the various acceptance criteria differs in certain cases.
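The correlation statistics referred to above can be sketched as follows, assuming a Pearson correlation coefficient r and, for a simple linear fit, a coefficient of determination R² equal to r². The function name and the paired values are hypothetical; P-values are omitted here because they require a statistics library (for example, `scipy.stats.pearsonr`).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two or more paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical paired values: %GP of each plan and the %DE of one
    # DVH metric for the same plan.
    gamma_pass = [99.0, 96.0, 92.0, 88.0, 83.0]
    dose_error = [0.1, 0.9, 1.8, 2.9, 3.8]
    r = pearson_r(gamma_pass, dose_error)
    print(r, r ** 2)  # r and R^2 for a simple linear relationship
```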

3.F | Consistency analysis of different QA tools
Pretreatment verification of the IMRT and VMAT plans with the EPID is more consistent than verification with MapCheck2 and ArcCheck, as shown in Fig. 4. An acceptance criterion of 3%/1 mm was the most sensitive for all plans, with simulated MLC systematic errors of similar magnitude. In addition, Fig. 7 and Table 4  IMRT QA performed using the EPID was observed, which has not been previously reported.
Using the most sensitive criterion for the MapCheck2 with a 90% passing rate as the action level for the IMRT QA, false positives and negatives occurred, and a passing rate below 90% did not indicate large differences in the DVH and vice versa. Furthermore, a weak correlation was observed between the %GP and the %DE for all of the IMRT QA performed with the MapCheck2. These results are similar to previously reported results. 9,12,15 The ArcCheck displayed the worst performance among all three devices as a QA tool. As shown in Fig. 1, the device failed to detect a simulated MLC error of 1 mm in the IMRT plan. Furthermore, a reasonable action level could not be established when a more stringent criterion was considered. The original plan had already failed to achieve a passing rate higher than 90% with respect to the most sensitive gamma criteria used. Similar to the MapCheck2 results, false-positive and false-negative errors were also observed with the ArcCheck; the red box in Fig. 4 indicates the inability of the MapCheck2 and ArcCheck to distinguish between an original and an erroneous plan, which suggests that low-spatial resolution affects the gamma index analysis because the dose distribution was undersampled, 17 as confirmed by the ROC and AUC results. Worse yet is the result for certain cases in which a 0.75 mm systematic MLC error was undetected due to the poor

TABLE 2 Slope of the best-fit line between %GP of various gamma criteria and MLC errors.

A strong correlation was observed between the %GP and the %DE when performing the VMAT QA using the EPID and an absolute gamma analysis. However, the original plan did not achieve a 90% passing rate; therefore, a relative gamma analysis was used instead. The passing rate in the relative gamma analysis was higher than in the absolute gamma analysis because the average DD between the calculated and measured dose distributions was minimized. This condition weakened the correlation between the %GP and the %DE (as indicated by the blue box in Fig. 4) and rendered the technique unable to detect the erroneous condition at a 90% passing rate applied as the action level. However, when a 95% passing rate with relative gamma analysis was used instead for the VMAT QA using the EPID with a weak-to-moderate correlation between the %GP and the %DE, a clear distinction could be drawn between the original and the erroneous plan.
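The roles of the AUC and of the action level in the discussion above can be illustrated with a small sketch. It assumes that %GP is used as the classifier score, that a plan is flagged as erroneous when its %GP falls below the action level, and that the AUC is computed from pairwise comparisons (equivalent to the Mann-Whitney statistic). The function names and passing rates are hypothetical.

```python
def roc_auc(gp_error_free, gp_erroneous):
    """Area under the ROC curve when %GP separates the two groups.

    Equals the probability that a randomly chosen error-free plan has a
    higher %GP than a randomly chosen erroneous plan (ties count one half).
    """
    if not gp_error_free or not gp_erroneous:
        raise ValueError("both groups must contain one or more plans")
    wins = 0.0
    for good in gp_error_free:
        for bad in gp_erroneous:
            if good > bad:
                wins += 1.0
            elif good == bad:
                wins += 0.5
    return wins / (len(gp_error_free) * len(gp_erroneous))


def detection_rates(gp_error_free, gp_erroneous, action_level=90.0):
    """Sensitivity and specificity when plans below the action level fail."""
    sensitivity = sum(gp < action_level for gp in gp_erroneous) / len(gp_erroneous)
    specificity = sum(gp >= action_level for gp in gp_error_free) / len(gp_error_free)
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical %GP values for original plans and plans with MLC errors.
    original = [99.0, 98.0, 97.5]
    erroneous = [94.0, 92.0, 85.0]
    print(roc_auc(original, erroneous))
    print(detection_rates(original, erroneous, action_level=90.0))
    print(detection_rates(original, erroneous, action_level=95.0))
```

In this hypothetical case the two groups are fully separable by %GP, yet a 90% action level misses two of the three erroneous plans while a 95% action level catches all of them. A high AUC therefore does not by itself imply that a given action level is adequate.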

We also investigated whether the MapCheck2 and ArcCheck could correctly detect errors at the points where the true errors occurred in this study. As shown in Fig. 3

The question may still remain regarding whether a binary pass or fail classifier in a per-field analysis can indicate the location and magnitude of a DE, but if the correct acceptance criteria are employed, 2-3% changes in the DVH metrics can be detected using a reasonable action level. Furthermore, our results are consistent with those of the study by Nelms et al., 16 which showed that the DTA threshold is one of the primary sources of insensitivity of the gamma criteria to systematic errors. One of the main limitations of this study was the limited number of patients used to investigate whether the established action levels and acceptance criteria were consistent; however, this is a pilot study, and more samples will be included in future studies. The results of this study indicate that an acceptance criterion of 3%/1 mm is the most sensitive for IMRT and VMAT QA to detect any systematic MLC errors; however, the criteria may vary between detector systems with different resolutions.
Therefore, it is important to evaluate a system's limitations with respect to its detectable error range, uncertainty, and reliability before deciding on a more sensitive gamma criterion. In addition, care should be taken when establishing the action level, as this level may vary due to differences in TPS commissioning and the QA devices employed.

5 | CONCLUSION
This study investigated the sensitivity of various gamma criteria for the detection of changes in the DVH by deliberately introducing systematic MLC errors of the same magnitude into all IMRT and VMAT plans. The correlation between the %DE and the %GP evaluated by different QA devices was also investigated. Our findings confirmed that the lack of correlation between the %DE and the %GP was due to the resolution, which was not sufficient to detect MLC systematic errors when using array detectors. This analysis suggested that detector resolution can affect gamma analysis and lead to misleading IMRT/VMAT QA results by incorrectly detecting MLC systematic errors. Our study showed that an acceptance criterion of 3%/1 mm is the most sensitive and can distinguish the original condition from an erroneous condition with a systematic MLC error using the EPID.
A strong correlation between the %GP and the %DE was observed when QA was performed on a high-resolution device such as the EPID using a gamma criterion of 3%/1 mm. Moreover, an acceptance criterion of 3%/1 mm can be applied to both the IMRT and VMAT QA; however, the action levels for the IMRT and VMAT are slightly different. The adoption of a more sensitive criterion can ensure that a plan is clinically acceptable with no systematic MLC errors when every field passes the gamma criterion.

ACKNOWLEDGMENTS
None to declare.

CONFLICT OF INTEREST
The authors declare no conflict of interest.