Statistical approach to the selection of the tolerances for distance to agreement improves the quality control of the dose delivery in radiotherapy

Mateusz Baran; Damian Kabat; Monika Tulik; Krzysztof Rzecki; Tomasz Sośnicki; Zbisław Tabor

doi:10.1088/1361-6560/ab86d5

1. Introduction

The gamma index analysis (Low et al 1998) is a current standard method used in radiotherapy for both machine quality control purposes and treatment plan evaluation (Hussein et al 2017a, 2017b). It was originally proposed as a method for defining dose delivery error in an evaluated point of dose delivery; if the value of gamma index in an evaluated point exceeds some threshold γ_Th (usually set to 1), then a dose delivery error is reported in this point. The gamma index analysis is summarized in the form of a gamma index passing rate; the percentage of points with gamma index values less than γ_Th . However, the gamma index analysis provides more rich feedback as it also returns the spatial locations of dose delivery errors, in contrast to other plan evaluation approaches, for example, dose volume histograms (DVHs) (Visser et al 2014). The formula for gamma index involves some user-selected constants: the dose difference (DD) tolerance and the distance-to-agreement (DTA) tolerance. Most frequently, the DD tolerance is set in the range from 1%–3% of the maximal planned dose while DTA tolerance is set in the range from 1 mm to 3 mm for advanced treatment plans (Blanck et al 2015). If the passing rate exceeds some user-selected passing rate criterion, then the plan delivery is accepted for further steps of radiotherapy procedure. The passing rate criterion is set in conjunction with the DD and DTA tolerances. Clearly, setting more stringent DD or DTA tolerance forces selection of a lower value of the gamma passing rate criterion.

The selection of the constants necessary for conducting gamma index analysis is not an obvious task with any erroneous decisions posing a risk of harm to a treated patient. This is one of the reasons making conclusions on the quality of dose delivery based on gamma index values is not an easy and straightforward task. Moreover, it seems that the selection of DD and DTA tolerances and the passing rate criterion are inconsistent among different studies and radiotherapy centers. For example, recommendations of AAPM Task Group No. 218 (Miften et al 2018) contain a thorough review of these inconsistencies.

As claimed in (Li et al 2011), DD and DTA tolerances and the passing rate criterion are set empirically and there are few studies focused on developing physically or statistically sound methods for selecting these constants. In the study of (Li et al 2011) surface-based distance was proposed to measure the difference between planned and delivered dose distributions. The authors proposed that the dose gradient factor represents the weighting between spatial and dose components in the comparison of planned and delivered dose distributions. Then, the gradient factor is used to select appropriate DTA and DD tolerances. The importance of gradient information for the selection of appropriate DTA and DD tolerances was further considered in the study of (Low et al 2013). In particular, it was shown in (Li et al 2011) that for steep dose gradients gamma index dose comparison reduces to the DTA analysis. Consequently, to make correct conclusions about the quality of dose delivery, the ratio of DD to DTA tolerances must be adjusted to gradients present in the plan.

In recently published recommendations of AAPM Task Group No. 218 (Miften et al 2018) it is stated that 'tolerance limits are defined as the boundaries within which a process is considered to be operating normally, that is, subject to only random errors.' It is further recommended in (Miften et al 2018) that an action limit for an evaluated quantity is dependent on the standard deviation of this quantity, as calculated for a system operating normally, subject to only random errors. A statistical approach to DD tolerance selection following this recommendation was demonstrated in (Tulik et al 2019). It has been shown in (Tulik et al 2019) that if the DD tolerance is set locally proportional to the standard deviation of delivered doses, then for a clinically relevant range of passing rates the true positive rate is higher than for a current standard method. The statistical approach to action-level selection in radiotherapy is not limited to dosimetric procedures. In a study of (Baran et al 2019) a statistical framework for conducting tests of linac geometry was described.

Setting DD tolerances locally based on features of an assessed plan of dose delivery is not uncommon (Stojadinovic et al 2015, Tulik et al 2019). In contrast, setting DTA tolerances locally based on features of an assessed plan of dose delivery, to the best of our knowledge, has not been considered in spite of possibly serious consequences of wrong selection of these tolerances (Low et al 2013). Keeping DTA tolerances constant over an entire field of the assessed plan had until recently some justification. In particular, quality control is based on the comparison of doses measured by specialized phantoms, for example, 2D dosimetric arrays. According to a traditional approach an evaluated plan is recomputed to the geometry of a measurement device and then the measurement results are directly compared with the recomputed plan. Consequently, any dose gradient present for inhomogeneous anatomy of a patient is not directly reflected in gradients of a plan recomputed for typically homogeneous geometry of a phantom. Therefore, controlling the shift of isodoses in steep gradient regions in a plan recomputed for a phantom has only limited consequences for the precision of dose delivery to a real patient.

This perspective has however changed since the introduction of measurement-guided dose reconstruction (Nelms et al 2012, Stasi and Bresciani 2012, Visser et al 2014). Measurement-based dose reconstruction delivers tools to estimate doses delivered to a patient anatomy based on the results of measurements made with conventional 2D phantoms. Then, the dose estimated for the volume of a patient body is compared with the dose planned for a patient using commonly used approaches such as DVH or gamma index. With the availability of such precise dose estimates the arguments listed in the previous paragraph do not hold any more and precise control over the location of gradients must become a vital part of a reliable treatment plan quality control procedure.

Here, we propose a local approach to setting reference tolerance values for the DTA component of gamma index. The motivation for setting reference tolerance for DTA locally is as follows. The dose delivery plan contains regions of both high and low dose gradient. Regions of high gradient may be related to organ or target borders while regions of low gradient may correspond to regions of homogeneous dose, for example, organ/target interiors. Small spatial shifts in dose delivery within organ boundary regions may possibly lead to delivering high doses to organs at risk and small dose to a planned target. Therefore, the quality control should be more sensitive to spatial shift errors within high dose gradient regions. On the other hand, higher spatial shifts in homogeneous dose regions are less dangerous because for relatively homogeneous doses they lead to relatively small changes in dose delivered compared to planned dose. For the above reasons, it is reasonable to expect that tolerance values for DTA will vary, depending on a local dose gradient magnitude.

Table 1 contains a list of symbols used in this work.

Table 1. List of symbols.

P, Q	Points
P_i,j	Point at position (i, j)
d(P, Q)	Distance between points P and Q
D(P)	Planned dose at point P
D_M (P)	Measured dose at point P
DD(P, Q)	Equal to D(P)-D_M (Q)
Δ_DD (P)	Dose difference tolerance at point P
Δ_DTA (P)	Distance-to-agreement tolerance at point P
γ(P)	Gamma index at point P
GP	Gamma index passing rate
DTA(P)	Explained below equation (2)
σ_DTA (P)	Explained below equation (4)
P_TH	Probability threshold
f_P	Probability density function of dose at point P
Pr(P, d)	Probability that the dose measured at point P belongs to an interval defined below equation (7)
Θ_PTH (P)	Distance d at which Pr(P, d) at point P is equal to P_TH

2. Materials and methods

The study is based on analysis applied to 22 cases: five AAPM TG 119 IMRT tests (except I4 and I5) (Ezzell et al 2009), 16 prostate cancer cases and one sarcoma case. Further details of the data set are described in (Tulik et al 2019). Plans were calculated for a grid with density of 1 pixel per millimeter.

The gamma index evaluation is performed as follows. For each point P of a plan, point-to-point DDs DD(P, Q) were calculated. They are the differences between the dose D(P) planned for an evaluated point P and the doses D_M (Q) that were measured during plan validation at reference points Q, that is, DD(P, Q) = D(P)—D_M (Q). The formula for the gamma index γ(P) at point P is given by:

$\begin{equation}{\rm{\gamma }}\left( P \right) = \mathop {{\rm{min}}}\limits_{Q \in {N_P}} \sqrt {\frac{{d{{({\rm{P, Q}})}^2}}}{{{{\rm{\Delta }} {\Delta}_{{\rm{DTA}}}}{{\left( P \right)}^2}}} + \frac{{{\rm{DD}}{{({\rm{P, Q}})}^2}}}{{{{\rm{\Delta }} {\Delta}_{{\rm{DD}}}}{{\left( P \right)}^2}}}} ,\end{equation} \tag{ 1 }$

where Δ_DD and Δ_DTA are selected tolerances for, respectively, DD and DTA, the point Q belongs to some neighborhood N_P of P and d(P, Q) is the distance between P and Q. The tolerances can be set either globally, that is, equal for all points in a plan, or locally in which case both Δ_DD and Δ_DTA can vary between points.

The values of gamma index are thresholded using a user-selected threshold value γ_Th . In clinical practice, typically γ_Th is equal to 1. If γ(P) is greater than γ_Th _, then, by definition, P is a point of an erroneous dose delivery, otherwise P is a point of a correct dose delivery. Eventually, the fraction of points with correctly delivered doses is also calculated (gamma index passing rate GP) with the value of 100%, meaning that all the deviations between the planned and measured doses fall below γ_Th . The plan is rejected if the value of GP is below a certain threshold, for example, 90% (Li et al 2011) or above 95% (Sothmann et al 2016), although even plans with a high value of GP may not be clinically acceptable (Miften et al 2018).

The expression for gamma index is fairly complicated as it depends on planned and delivered doses at reference and evaluated points, distance between reference and evaluated points, tolerances Δ_DD and Δ_DTA and requires searching for a minimum over some set of reference points. This is one of the reasons making conclusions on the quality of dose delivery based on gamma index values is not a straightforward task. Here, we focus on how the distance component and tolerance of DTA influence the percentage of points GP passing the quality control procedure. To separate the effect of the dose component from the effect of the distance component onto quality control results, we consider a limit of a very small Δ_DD . When continuous interpolations of measured realizations are considered one has in this limit:

$\begin{equation}\mathop {{\rm{lim}}}\limits_{{{\Delta }_{DD}} \to 0} {\rm{\gamma }}\left( P \right) = \frac{{DTA\left( P \right)}}{{{{\Delta }_{DTA}}\left( P \right)}},\end{equation} \tag{ 2 }$

where DTA(P) is the distance between the pixel P and the closest position Q in the interpolated measurement, so that DD(P, Q) = 0 (which may lie between some pixels). This limit holds because the agreement is exact at point Q. Note that the ratio of DTA to Δ_DTA is also used in a compositional approach to quality control (Harms et al 1998).

2.1. Probability-based approaches to DTA action levels

It would be plausible to state formally the comparison of planned and delivered doses within the framework of statistical inference as a problem of multiple statistical testing. In particular, for every evaluated point P we formulate a null hypothesis that there is no delivery error against an alternative hypothesis that there is a delivery error in P. Formal statistical testing of the presence of a delivery error in some evaluated point must be based on an appropriate universal function—a test statistic that does not depend on particular features of a plan. In particular, it would be plausible to define a random variable Γ depending on the dose planned in P and doses measured in some neighborhood of P, so that Γ is independent of P and thus also independent of a particular plan being delivered to a patient.

For example, if X is a normal random variable with expected value equal to m_X and standard deviation equal to s_X , then a random variable X_U defined as:

$\begin{equation}{X_U} = \frac{{X - {m_X}}}{{{s_X}}},\end{equation} \tag{ 3 }$

is distributed in accordance with a standard Gaussian distribution, that is, the distribution of X_U is universal, independent of a particular distribution of X. The existence of such universal distributions derived from measured data is a prerequisite of a model-based statistical inference.

Inspired by the analogy from statistical inference, we propose that the distance component in the definition of gamma index be normalized in the following way:

$\begin{equation}{\Gamma } = \mathop {{\rm{lim}}}\limits_{{{\Delta }_{DD}} \to 0} {\rm{\gamma }}\left( P \right) = \frac{{DTA\left( P \right)}}{{{{\Delta }_{DTA}}\left( P \right)}} = \frac{{DTA\left( P \right)}}{{{{\rm{\sigma }}_{DTA}}\left( P \right)}},\end{equation} \tag{ 4 }$

where σ_DTA (P) is the standard deviation of the distribution of DTAs calculated in P, for example, over multiple dose deliveries of the same plan. In the following section, we demonstrate that Γ has indeed the property of being a universal statistic.

The standard deviation σ_DTA (P) for DTA can be in principle determined from simulations. For example, based on the noise model described in (Tulik et al 2019) and summarized in the next section, simulated dose delivery data can be generated and then the DTA can be calculated for every point in a plan using, for example, the algorithm described in (Harms et al 1998). Data sufficient for determining σ_DTA (P) in every evaluated point can be collected by repeating this procedure multiple times. It is also possible to estimate from the simulations the probability density function (PDF) of DTA and in particular quantiles corresponding to user-selected probability thresholds P_TH . Given the PDF of DTA, σ_DTA (P) can be estimated in a standard way.

The simulation-based procedure to estimate the PDF of DTA while being straightforward is computationally expensive. Thus, below we estimate the PDF of DTA using analytical considerations.

Let the dose planned for P be equal to D(P) and the dose measured at P be equal to D_M (P). It is assumed that the measured dose D_M (P) at pixel P is a random number drawn from a Gaussian distribution described by a PDF f_P with an expected value equal to D(P) and some standard deviation σ(D(P)). In our previous study (Tulik et al 2019), we have shown that σ(D(P)) grows linearly with dose D(P) planned in P with proportionality coefficient equal to approximately 0.1. It is also assumed that doses delivered at different locations are independent. While it is true that this assumption generally does not hold, its influence on the conclusions of our analysis will be tested.

To calculate the quantiles of PDF of DTA, given a dose D(P) planned for P, we calculate the distance from P to the isodose D(P) in the grid of measured doses. To find the probability that isodose D(P) is found at distance d from P, we define the following function Z(P, d):

$\begin{equation}Z\left( {P,d} \right) = \left\{ {Z\left( {{P_i}} \right):d\left( {{P_i},P} \right) \leqslant d} \right\},\end{equation} \tag{ 5 }$

where Z(P_i ) is a random number drawn from ${f_{{P_i}}}$ for all N(P, d) pixels P_i , so that their distance d(P_i, P) to P is smaller than or equal to d. Given Z(P, d), we define the following two random variables:

$\begin{equation}{Z_{min}}\left( {P,d} \right) = {\text{min}}\; Z\left( {P,d} \right),\end{equation} \tag{ 6 }$

$\begin{equation}{Z_{max}}\left( {P,d} \right) = {\text{max}}\; Z\left( {P,d} \right).\end{equation} \tag{ 7 }$

Given Z_min (P, d), Z_max (P, d) and D(P), one may ask what is the probability Pr(P, d) that $D\left( P \right) \in \left[ {{Z_{min}}\left( {P,d} \right),{Z_{max}}\left( {P,d} \right)} \right]$ . The probability Pr(P, d) is a function of point P and the probed distance d (figure 1), it is monotonic, equal to zero at d= 0 and approaches one as d increases to infinity.

In the plot of Pr(P, d) versus d (figure 1), P_TH corresponds to some critical value of d, which we choose as a quantile Θ_PTH (P) for the tested pixel P. For high gradients of the planned doses, the dose values around P are more scattered and thus Pr(P, d) grows faster with distance d than in the case of small gradient in which case the plot of Pr(P, d) versus d is more flat (figure 1). Consequently, setting exactly the same P_TH , we get smaller Θ_PTH (P) for high gradient regions than for small gradient regions. Varying P_TH , the other quantiles of the PDF of DTA can be computed from which the PDF of DTA can be estimated.

Given the set Z(P, d) of N(P, d) random numbers (equation (6)) corresponding to doses delivered in pixels P_i, so that $d\left( {{P_i},P} \right) \leqslant d$ and assuming that doses delivered at different locations are independent, Pr(P,d) can be calculated from the following expression:

$\begin{equation}Pr\left( {P,d} \right) = \mathop \sum \limits_{i = 1}^{N\left( {P,d} \right)} \mathop \sum \limits_{j = 1,j \ne i}^{N\left( {P,d} \right)} \mathop \smallint \limits_{ - \infty }^{D\left( P \right)} d{x_i}{f_{{P_i}}}\left( {{x_i}} \right)\mathop \smallint \limits_{D\left( P \right)}^\infty d{x_j}{f_{{P_j}}}\left( {{x_j}} \right)\mathop \prod \limits_{k = 1,k \ne i,j}^{N\left( {P,d} \right)} \mathop \smallint \limits_{{x_i}}^{{x_j}} d{x_k}{f_{{P_k}}}\left( {{x_k}} \right),\end{equation} \tag{ 8 }$

where ${f_{{P_i}}}$ is the probability density function of the dose D_M (P_i ) delivered at the location corresponding to pixel P_i and x_i for $i \in \left\{ {1,2,\ldots,N\left( {P,d} \right)} \right\}$ corresponds to possible doses at point P_i . Assuming that each ${f_{{P_i}}}$ is a Gaussian distribution, Pr(P,d) can be calculated directly from equation (8).

Instead of calculating Pr(P,d) directly from equation (8) one can transform it to a form that is easier to compute. It is always true that either D(P) is smaller than or equal to Z_MIN (P, d), D(P) is greater than or equal to Z_MAX (P, d) or D(P) is between Z_MIN (P, d) and Z_MAX (P, d). Hence, the sum of probabilities of these three events is equal to 1. Instead of calculating the probability of the third event, which is equal to Pr(P, d), the probabilities of the other two events can be calculated and subtracted from 1.

The probability Pr_MIN (P, d) that D(P) is smaller than Z_MIN (P, d) is equivalently the probability that D(P) is smaller than all elements of Z(P, d). Since we assume that distributions of elements of Z(P, d) are independent, Pr_MIN (P, d) can be expressed as a product of probabilities:

$\begin{equation}P{r_{MIN}}\left( {P,d} \right) = \mathop \prod \limits_{i = 1}^{N\left( {P,d} \right)} \mathop \smallint \limits_{ - \infty }^{D\left( P \right)} {f_{{P_i}}}\left( {{x_i}} \right)d{x_i}.\end{equation} \tag{ 9 }$

The probability Pr_MAX (P, d) that D(P) is greater than Z_MAX (P, d) can be calculated in a similar way:

$\begin{equation}P{r_{MAX}}\left( {P,d} \right) = \mathop \prod \limits_{i = 1}^{N\left( {P,d} \right)} \mathop \smallint \limits_{D\left( P \right)}^\infty {f_{{P_i}}}\left( {{x_i}} \right)d{x_i}.\end{equation} \tag{ 10 }$

Finally, Pr(P, d) can be calculated as 1—Pr_MAX (P, d)—Pr_MIN (P, d).

2.2. Noise model

To simulate the spatial distributions of delivered doses, we use a noise model proposed and verified in our previous study (Tulik et al 2019). For convenience, some details of this noise model are summarized below. In particular, the plan D_PL is a rectangular matrix of planned doses D(P_i,j ) for pixels P_i,j at coordinates (i, j) in the grid of size I by J pixels:

$\begin{equation}{D_{PL}} = \left\{ {D\left( {{P_{i,j}}} \right):i = 1,2, \ldots ,I,j = 1,2, \ldots ,J} \right\}.\end{equation} \tag{ 11 }$

To simulate dose delivery with no systematic error, we add noise to D_PL; in each (i, j) we generate a random number sampled from a Gaussian distribution with zero average and standard deviation proportional to D(P_i,j ). The resulting matrix of random errors E is convolved with a Gaussian smoothing kernel to reproduce autocorrelation of the noise, which can be seen in the real data (Low et al 2013). The resulting matrix SE is added to D_PL . In agreement with experimental data, we assumed that the standard deviation of the Gaussian noise in (i, j) is equal to 1.25 times D(P_i,j ) and the size of the smoothing kernel is equal to 3.6 mm. After smoothing the standard deviation of the noise, it is equal to 0.1 times the dose value. A noise model without autocorrelation was also considered.

2.3. Simulations

In the first computational experiment, dose delivery simulation was based on the Gaussian noise model without autocorrelation. In these simulations, standard deviation of delivered doses was equal to 0.1 times the dose value, in agreement with published data (Tulik et al 2019). One hundred dose deliveries were simulated.

Next, we considered simulation of dose delivery based on the Gaussian noise model with autocorrelation. In these simulations, standard deviation of delivered doses was equal to 1.25 times the dose value and the size of the Gaussian kernel was equal to 3.6 mm, in agreement with published data (Tulik et al 2019) that together leads to standard deviation of delivered doses effectively equal to 0.1 times the planned dose value, as in the case of uncorrelated noise. Again, one hundred dose deliveries were simulated.

In the final experiment, we considered simulation of dose delivery based on the Gaussian noise model with autocorrelation and a systematic error—a shift by a random vector with components drawn independently from a Gaussian distribution with some non-zero mean and standard deviation equal to 2 pixels. For each considered value of the systematic error 100 random shifts were simulated for each of the 22 plans.

3. Results

In figure 2, we show a sample plan together with pictures demonstrating variations of σ_DTA and Θ_PTH over the plan. In the figure, we show Θ_PTH for P_TH = 0.8, denoted as Θ_0.8, as calculated directly from the simulations as well as using the analytical approach. The average and standard deviation of σ_DTA for the sample plan were equal to 3.1 and 2.8 pixels, respectively. The average and standard deviation of Θ_0.8 calculated in simulations were equal to 5.4 and 5.9, respectively. The average and standard deviation of Θ_0.8 calculated analytically were equal to 5.5 and 4.5, respectively. In the figure, we also show the regions of high and low gradient determined using k-means clustering algorithm (Kaufman and Rousseeuw 1990). Note the close similarity of figures 2(b)–(d).

In figure 3, we show how quantiles Θ_PTH of DTA distribution increase with increasing threshold probability P_TH . Note that there is a very good correspondence between quantiles calculated from simulations and those based on the analytical model. The quantiles were calculated for an exemplary plan shown in figure 2. Similarly good agreement between analytical predictions and numerical results was observed for other analyzed plans. In figure 4, we show the values of the Pearson's coefficient of correlation between quantiles Θ_PTH of DTA distribution calculated from simulations and from the analytical model. In the figure, we also show the values of the Pearson's coefficient of correlation between σ_DTA and quantiles Θ_PTH of DTA distribution calculated from the analytical model as well as the values of the Pearson's coefficient of correlation between σ_DTA and quantiles Θ_PTH of DTA distribution calculated from simulations. The coefficients of correlation were calculated separately for each of 22 plans and then averaged over the set of plans. Thus, the error bars represent standard deviation of the coefficient of correlation over the set of different plans. Note that the coefficient of correlation between quantiles of DTA distribution calculated analytically and numerically (in simulations) is very high in the entire range of P_TH . Note also that quantiles of DTA distribution corresponding to P_TH greater than or equal to 0.8 correlate very well with σ_DTA , that is, these quantiles could be surrogate measures of the scatter of DTA values and thus they can be used as a factor normalizing DTA in equation (2).

**Figure 4.** Values of the Pearson's coefficient of correlation between quantiles Θ*_PTH* of DTA distribution calculated from simulations and based on the analytical model (a), values of the Pearson's coefficient of correlation between ${\sigma _{DTA}}$ and quantiles Θ*_PTH* of DTA distribution calculated from the analytical model (b), and values of the Pearson's coefficient of correlation between ${\sigma _{DTA}}$ and quantiles Θ*_PTH* of DTA distribution calculated from simulations (c). Error bars represent standard deviation calculated based on the selected noise model.
Download figure:
Standard image High-resolution image

**Figure 4.** Values of the Pearson's coefficient of correlation between quantiles Θ*_PTH* of DTA distribution calculated from simulations and based on the analytical model (a), values of the Pearson's coefficient of correlation between ${\sigma _{DTA}}$ and quantiles Θ*_PTH* of DTA distribution calculated from the analytical model (b), and values of the Pearson's coefficient of correlation between ${\sigma _{DTA}}$ and quantiles Θ*_PTH* of DTA distribution calculated from simulations (c). Error bars represent standard deviation calculated based on the selected noise model.
Download figure:
Standard image High-resolution image

It has been argued in the previous section that normalizing DTA with a position-dependent value of σ_DTA could be advantageous as it can possibly lead to a distribution of normalized DTA values, which is universal over different plans and different points within the same plan. In figure 5, we show PDFs of DTA normalized with different factors: a constant value of 6 pixels (the normalization of a distance component with a fixed value is current clinical practice), σ_DTA , Θ_PTH of DTA distribution calculated from analytical model and from simulations for P_TH = 0.8. In the figure, we show PDFs of DTA of all 22 analyzed plans as well as point-wise PDFs of DTA, calculated for randomly selected 20 points of the plan shown in figure 1. In the latter case, the point-wise PDFs were generated based on 10 000 simulated dose deliveries.

**Figure 5.** PDFs of normalized DTA values calculated for 22 plans (a)–(d) and 20 randomly selected points of a sample plan shown in figure 1. Following normalizing factors Δ*_DTA* were used: 6 pixels (a) and (e), ${\sigma _{DTA}}$ (b), Θ_0.8 of DTA distribution calculated from the analytical model (c) and (f) and from simulations (d). Orange dotted curves are PDFs of normalized DTA values calculated for a sample plan. Points of each other color and shape correspond to a specific plan.
Download figure:
Standard image High-resolution image

According to the definition of gamma index, a dose is delivered erroneously in a point if the gamma index value in that point exceeds one. We show in figure 6 how the passing rate GP—the fraction of pixels with DTA less than Δ_DTA —depends on the selection of a factor normalizing DTA in equation (2). Note that because we consider only random noise with magnitude characteristic to real therapeutic devices, all cases when normalized DTA exceeds one can be classified as false positives. In figure 6(a), we plot the value of GP as a function of a global Δ_DTA changing in the range from 3 to 12 pixels. In figure 6(b), we show the case of Δ_DTA equal to k(P_TH )σ_DTA where the factor k(P_TH ) is the quantile corresponding to P_TH , calculated from standard normal distribution. In figures 6(c) and (d), we plot the value of GP as a function of P_TH with the corresponding value of Δ_DTA equal to Θ_PTH . Clearly, GP must grow with growing Δ_DTA . In the case of normalization based on σ_DTA and quantiles of the normal distribution, the gamma passing rate is lower than that obtained for normalization using the analytical model or simulations in the entire tested range of P_TH .

In figure 6, we show GP for the total plan area (circles) and separately for high (triangles) and low gradient (squares) regions. For global normalization with a fixed factor there is a large discrepancy between GP values determined for the low and high gradient region. This discrepancy changes nonliearly as Δ_DTA grows. The passing rate for high gradient regions remains high and false positives are reported mainly in low gradient regions. For normalization based on Δ_DTA and Θ_PTH the passing rate grows linearly with P_TH . For normalization based on Θ_PTH calculated numerically GP equals P_TH and is the same in high and low gradient regions. For normalization based on Θ_PTH calculated analytically GP overestimates P_TH by a few percent and is slightly higher in high and low gradient regions. A similar pattern is observed for normalization based on k(P_TH )σ_DTA .

The main objective of the next experiment was to check if the analytical model assuming uncorrelated noise can be used in the case of spatial autocorrelation. In figure 7, we show the PDFs of DTA calculated for spatially autocorrelated data, but normalized with Θ_PTH calculated analytically for uncorrelated data. In figure 7, we also show passing rates calculated for spatially correlated data based on normalization derived from an analytical model with uncorrelated noise. For comparison, we repeat in figure 7(b) the plot from figure 5(c) and in figure 7(d) the plot from figure 6(c). Clearly, using the analytical model of spatially uncorrelated noise for spatially correlated data makes PDFs of normalized DTA values overlap nearly exactly, but there is almost no change in the dependence of GP on P_TH . The same pattern is observed if quantiles Θ_PTH or σ_DTA calculated numerically for uncorrelated data are used as normalization for spatially correlated data.

**Figure 7.** PDFs of DTA calculated for spatially autocorrelated (a) and uncorrelated (b) simulated data, normalized with quantiles Θ_0.8 of DTA distribution calculated from the analytical model with uncorrelated noise. Points of each color and shape correspond to a specific plan. Passing rates calculated for spatially autocorrelated (c) and uncorrelated (d) simulated data as a function of a probability P_TH defining DTA tolerance based on the analytical model with uncorrelated noise. Circles: total GP, triangles: GP for high dose gradient regions, squares: GP for low dose gradient regions. Error bars represent standard deviation calculated based on the selected noise model.
Download figure:
Standard image High-resolution image

The results of the last experiment are illustrated in figure 8. We show in it how GP degrades while increasing the mean shift of the systematic error from 2 to 10 pixels. Error bars represent standard deviation calculated based on all 22 considered plans and 100 random shifts per plan. In figure 8, we compare the cases of Δ_DTA set globally to 6 pixels and Δ_DTA set locally to Θ_0.9 of DTA distribution calculated from the analytical model with uncorrelated noise. We show GP for the total plan area (circles) and separately for high (triangles) and low gradient (squares) regions. Passing rates in high gradient regions are much higher than average for the global method in contrast to a local normalization of DTA based on Θ_0.9.

4. Discussion

Let us discuss how the proposed approach can be used for quality control in radiotherapy. Assume that the DTA calculated for a pixel P is equal to some d_a (P). The quality control for dose delivery involves in particular a decision whether d_a (P) is small enough and dose delivery in P can be accepted or in contrast, is too large and the quality of dose delivery in P is not sufficient.

The procedure of evaluation of d_a (P) can proceed as follows. Set a threshold probability P_TH , which is the largest tolerated probability of an error of the second kind (that is an error of claiming that d_a (P) fits the assumed statistical model while there are some important, possibly non-random unaccounted factors influencing d_a (P)). P_TH should be set to a small value as any false negative decision (that is, falsely claiming that a device operates correctly) can potentially lead to accepting a malfunctioning device that can pose a hazard to a treated patient. On the plot of Pr(P,d) versus d (figure 1), P_TH corresponds to some critical value of d, which we may choose as the reference tolerance Δ_DTA (P) for the tested pixel P. Thus, if d_a (P) is smaller than Δ_DTA (P), then we know that the probability of an error of the second kind for P is smaller than the accepted threshold P_TH (figure 1).

Consider an example shown in figure 9. In this figure, we show PDFs of DTA values normalized with Θ_0.9 of DTA distributions calculated from analytical models for all 22 collected plans. The data shown in figure 9(a) were generated by adding noise following the model (Tulik et al 2019) and also (figure 9(b)) by global shifting of the resulting spatial distribution by a random vector with expected value equal to 6 pixels and standard deviation equal to 2 pixels. Both the distributions without and with shifts are fairly universal and can be approximated with an exponential distribution P(DTA/Θ_0.9):

$\begin{equation}P\left( {DTA/{{{\Theta }}_{0.9}}} \right) = {\rm{\lambda exp}}\left( { - {\rm{\lambda }}\frac{{DTA}}{{{{{\Theta }}_{0.9}}}}} \right).\end{equation} \tag{ 12 }$

The rate parameter λ is approximately equal to 0.55 for the data shown in figure 9(a) and to 0.23 for the data shown in figure 9(b). Now, assume that we would like to detect the cases with average systematic shift of 6 pixels with probability of at least 0.9, that is, we accept at most 0.1 probability of an error of a second kind. From the distribution shown in figure 9(b) it then follows that the threshold for DTA/Θ_0.9 is equal to 0.37, that is, if at any pixel the ratio DTA/Θ_0.9 is larger than 0.37, then the pixel does not pass quality control procedure. Unfortunately, with that high power of error detection the probability of an error of the first kind (false positives) would be as high as 0.82. These results demonstrate that the detection of errors based on DTA or more generally on gamma index may not be very effective. This finding is in agreement with the results of our previous study (Tulik et al 2019), where it was shown that under some circumstances gamma index may behave as a random classifier of dose delivery errors.

In this article, we have shown that with proper normalization of DTA, universal distributions of normalized DTA values can be obtained. This universality holds over sets of plans and sets of points within a plan. This in turn enables statistically sound inference about either the presence or absence of dose delivery errors. In particular, if the noise model is known and some DTA value was found then it is possible to estimate both the power of the testing for the presence of a delivery error and the probability of an error of the first kind.

That kind of inference is not possible for a current standard approach as the distributions of raw DTA values (or DTA values scaled by some constant) are not universal. An alternative in the case of raw DTA would be to generate the distributions in simulations. However, such an approach can be quite expensive.

In contrast, the normalizing factors related to quantiles or standard deviation of DTA distributions can be derived from an analytical model. This model requires specifying for every point P in the plan a minimal and maximal dose in some neighborhood of size d of point P, where d is varied in some range. Finding these minimal and maximal values is not expensive—the sequence of minimal and maximal values for growing d can be found by iteratively applying erosion and dilation filters to the original plan. With each iteration minimal and maximal value within a neighborhood of increasing size of every point P is found. We have shown that the predictions of the analytical model quite precisely reproduce the results of simulations.

The advantage of the proposed approach is also its interpretability. In particular, if Θ_PTH is selected as the factor normalizing DTA, then the passing rate for data without systematic error is simply equal to P_TH . Importantly, the discrepancies between passing rates estimated for high and low dose gradient regions are low compared to current standard approach (normalizing DTA with some constant). We have also shown that quantiles calculated for uncorrelated noise can be used for data with spatially correlated noise—the PDFs of DTA values normalized using this method remain universal.

It is instructive to compare how the current standard approach (normalization of DTA with a constant factor) and the proposed approach (normalizing DTA with either some quantile or standard deviation of DTA distribution) estimate passing rate in the case of errors present in the dose delivery data. We have selected the normalizing factors for the data shown in figure 8 in such a way that an average passing rate over the total plan is almost the same for the current standard and the proposed approach. However, the current standard approach estimates the passing rate in high dose gradient regions overly optimistically—the passing rate for these regions is substantially higher than average. In contrast, the passing rate for high dose gradient regions calculated based on the proposed approach is lower than average. In fact, in the case of errors consisting of shifts of dose delivery patterns, any quality assurance procedure should be especially sensitive to shift errors in high dose gradient regions for the reasons listed in the introduction. The proposed approach has this property while the current standard approach has not, including the composite test and gamma test (Miften et al 2018). While the gamma test is less sensitive to shift errors in low dose gradient regions, this effect, to our knowledge, has not been described in a statistically rigorous way.

Furthermore, we plan to integrate our approach with 3D time-dependent dose reconstruction methods (Nelms et al 2012) to allow more detailed quality assurance, including separate analysis of OAR and PTV regions. The dose gradient factor (Li et al 2011) could also be used as an alternative quantification of the dose gradient.

Finally, we considered a quality testing based only on a ratio of DTA to some normalizing factor, which is a limiting case of gamma index. We may expect that the conclusions following from our research also hold for non-zero DD tolerance. Separating the effects of the DD term from the effect of the distance term for a generic case of non-zero DD and distance tolerances would be however hardly possible.

Acknowledgments

This study was supported by the POIR.04.04.00-00-15E5/18 Project of the Foundation for Polish Science. The POIR.04.04.00-00-15E5/18 project is carried out within the "TEAM-NET" programme of the Foundation for Polish Science co-financed by the European Union under the European Regional Development Fund.

Statistical approach to the selection of the tolerances for distance to agreement improves the quality control of the dose delivery in radiotherapy

Article metrics

Author e-mails

Author affiliations

ORCID iDs

Dates

Abstract

1. Introduction

2. Materials and methods

2.1. Probability-based approaches to DTA action levels

2.2. Noise model

2.3. Simulations

3. Results

4. Discussion

Acknowledgments

Statistical approach to the selection of the tolerances for distance to agreement improves the quality control of the dose delivery in radiotherapy

Article metrics

Share this article

Author e-mails

Author affiliations

ORCID iDs

Dates

Abstract

1. Introduction

2. Materials and methods

2.1. Probability-based approaches to DTA action levels

2.2. Noise model

2.3. Simulations

3. Results

4. Discussion

Acknowledgments