Mathematical optimization of high dose-rate brachytherapy— derivation of a linear penalty model from a dose-volume model

High dose-rate brachytherapy is a method for cancer treatment where the radiation source is placed within the body, inside or close to a tumour. For dose planning, mathematical optimization techniques are being used in practice and the most common approach is to use a linear model which penalizes deviations from specified dose limits for the tumour and for nearby organs. This linear penalty model is easy to solve, but its weakness lies in the poor correlation of its objective value and the dose-volume objectives that are used clinically to evaluate dose distributions. Furthermore, the model contains parameters that have no clear clinical interpretation. Another approach for dose planning is to solve mixed-integer optimization models with explicit dose-volume constraints which include parameters that directly correspond to dose-volume objectives, and which are therefore tangible. The two mentioned models take the overall goals for dose planning into account in fundamentally different ways. We show that there is, however, a mathematical relationship between them by deriving a linear penalty model from a dose-volume model. This relationship has not been established before and improves the understanding of the linear penalty model. In particular, the parameters of the linear penalty model can be interpreted as dual variables in the dose-volume model.


B Morén et al
volumes are discretized into dose points where the dose is calculated; see, for example, Lahanas et al (2000) for information on how this can be done.
A dose plan can be created by either forward planning or inverse planning. In forward planning, the dose plan is constructed manually (using graphical tools available in treatment planning software). The resulting dose distribution is evaluated according to clinical criteria for the volumes of interest, and repeatedly adjusted until it fulfills or comes close enough to the evaluation criteria. Inverse planning instead starts with the desired criteria and then a plan is computed that satisfies these criteria as well as possible. For inverse planning, mathematical optimization is commonly used.
The evaluation of dose distributions is based on dosimetric indices, derived from so-called dose-volume histograms. For the tumour, the portion of its volume that receives at least a specified prescription dose is of interest, while for OAR it is rather the portion that receives at most a specified dose. In the guidelines for HDR BT for prostate cancer, one aim of a good dose distribution is that at least 95% of the dose points in the tumour receive at least the prescribed dose (Yamada et al 2012, Hoskin et al 2013. Several mathematical optimization models have been proposed for dose planning in HDR BT. This paper considers two such models which have been studied extensively and are introduced below. Other models can be found in, for example, Lahanas et al (1999), Giantsoudi et al (2013), and Holm et al (2013a), and an overview in De Boeck et al (2014).
An optimization model commonly used in clinical practice is based on linear penalties; this model is referred to as the linear penalty model (LPM). It was developed by Lessard and Pouliot (2001) and then further studied in Alterovitz et al (2006). Various versions of the LPM are available in clinical treatment planning software. Based on prescribed dose levels for the target and OAR, the model penalizes deviations from these levels at each dose point. The model is easy to solve but the objective function value has been observed to correlate weakly with dosimetric indices (Gorissen et al 2013, Holm et al 2013b. Another drawback with the LPM is that it produces solutions with long dwell times compared to manual planning. A possible remedy for this is to use a piecewise LPM where dose deviations are penalized increasingly farther away from the prescribed dose levels (Holm et al 2012). This model has been shown to produce solutions with shorter maximum dwell times (Holm et al 2012). Another way to ensure that dwell times are more evenly distributed was suggested by Baltas et al by introducing restrictions on dwell times (Baltas et al 2009). They show the results of shorter dwell times, in total and with lower variance. Three ways to model dwell time restrictions are described in Balvert et al (2015).
An alternative to the LPM is a model which imposes explicit constraints on dosimetric indices for OAR, while the objective is to maximize the dosimetric index for the target volume; it is referred to as the dose-volume model (DVM). This model has an advantage over the LPM since it explicitly includes the primary criteria from clinical guidelines which are based on dosimetric indices. The first to use dose-volume constraints in a model for HDR BT for prostate cancer was Beliën et al (2009). By using a binary indicator variable for each dose point, a mixed-integer model was obtained. Research indicates that this model cannot be solved to optimality for practical examples within a reasonable time limit, and, in order to find good solutions, heuristics are often used, as in Beliën et al (2009), Siauw et al (2011 and Deist and Gorissen (2016).
The LPM and the DVM appear to be fundamentally different, but this paper aims at establishing a mathematical relationship between them. Knowledge of this relationship improves the understanding of the LPM and its weaknesses, and it questions whether the LPM is well-posed when the aim is for a model which gives good values of dosimetric indices.
The outline for the remainder of the paper is as follows. In section 2, we state mathematically the LPM and the DVM for HDR BT. Section 3 contains the main result of this paper, which is a derivation of the LPM from the DVM. Finally, section 4 gives conclusions with suggestions for future research.

Mathematical formulations
In this section we present two mathematical models for dose planning-LPM and DVM.

Notation
First we introduce the notation that is used. Indices: s used for OAR i used for dose points (in target volume and OAR) j used for dwell positions

Models
We consider the following LPM, which is adopted from Alterovitz et al (2006), but augmented with an upper dose bound for dose points in OAR: The objective function combines the penalty for doses below the lower bound for dose points in the target volume and above the upper bound for dose points in OAR. Constraints (1a) and (1e) ensure that the penalty variables of the target volume take the correct values. Constraints (1b) and (1c) are similar but for OAR. We consider the following DVM, which is adopted from Siauw et al (2011) and Deist and Gorissen (2016), but without the dwell time modulation restriction in the latter reference: Here, the aim is to maximize the dosimetric index that describes the portion of the target volume receiving the prescribed dose. This is achieved by the objective function together with constraint (2a), which ensures that the indicator variables y i take the desired values. Constraints (2b) and (2c) impose a bound on a dosimetric index, i.e. the portion of dose points in the OAR that receives at most the prescribed dose. Constraint (2b) ensures that indicator variables v s i have the desired values while constraint (2c) is the bound on the dosimetric index, applied to the portion of dose points. Here | · | denotes cardinality.
By replacing constraints (2e) and (2f) with 0 y i 1, i ∈ T , and 0 v s i 1, i ∈ OAR s , s ∈ S, the linear programming relaxation of the DVM, DVM LP , is obtained: Note that for all three presented models, there is a feasible solution, since the choice t j = 0, j ∈ J, is always feasible. The objective value in LPM is bounded from below (by zero) and the objective value in DVM and DVM LP is bounded from above (by |T|). In particular, for DVM LP it follows that the linear programming dual has a bounded optimal solution (see Murty (2010), section 5.4.4). Consider figures 1 and 2 for a comparison of the objective function terms in the three presented models. Figure 1 compares the objective function contribution from dose points in the target volume for the LPM and the DVM. The solid line comes from the LPM-variables w i , i ∈ T , with the penalty defined as pw i = max(0, pL − p j∈J d ij t j ). The binary indicator variable in the DVM is plotted as a dashed line and the linear programming relaxation of this indicator variable in DVM LP is plotted as a dotted line. In DVM LP , constraint (3a) will be satisfied with equality in every optimal solution and thus y i = min(1, (1/L) j∈J d ij t j ). In the two latter models, we want to maximize the objective function contrib- Note that the figures are only meant to show the principal behaviour of the functions. The figures indicate that the three models are related and this observation leads to the result presented in section 3, which is a mathematical derivation of the LPM from the DVM.

Preliminaries
We here give a basic result which connects linear programming (LP) duality with Lagrangian duality. It is used in the proof of the main result in this paper. For an introduction to linear programming, see, for example, Murty (2010). With general notation, problem (4) is the primal LP problem. Here, A and D denote matrices, whereas b, c and d are vectors of compatible sizes, and x is the vector of variables: Problem (5) is the LP dual of problem (4) with μ as a vector of dual variables for the first set of constraints and ν as a vector of dual variables for the second set of constraints. We assume that both problems (4) and (5) are feasible: The third problem is (6), which is a Lagrangian relaxation of problem (4) where the first set of constraints is relaxed with Lagrange multipliers µ 0: The Lagrangian dual problem is then: The following result is widely known and included here to simplify the presentation of the main result.

Proof.
The solution x * is feasible in problem (6) since the problem is a relaxation of problem (4). For the given µ * we can see that (x * , ν * ) is a primal-dual optimal solution to problem (6) by verifying that the criteria for optimality are satisfied, which they are since by assumption the optimality criteria for problem (4) are satisfied. For any µ 0, h(µ) c T x * + µ T (b − Ax * ) c T x * , since x * is feasible in (4) and thus Ax * b. Further, since x * is optimal in problem (6) for µ = µ * , due to complementarity, it holds that holds for all µ 0, which means that µ * is optimal in (7). Finally, since x * is optimal in (4) and µ * is optimal in (7), h * = h(µ * ) = z P . □ We can also draw the same conclusions as in proposition 1 if we start with a model that is similar to problem (4), for example, with constraints of opposite signs.
Remark 1. For µ = µ * , problem (6) typically has more optimal solutions than x * , some of them being infeasible in problem (4) while some are feasible but non-optimal (see Dirickx andJennergren 1979, p 14, or Fisher 1981 which examines the more frequently studied case with integer variables). This phenomenon is known as non-coordinability.
A result that is related to proposition 1 is given in Everett (1963) and will be used later in the proof of theorem 2. In that result, objective function terms are turned into constraints with specific right-hand sides obtained from an optimal solution to the original problem, and it is shown that this solution remains optimal in the constrained problem. This result is also related to the multiobjective ε-constraint method (Haimes et al 1971, Chankong andHaimes 1983) in which objective terms are turned into constraints.

Main result
The first result is a mathematical derivation of the LPM from the DVM LP . We show that for every instance of DVM LP , there is an instance of LPM with specific parameter values such that any optimal solution to DVM LP is also optimal in LPM. Further, both models have the same optimal objective value. (The reader can skip the proofs without loss of continuity of the presentation.) Theorem 1. For an instance of DVM LP , let µ s * = −π s * 0, s ∈ S, where π s * , s ∈ S, are optimal dual variables associated with constraints (3c). Consider an instance of LPM with parameter values p = 1/L and q s = µ s * /(M s − U s ), s ∈ S. Then, (i) optimal solutions to DVM LP are also optimal in LPM, and (ii) the optimal objective value in LPM equals that of DVM LP .
Proof. We obtain model (8) below as a Lagrange relaxation of DVM LP : Let z * 1 be the optimal objective value of DVM LP ; as stated earlier there is an optimal solution with a bounded objective value. Thus, with reference to proposition 1, it follows that any optimal solution to DVM LP is also optimal in (8) and that z * 1 = z * 2 . By setting Ly i = L − w i and (M s − U s )(1 − v s i ) = x s i and switching from maximization to minimization, we obtain formulation (9) below. This can be seen since setting Finally we can transform the maximization problem to a minimization problem by changing the sign of the objective function and also the sign of the complete term: Model (9) is an instance of LPM (disregarding the constant term) with penalty parameters p = 1/L and q s = µ s * /(M s − U s ). Models (8) and (9) are equivalent, thus the optimal dwell times in DVM LP are also optimal in model (9). Furthermore we have that z * 3 = z * 2 = z * 1 . □ Remark 2. As was noted in remark 1, there are optimal solutions to model (9) that are infeasible, or feasible but non-optimal in DVM LP . Thus, the converse of the first statement (i) is not true.
By making a Lagrangian relaxation of DVM LP , of the form (8), with non-optimal dual variables µ, we obtain a model which is structurally the same as (9) (and LPM). However, in this case, the conclusions from theorem 1 can no longer be drawn and there is no precise mathematical relationship between the models.
The derivation can also be done the other way around, starting with the LPM and deriving the DVM LP . We show that for every instance of LPM, there is an instance of DVM LP with specific parameter values, such that any optimal solution to DVM LP is also optimal in LPM and both problems have the same optimal objective values.

then (i) optimal solutions to DVM LP are also optimal in LPM, and (ii) the optimal objective value in LPM equals that of DVM LP .
Proof. Let z * 4 be the optimal objective value of LPM and w * i be optimal values corresponding to optimal values x s * i ; as stated earlier, we know that such a solution exists and that z * 4 is bounded from below. We then have holds for all feasible values w i and x s i . Therefore p i w * Consider model (10) below: As mentioned in remark 1, the following reasoning originates from the result of Everett (1963). We obtain model (10) by removing the term q s i∈T x s i from the objective function and adding the constraints i∈T (x s i − x s * i ) 0 instead, as well as adding a constant term to the objective function. Note that the values w * i and x s * i are feasible in model (10) and objective values z * 4 and z * 5 are equal. This is so, because if feasible values w i and x s i would exist such that z * 5 < z * 4 , both p i∈T w i < p i∈T w * i and Phys. Med. Biol. 63 (2018) 065011 (11pp) Remark 4. The same type of derivation can be done from a version of the LPM that penalizes deviations outside a specified dose interval for all dose points. This derived model becomes a version of the DVM model where the objective function consists of two dosimetric indices, one which is to maximize the portion of the tumour with a higher dose than the prescribed dose, and one which is to minimize the portion with a higher dose than a specified value (for example, 1.5 times the prescribed dose). Similarly, for OAR, we obtain a constraint that states that no more than a specified portion of the dose points should receive a lower dose than a specified value.

Piecewise LPM
Consider the two models below. Model (12) will be referred to as the piecewise LPM (PLPM), and is an extension of the LPM model. (PLPM is similar to the model formulated in Holm et al (2012), but only penalizes the dose being too low for target volume dose points, and the dose being too high for dose points in OAR.) Model (13) will be referred to as the multi DVM (MDVM LP ), and is an LP-relaxation of an extension of the DVM, where target volume and OAR have several dosimetric indices which are targeted or constrained. In this section, an index k is added to the parameters introduced in section 2.1. In model (13), the parameter r k is used to weigh the dosimetric indices for the target volume together: Figure 3 is a comparison between all terms that contribute to the objective value for a single dose point in the target volume. The solid line is from model (12) and is defined as k∈K p k w k i = k∈K max(0, p k L k − p k j∈J d ij t j ), and the dotted line is from model (13) and is defined as k∈K r k y k i = k∈K r k min(1, (1/L k ) j∈J d ij t j ), with r k = p k , k ∈ K. Theorem 3. For an instance of MDVM LP , let µ sk * = −π sk * 0, s ∈ S, k ∈ K, where π sk * , s ∈ S, k ∈ K, are optimal dual variables associated with constraints i∈OARs v sk i τ sk | OAR s |, s ∈ S, k ∈ K. For an instance of PLPM with parameter values p k = r k /L k , k ∈ K and q sk = µ sk * /(M sk − U sk ), s ∈ S, k ∈ K, it holds that (i) optimal solutions to MDVM LP are also optimal in PLPM, and (ii) the optimal objective value in PLPM equals that of MDVM LP .
For an instance of PLPM, let the x sk * i be optimal values. For an instance of MDVM LP with parameter values , s ∈ S, k ∈ K and r k = p k /L k , k ∈ K, it holds that (i) optimal solutions to MDVM LP are also optimal in PLPM, and (ii) the optimal objective value in PLPM equals that of MDVM LP .
The proof is omitted since the mathematical derivation between the two models can be made using the same arguments as in the proofs of theorems 1 and 2. Each slope of the penalty function in PLPM corresponds to a dosimetric index in MDVM LP . For the target volume we obtain several dosimetric indices that are weighted together in the objective function while for OAR we obtain a separate constraint on each dosimetric index. Conversely, each dosimetric index in MDVM LP gives a slope for the penalty function in PLPM.

Conclusions and future research
Both the LPM and DVM have been been studied extensively, with the former model also being widely used in clinical practice. Despite their apparant differences, there is a mathematical relationship between the models. We have presented a mathematical derivation of the LPM from a linear programming relaxation of the DVM; this gives a firm relationship between the LPM and DVM and improves understanding. It is not clear how this relationship can be useful in producing better treatment plans but we present interpretations of the result and suggestions for future research.
The DVM with constraints on dosimetric indices for OAR is one way to deal with the multiobjective nature of dose planning, and it is the most studied formulation of models which include dosimetric indices. Another approach is the weighted-sum DVM, in which all the dosimetric indices are weighted together. The weightedsum model and the LPM have the same optimal solutions if parameter values are properly chosen. To see this, a change of variables is enough, as in the proofs of theorems 1 and 2. As the starting point in this paper, we chose the DVM with constraints on dosimetric indices for OAR, since this is the most studied dose-volume formulation.
The LPM contains parameters with no direct clinical interpretation. The results in this paper offer a theoretical understanding of the nature of these parameters since they can be interpreted as dual variables to the DVM. A suggestion for future work is to exploit this interpretation of the parameters in the LPM as dual variables in the DVM to automatically calibrate them.
The optimization models studied in this paper include the essential constraints for dose planning. These models can be augmented with constraints that restrict dwell times, as in Balvert et al (2015). Since these constraints only include dwell times, the mathematical derivations in this paper would still be valid, using the same arguments.
The linear programming relaxation of the DVM is currently not used by itself to generate dose plans. However, the DVM is easier to work with since the parameters are more transparent and easier to interpret. Thus, an alternative to solve the LPM is to instead use the LP-relaxation of the DVM. This would require further study to see the effects on the results.
The LPM can be seen as a two-step relaxation of the DVM, where the first step is the linear programming relaxation. Mixed-integer programs with indicator constraints, such as the DVM, are often hard to solve, and linear programming relaxations of such programs are generally weak (Bonami et al 2015). The second step, in which we obtain the LPM, is a Lagrangian relaxation. This relaxation introduces a lack of coordinability of the solutions, compared to solutions of the linear programming relaxation of the DVM. This means that we obtain more optimal solutions, some of which are infeasible in the latter model and some of which are feasible but nonoptimal. The interpretation of this two-step relaxation questions the well-posedness of the LPM and its role as the most commonly used optimization model in clinical practice. This is also in line with the studies in Holm et al (2012) and Holm et al (2013b).
In future research, it would be interesting to try to explain the poor correlation between the objective value of the LPM and dosimetric indices in more detail by separately studying the impact of the linear programming relaxation and parameter settings to find out how large the contribution from each of them is. Another topic for future research is to use the concept of a Lagrangian relaxation with the purpose of constructing linear penalty instances that yield solutions that better correlate with the DVM. One possible way to do this takes near-optimality and near-complementarity into account, as in Larsson and Patriksson (2006).