Distributed approximation of Pareto surfaces in multicriteria radiation therapy treatment planning

Rasmus Bokrantz

doi:10.1088/0031-9155/58/11/3501

1. Introduction

A constant challenge in treatment planning for intensity-modulated radiation therapy (IMRT) is to ensure delivery of a curative dose to the tumor while simultaneously limiting normal tissue toxicity. Such tradeoffs in standard inverse planning are resolved by manual tuning of importance weights that are assigned to structure-specific planning objectives. The search for suitable weights often requires many reoptimizations, which makes the process time-consuming. The difficulty of knowing when to terminate the search for better plans, and the lack of overview of the possible plans, also implies that there is little guarantee that the best alternative among all possibilities is found.

Multicriteria optimization (MCO) is an alternate strategy that seeks to enable more informed and time-efficient decision making (Küfer et al 2003, Craft et al 2005, 2007, Hoffmann et al 2006). Its principal idea is to generate a set of candidate plans and let the treatment planner or radiation oncologist select the most appropriate combination thereof. The candidate plans are generated from the set of Pareto optimal plans, i.e., feasible plans such that no objective can be improved without a sacrifice in one of the others. Combinations between the candidate plans are evaluated by continuous interpolation in dose (Monz et al 2008). Such dose averaging is feasible in real-time and thereby provides immediate feedback on how the objectives are correlated and how a change of their priorities influences the dose distribution. Empirical studies show that MCO leads to more time-effective planning and different (and better) clinical decisions compared to standard methods (Craft et al 2012, Hong et al 2008, Thieke et al 2007).

An obstacle for MCO is that the plan generation step is computationally costly. The time expense of computing a modest number (∼50) of fluence-based IMRT plans can exceed several hours for complex patient cases¹. Time-consuming calculations can compromise clinical throughput and, if the number of plans that are practical to compute becomes very limited, also negatively influence plan quality. A natural remedy for these difficulties, which is pursued in the present study, is to exploit that plan generation is amenable to parallel processing.

The difficulty posed by distributed computation is that determination of parameters that produce a well-distributed set of plans requires explicit information about the shape of the Pareto optimal set; information that is not available until a representation of this set has been generated. In this paper, methods are developed that manage this 'catch 22' through a balance between the degree of parallelization and the quality of the approximation. In particular, an algorithm is proposed that generates a total of m plans in steps of at most k plans at the time, for integers k between 1 and m. All currently available information about the shape of the Pareto optimal set is utilized in the selection of parameters for each batch of k plans.

The algorithm that we propose is restricted to convex problems because a navigated plan is not otherwise guaranteed to be feasible and near-optimal. Treatment modalities that permit convex formulations include fluence-based models of static and rotational IMRT, and actively scanned proton therapy. Convex plan quality criteria have been surveyed by Romeijn et al (2004). The primary benefit of convexity for the method proposed in the present paper is that the Pareto optimal solutions define a convex surface in the objective function space (Romeijn et al 2004). This property permits information about the shape of the Pareto surface to be extracted from inner and outer approximations. As a further restriction, we limit ourselves to optimization by minimization of a weighted sum of objective functions. Extensions to other formulations are elaborated in the discussion.

Numerous methods exist that approximate Pareto surfaces by calculating points one at a time at the location where the distance between inner and outer approximations is maximal. In particular, Solanki et al (1993) presents such a sandwich method based on weighted sum minimization. This method is further developed in Craft et al (2006) and Rennen et al (2011). In a previous paper (Bokrantz and Forsgren 2012), we reformulated the method of Rennen et al into a method with reduced computational cost and retained quality of output. Sandwich methods using optimization models where the initial objectives are posed as constraint functions have been suggested by Klamroth et al (2002), Hoffmann et al (2006), Shao and Ehrgott (2008), and Ehrgott et al (2011). Sandwich approximation is also utilized in the study by Serna et al (2009).

The algorithm developed in this paper specializes to our previous algorithm (Bokrantz and Forsgren 2012) when k = 1. When k = m, it instead specializes to a method that closely resembles weighted sum minimization over a uniform set of weights. The proposed method thus bridges a previous fully sequential (k = 1) algorithm and a previous fully parallelizable (k = m) algorithm a provides novel semi-parallelizable method for the intermediate case when 1 < k < m.

2. Methods

2.1. Optimization model

Radiation therapy optimization is predominantly posed as the minimization of a weighted sum of objective functions according to

$\begin{equation} \begin{array}{ll}{\displaystyle \mathop {{\rm minimize}}_{x}} & \displaystyle { \sum _{i=1}^n w_i f_i(x)} \\ {\rm subject\ to}& x \in \mathcal {X}, \\ \end{array} \end{equation} \tag{ 2.1 }$

where f_i, i = 1, ..., n are objectives to be minimized and w_i, i = 1, ..., n, associated nonnegative weights. The feasible set $\mathcal {X}$ represent the physical limitations of the delivery method and possible non-negotiable requirements on the planned dose distribution. The target problem for the methods developed in this paper is a generalization of problem (2.1) to a multicriteria problem of the form

$\begin{equation} \begin{array}{ll}{\displaystyle \mathop {{\rm minimize}}_{x}} & f(x) = [ f_1(x) \cdots f_n(x) ]^T \\ {\rm subject\ to}& x \in \mathcal {X}, \end{array} \end{equation} \tag{ 2.2 }$

which does not include any a priori articulation of preference among the objectives. The solution set to problem (2.2) is the typically infinite set of Pareto optimal solutions. Formally, a feasible x* is Pareto optimal if there exists no feasible x such that f_i(x) ⩽ f_i(x*) for i = 1, ..., n, with strict inequality for at least one index i. The set $\mathcal {X}$ is throughout assumed to be nonempty and convex and all functions f_i, i = 1, ..., n, assumed to be convex in x and bounded on $\mathcal {X}$ . These assumptions guarantee that the image of the Pareto optimal solutions in objective space (i.e., mapped through f) forms a convex surface, which is referred to as the Pareto surface and denoted by $\mathcal {P}$ .

2.2. Problem statement

In a previous paper (Bokrantz and Forsgren 2012), we introduced an algorithm that approximates $\mathcal {P}$ by repeatedly solving the scalar-valued problem (2.1) with respect to different weights. The optimal solution x* obtained from each solve defines a point f(x*) within $\mathcal {P}$ and also a halfspace {z: w^Tz ⩾ w^Tf(x*)} that supports $\mathcal {P}$ at this point. The information of this form collected over the course of the algorithm's execution defines inner and outer approximation of $\mathcal {P}$ , as illustrated in figure 1(a). The black dots indicate known points, the supporting solid lines at these points define the boundaries of the halfspaces that constitute the outer approximation, and the solid lines that connect the points define the boundary of the inner approximation. The inner and outer approximations of $\mathcal {P}$ are used to steer the generation of new points to parts of $\mathcal {P}$ where the current representation is inaccurate. At each iteration, the weights w in (2.1) are set to values such that a point is generated where the distance between the inner and outer approximations is maximal. This choice of weights is a greedy strategy for minimizing the approximation error of the current representation.

**Figure 1.** Sandwich approximation of a Pareto surface (white circles) by piecewise linear inner and outer approximations (solid lines). The illustrated data corresponds to the tradeoff between sparing of the rectum or bladder for a prostate cancer case. (a) The double arrow indicates the maximal distance between the inner and outer approximations. (b) The dashed curve indicates a conservative model of the Pareto surface.
Download figure:
Standard image High-resolution image

The purpose of the present paper is to generalize this method for Pareto surface approximation to an algorithm that can take advantage of multiple computational nodes. We therefore consider selection of k weighting vectors at the time for arbitrary integers k between 1 and m, where m is total number of points that are to be computed. The k weighting vectors are subsequently used for solves of problem (2.1) which are processed in parallel. The goal is to select weights that produce a representation of $\mathcal {P}$ with minimal approximation error.

2.3. General idea of the proposed method

Our strategy to the stated problem is to alternate between inexpensive calculations of points from a model of $\mathcal {P}$ and more costly calculations of points from the exact set. The inexpensive calculations are performed serially while the costly ones are processed in parallel. At a given instant, a surrogate model of $\mathcal {P}$ is first determined from the currently known inner and outer approximations. The sequential algorithm introduced in section 2.2 is then executed for k iterations. At each iteration, the model predicts the outcome of a solve of problem (2.1), and the inner and outer approximations are then updated according to this prediction. The model of $\mathcal {P}$ itself is not changed. At termination, the k most recently used weights are reused in an equal number of solves of the exact version of problem (2.1), which are performed in parallel. The data in the inner and outer approximations that was generated with respect to the model of $\mathcal {P}$ is discarded and replaced with the corresponding data obtained with respect to the exact problem. The process is repeated until a total of m points have been generated.

A key aspect is that the surrogate model should be conservative, with conservativeness understood in the sense that the model represents a surface that is maximally difficult to approximate. This property protects against an opportunistic behavior where parts of $\mathcal {P}$ are incorrectly disregarded on basis of having an accurate representation with respect to the model of $\mathcal {P}$ . The model should simultaneously be compatible with the currently known inner and outer approximations.

The parametric form of the surrogate model is elaborated in section 2.4. The associated mathematical details are given in the appendix. The appendix also outlines how to normalize the objectives to avoid bias towards those of large magnitude, how to calculate a bound on the current approximation error, and how to account for numerical inaccuracy. A summary in pseudocode of the proposed algorithm is also included.

2.4. Construction of a conservative model

We propose to add artificial curvature to the inner approximation of $\mathcal {P}$ as a heuristic method to calculate a conservative model. The bounding faces of the inner approximation are thereby transformed into smooth patches, as illustrated in figure 1(b). We use patches with constant curvature because such surfaces, e.g., a spheroid surface, are difficult to approximate by sandwich methods. A surface with constant curvature requires a uniform spread of points to be accurately approximated, as opposed to a kinked surface, which is accurately approximated by points located to the high-curvature region, or a flat surface, which is accurately approximated by points at the edges. Uniformly spaced points are in turn generated by uniformly spaced weights (if the surface has constant curvature), which makes the sandwich algorithm select uniformly distributed weights in regions where it lacks information about the shape of $\mathcal {P}$ . This mechanism permits scaling between a sequential sandwich algorithm (k = 1) and weighted sum minimization over uniformly distributed weights (k = m).

The smooth patches that constitute the surrogate model are defined by quadratic forms according to

$\begin{equation} \big\lbrace z : \case{1}{2} z^T H z + c^T z = r \big\rbrace , \end{equation} \tag{ 2.3 }$

where the n × n matrix H is required to be symmetric and positive semidefinite. These requirements ensure that H has real and nonnegative eigenvalues, so that (2.3) defines an ellipsoid. The parameters (H, c, r) in (2.3) are selected such that the gradient of the ellipsoid has minimal error with respect to the known gradients of $\mathcal {P}$ . The gradients of the ellipsoid are also subject to constraints that ensure that the ellipsoid does not protrude outside the outer approximation. The mathematical details of calculation of the unknown parameters in (2.3) are elaborated in section A.4 in the appendix. The inexpensive counterpart of (2.1) defined by the surrogate model amounts to the quadratically constrained linear program

$\begin{equation} \begin{array}{ll}{\displaystyle \mathop {{\rm minimize}}_{z}} & \displaystyle \sum _{i=1}^n w_i z_i \\ {\rm subject\ to}&\dsty \case{1}{2} z^T H z + c^T z \le r, \end{array} \end{equation} \tag{ 2.4 }$

which is a convex program due to the positive semidefiniteness of H.

3. Computational study

3.1. Problem formulation

Algorithm A.1 was evaluated by application to fluence-based planning for IMRT. All objectives f₁, ..., f_n were posed as one-sided penalties of the form

$\begin{equation} \int _{\mathcal {V}} ( d_v(x) - \hat{d})_{\pm }^2dv, \end{equation} \tag{ 3.1 }$

where x is the fluence distributions vector, $\mathcal {V}$ a subset of the patient volume, d_v the dose to a point v in $\mathcal {V}$ , and $\hat{d}$ the prescription level for targets and zero for healthy structures. Functions defined with respect to the positive part operator ( · )₊ are called max dose functions and functions defined with respect to the negative part operator ( · )₋ called min dose functions. Both these two forms of functions are convex in x because the mapping x↦d is linear. The feasible region $\mathcal {X}$ of problem (2.2) was defined by requiring x ⩾ 0, by introducing lower bounds on the dose to target voxels, and by introducing an upper bound on the global dose. All constraints with respect to dose were aggregated into nonlinear functions on the form of (3.1).

3.2. Patient cases

A prostate, a brain, and a pancreatic cancer case were considered. Representative transversal slices are illustrated in figure 2. The planning target volume (PTV) of the prostate case was composed of the prostate and the seminal vesicles, and had a prescribed total dose of 72.2 Gy, with a boost to 78 Gy to the prostate. Considered critical structures were the bladder and rectum. The PTV of the brain case was localized to the right lobe and had a prescribed total dose of 59.4 Gy. Considered critical structures were the brainstem, left and right optical structures (lens, orbit, and optic nerve), and optic chiasm. The PTV of the pancreas case contained a localized tumor to the pancreatic head and had a prescribed total dose of 50.4 Gy. Considered critical structures were the kidneys, liver, and stomach. The prostate case was planned with respect to five coplanar fields and the brain and pancreas case with respect to seven coplanar fields. All fields were planned for delivery with a Varian 2100 linear accelerator (Varian Medical Systems, Palo Alto, CA). Fluence planes were discretized into 5 × 5 mm² bixels and patient geometries discretized into 3 × 3 × 3 mm³ voxels. A seven-objective formulation composed of a min dose and a max dose function to each target and a max dose function to all healthy structures was used for all patient cases. Simplified five-objective formulations were also considered. Constraints were introduced to each formulation that required at least $90 \mathrm{\%}$ of the prescription to be delivered to all targets and at most $110 \mathrm{\%}$ of the prescription to be delivered to any structure ( $112 \mathrm{\%}$ for the pancreas case). A constraint on the maximum dose to the spinal cord at 35 Gy was also enforced for the pancreas case.

3.3. Software

The presented method was implemented in Matlab using interfaces to the following external solvers: IMRT optimization was performed in RayStation 2.9 (RaySearch Laboratories, Stockholm, Sweden), with dose computed by a singular value decomposition of pencil beam kernels. Semidefinite programs were solved using SeDuMi 1.21 (McMaster University, Hamilton, Canada). Linear programs, linearly constrained quadratic programs, and quadratically constrained linear programs were solved using CPLEX 10.2 (ILOG, Sunnyvale, CA). Convex hulls and Delaunay triangulations were computed using Qhull (University of Minnesota, Minneapolis, MN).

4. Results

The performance of algorithm A.1 was evaluated in terms of the approximation error of the Pareto surface representations it generates as a function of the number of objectives n and the degree of parallelization k. The parameter n was varied in {5, 7} and k varied in {1, n, 2n, 5n, m}. The parameter m was kept at 10n, so that a total of 11n plans were generated if also counting the n plans obtained from the normalization of the objectives. Three forms of approximation errors were quantified:

The error between the current representation and the outer approximation.
The error between the current representation and the set of points generated over all studies values of k.
The error betweem ith calculated point and the previous i − 1 points.

The first two quantities define upper and lower bounds, respectively, on the approximation error between the current representation of $\mathcal {P}$ and the exact set $\mathcal {P}$ . The calculation of the first quantity is outlined in section A.2 in the appendix. The calculation of the second quantity is performed analogously, but with the set of Pareto points generated over all studies values of k substituted for the vertices of the outer approximation. The third quantity is defined as the maximum componentwise distance between the ith calculated point and the convex hull of the previous i − 1 points, and is calculated as the optimal value of problem (A.2) with v set to the ith calculated point and P set to a matrix with rows given by the previous i − 1 points. This rate of improvement has previously been used as a termination criterion for sandwich approximations by Craft and Bortfeld (2008). The results in terms of approximation errors are complemented by graphs that translate an error in percent to differences between dose-volume histogram (DVH) curves. Average computational times for the execution of algorithm A.1 were about 1min for n = 5 and 8min for n = 7. These times do not include the times for treatment plan optimizations which were in the order of 1–5min per plan.

4.1. Initial characterization

An initial characterization was performed with respect to a three-objective problem from Rennen et al (2011, test case 1). The Pareto surface of this problem is composed of multiple regions of relatively low curvature that form sharp edges as they intersect. Figure 3 illustrates the results for k in {1, 6, 30}. These results show that the fully sequential method (k = 1) produces uniformly scattered Pareto points by selecting nonuniformly distributed weights. These characteristics are largely retained for the semi-parallelizable method (k = 6). The fully parallelizable method (k = 30) selects a uniform set of weights which produces a highly nonuniform set of Pareto points.

**Figure 3.** Results for algorithm A.1 applied to test case 1 as a function of number of plans k generated in parallel. (a)–(c) Pareto surface representation. (e)–(f) Selected objective weights.
Download figure:
Standard image High-resolution image

4.2. Application to clinical cases

Results with respect to three clinical cases are summarized in figure 4, which show upper and lower bounds on the approximation error as a function of n and k. A consistent pattern in the depicted results is that the approximation quality obtained for k = 1 is largely retained for k up to 2n, viewed after 10n plans have been generated. For n = 5, k in {1, n, 2n}, and all three patient cases, the lower bounds are contained in the interval 2.5– $4.3 \mathrm{\%}$ and the upper bounds contained in the interval 4.0– $6.9 \mathrm{\%}$ at termination. The corresponding results for n = 7 are qualitatively similar, with the lower bounds contained in the interval 4.7– $6.9 \mathrm{\%}$ and the upper bounds contained in the interval 7.2– $10.1 \mathrm{\%}$ . The approximation quality then deteriorates as k is further increased.

A further notable property is that the quality of the approximation improves rapidly when new information about the shape of $\mathcal {P}$ is obtained. This behavior is most clearly indicated by the rapid decrease in approximation error for k = 5n beyond iteration 6n. The same behavior is visible in figure 5, which depicts the progressive distance to the current representation of $\mathcal {P}$ as new points are generated. Qualitatively, the rates of improvement (the derivative of the depicted curves), are low during generation of the first k plans. This rate then markedly increases during the early iterations after new information has been obtained. The rate of improvement finally plateaus out at a moderate level. For k = 1 and n = 5, this rate drops below $10 \mathrm{\%}$ after (13, 9, 17) iterations for the prostate, brain, and pancreas case, respectively, and below $5 \mathrm{\%}$ after (27, 20, 38) iterations. The corresponding values for n = 7 are (21, 25, 29) at the $10 \mathrm{\%}$ level and (77, 73, 70) at the $5 \mathrm{\%}$ level. The results for increased values of k are, with some exceptions, similar but shifted with the number of iterations spent in the low-rate region. A further conclusion from figure 5 is that the approximation quality is continuously improved at least locally, even though this improvement may not affect the global approximation error depicted in figure 4.

**Figure 5.** Cumulative improvement in approximation error per new plan as a function of number of plans k generated in parallel.
Download figure:
Standard image High-resolution image

The presented results should be contrasted to figure 6 which depicts DVH curve differences as a function of approximation error. Clearly, a fraction of the depicted DVHs does not correspond to clinically relevant treatment plans. This result is an effect of that the only dose limiting constraint during optimization was an upper bound on the global dose, except for a constraint on the maximum dose to the spinal cord for the pancreas case. The discrete solutions in the calculated representation therefore often gives excessive dose to some of the healthy structures. The clinically relevant plans are instead found as combinations of these discrete solutions.

5. Discussion

Multicriteria IMRT planning by continuous Pareto surface navigation has previously been demonstrated to reduce manual planning time and improve plan quality. Inverse planning of this form is, however, associated with lengthy precalculation of the plans that form the basis for the navigation. The goal of this paper was to reduce these times using an algorithm that can take advantage of distributed computational environments.

Our numerical results show that approximations can be generated which are highly comparable in quality to those generated by the previous sequential method if the degree of parallelization does not exceed twice the number of objectives. This behavior can be attributed to the fact that the algorithm can maintain a high rate of improvement also when new information is not obtained at all iterations. The presented algorithm's performance then decays as the degree of parallelization is further increased and it approaches a behavior that is similar to weighted sum minimization over a uniform set of weights. A plausible mechanism for the observed deterioration in performance is that uniformity of the weights is very loosely correlated with the uniformity of the resulting set of Pareto points (Das and Dennis 1997).

The proposed method can be extended to formulations other than weighted sum minimization. A particularly interesting alternative is minimization of the maximum componentwise distance to a reference point. A sandwich method based on optimization of this form is suggested in Shao and Ehrgott (2008), and is further developed in Ehrgott et al (2011). Fully parallelizable algorithms based on uniform grids of reference points have also been proposed by Benson and Sayin (1997), Messac and Mattson (2004), and Shao (2008). The main advantage of such optimization is that a uniform grid of reference points produces a uniform grid of Pareto points. This advantage is, however, at the expense that some of the reference points in general are not projected onto the Pareto optimal set. This disadvantage is amplified for radiation therapy optimization where the Pareto surface typically occupies only a small fraction of the feasible objective space due to correlation between the objectives (Craft and Bortfeld 2008, Spalke et al 2009). A further practical disadvantage is that the normal vectors that define the outer approximation are obtained from Lagrange multiplier information and are therefore subject to numerical inaccuracy. A direct reformulation of problem (A.3) to account also for this error source is not possible because if the normal directions are also treated as variables, then the constraints become bilinear and therefore nonconvex.

6. Conclusions

An algorithm has been presented that unifies a previous fully sequential and a previous fully parallelizable method for approximation of Pareto surfaces into a method where the degree of parallelization is freely adjustable. Numerical results indicate that the presented method produces approximations that are of comparable quality to those produced by the previous sequential method for degree of parallelization up to about twice the number of objectives. For IMRT planning where the number of objectives is typically at least five, the presented method thus permits parallelization to an extent where a time-efficiency improvement by one order of magnitude is feasible at no major compromise in approximation accuracy.

Acknowledgments

The constructive comments from Anders Forsgren and Björn Hårdemark are highly appreciated.

Appendix.: Mathematical details

The following nonstandard notation is used in this appendix. The vector of all ones with dimension defined by its context is denoted by e. Vector inequalities are to be understood componentwise. The notation '⪰' is used to indicate positive semidefinitness.

A.1. Normalization

The range of f over $\mathcal {P}$ can be normalized to [0, 1]ⁿ by the substitution

where l_i and u_i is the minimum and the maximum of the ith objective over $\mathcal {P}$ , respectively, and where it is assumed that l_i < u_i holds for all i. A practical method to determine l, and to obtain an estimate of u, is to individually minimize each objective f_i over $\mathcal {X}$ , collect the resulting objective vectors into rows of an n × n matrix, and calculate the columnwise minima and maxima, respectively. Exact calculation of u requires convex maximization which is NP-hard in general.

A.2. Approximation error

The approximation error between a set of points from $\mathcal {P}$ and the entire set $\mathcal {P}$ is defined as the maximum componentwise error between a point in $\mathcal {P}$ and its best approximation by a convex combination of the known points. This quantity is not practical to compute itself, but can be bounded by the approximation error between the known points and the outer approximation of $\mathcal {P}$ . This upper bound is calculated as follows. Let the outer approximation be given by {z: Az ⩾ b} and the inner approximation be given by {P^Tλ: λ ⩾ 0, e^Tλ = 1}, where P is a matrix with rows given by the known points, A a matrix with rows given by the associated normal vectors to $\mathcal {P}$ at these points, and b the vector of pairwise scalar products between the rows of P and A. Then, the approximation error of the inner approximation with respect to the outer approximations is given by the maximum optimal value of the linear program

$\begin{equation} \begin{array}{ll}{\displaystyle \mathop {{\rm minimize}}_{\lambda ,\eta }} & \eta \\ {\rm subject\ to}& P^T \lambda - \eta e \le v, \\ & {\rm e}^T \lambda = 1, \\ & \lambda \ge 0, \end{array} \end{equation} \tag{ A.2 }$

taken over all vertices v of the outer approximation (Bokrantz and Forsgren 2012, proposition 4.1). The vertices of the outer approximation can be determined by calculating the convex hull of the dual polytope to this set. The optimal dual variables associated with the first constraint at the instance where the maximum is attained gives a nonnegative normal vector to the inner approximation at the maximum distance to the outer approximation (Bokrantz and Forsgren 2012, proposition 4.2). This normal defines the weights for the next iteration of the sandwich algorithm.

The number of vertices of the outer approximation increases exponentially with the dimension n in the worst case. The presented technique is therefore only practical for n up to about 10. For a more thorough treatment of distance calculations between polyhedral approximations of $\mathcal {P}$ and the computational complexity of this operation, see Bokrantz and Forsgren (2012).

A.3. Accounting for numerical inaccuracy

Problem (2.1) is not practical to solve to exact optimality for the problem sizes that occur in clinical practice. The generated Pareto points can therefore slightly violate the inner and outer approximations. This property is accounted for by explicitly filtering out noise, posed as a minimum-norm fit of the noise contaminated data to a convex surface according to

$\begin{equation} \begin{array}{llll}{\displaystyle \mathop {{\rm minimize}}_{\bar{p}_1,\ldots ,\bar{p}_s}} & \displaystyle \sum _{i=1}^{s} ||\bar{p}_i - p_i||_2 & & \\ {\rm subject\ to}& a_i^T \bar{p}_j \ge a_i^T \bar{p}_i, & i,j=1,\ldots ,s, & i \ne j, \end{array} \end{equation} \tag{ A.3 }$

where p₁, ..., p_s denote Pareto points subject to noise and a₁, ..., a_s are associated normal vectors to $\mathcal {P}$ at these points. The variables $\bar{p}_1,\ldots ,\bar{p}_s$ define the convexified estimates at their optimum. Similar data smoothing with respect to polyhedral norms has previously been studied by Siem et al (2006).

A.4. Calculation of model parameters

The parameters (H, c, r) in (2.3) are determined from an (n − 1)-dimensional bounding face of the inner approximations as follows. Let p₁, ..., p_n denote the face's extreme points and a₁, ..., a_n associated nonnegatively oriented normal vectors to $\mathcal {P}$ . Then, the unknown parameters are those that minimize the L1-norm error between the gradient −(Hp_i + c) of (2.3) at p₁, ..., p_n and the known gradients a₁, ..., a_n at these points. This minimization is performed subject to the constraint that the gradients must be contained in the nonnegative span of the gradients of the outer approximation, in order to ensure that the ellipsoid does not protrude outside the outer approximation. The ellipsoid is also required to interpolate the points p₁, ..., p_n. The parameter calculation is thus formulated

$\begin{equation} \begin{array}{lll}{\displaystyle \mathop {{\rm minimize}}_{H,c,r \atop \alpha _1,\ldots ,\alpha _n}} & \displaystyle \sum _{i=1}^n|| H p_i + c + a_i ||_{1} \\ {\rm subject\ to}& H = H^T \succeq 0, & \\ &\dsty \case{1}{2} p_i^T H p_i + c^T p_i = r, & i = 1,\ldots ,n, \\ & \displaystyle \sum _{j=1}^n \alpha _{i,j} a_j + H p_i + c = 0, & i = 1,\ldots ,n, \\ & \alpha _i \ge 0 & i=1,\ldots ,n, \end{array} \end{equation} \tag{ A.4 }$

where α₁, ..., α_n are n-vectors. The objective function and the constraints of this problem are all linear, or equivalent to linear, with respect to the optimization variables. The exception to this rule is the constraint H⪰0. This constraint is nonlinear and nonsmooth, but convex, and can therefore be handled efficiently by interior-point methods (Vanderberghe and Boyd 1996). Convex semidefinite programs can be solved very efficiently—the particular problem (A.4) in fractions of a second for reasonable n (up to ∼10).

It is important to note that an expression according to (2.3) must be determined only for a single ellipsoid per iteration. This ellipsoid defines a patch that protrudes from a bounding face with extreme points p₁, ..., p_n such that the selected weighting vector w can be expressed as a convex combination of the associated normal vectors a₁, ..., a_n to $\mathcal {P}$ at the extreme points. A practical technique to identify such a face is to cross-reference the weights w against the simplices of a Delaunay triangulation of the set of known normal vectors.

A.5. Algorithm summary

The suggested algorithm is summarized in pseudocode in algorithm A.1. The syntax 'parallel for' is used to indicate a for-loop that is processed in parallel. The normalization step described in section A.1 in the appendix constitutes a prerequisite to the algorithm, and produces the n × n matrix of objective function values P, the associated n × n matrix of normal vectors A at these points, and n-vector of pairwise scalar products b between the rows of P and A, which is used as the algorithm's input.

Distributed approximation of Pareto surfaces in multicriteria radiation therapy treatment planning

Article metrics

Permissions

Author e-mails

Author affiliations

Dates

Abstract

1. Introduction