Feasibility study of individualized optimal positioning selection for left‐sided whole breast radiotherapy: DIBH or prone

Abstract The deep inspiration breath hold (DIBH) and prone (P) positions are two common heart-sparing techniques for external-beam radiation treatment of left-sided breast cancer patients. Clinicians select the position deemed better for tissue sparing based on their experience. This approach, however, is not always optimal or consistent. In response, we developed a quantitative tool that predicts the optimal positioning for the sake of organs at risk (OAR) sparing. Sixteen left-sided breast cancer patients were considered in the study, each of whom received CT scans in the supine free breathing, supine DIBH, and prone positions. Treatment plans were generated for all positions. A patient was classified as DIBH or P using two different criteria: whichever position yielded (1) the lower heart dose, or (2) the lower weighted OAR dose. Ten anatomical features were extracted from each patient's data, followed by principal component analysis. Sequential forward feature selection was implemented to identify the features that give the best classification performance. Nine statistical models were then applied to predict the optimal positioning and were evaluated using stratified k-fold cross-validation, with predictive accuracy and the area under the receiver operating characteristic curve (AUROC) as metrics. For heart toxicity-based classification, the support vector machine with radial basis function kernel yielded the highest accuracy (0.88) and AUROC (0.80). For OAR overall toxicities-based classification, quadratic discriminant analysis achieved the highest accuracy (0.90) and AUROC (0.84). For heart toxicity-based classification, breast volume and the distance between the heart and the breast were the most frequently selected features. For OAR overall toxicities-based classification, heart volume, breast volume, and the distance between the ipsilateral lung and the breast were frequently selected.
Given the patient data considered in this study, the proposed statistical models are feasible for predicting DIBH versus prone position selection, and they indicate the important clinical features that affect the position selection.


1 | INTRODUCTION
Breast cancer is the most common malignant disease in women in the United States, second only to lung cancer as the leading cause of cancer death.1 While whole breast irradiation (WBI) has demonstrated a significant overall survival benefit and a low recurrence rate,2,3 studies have shown an increased risk of cardiac and lung disease associated with WBI.4 The deep inspiration breath hold (DIBH) is one common heart-sparing irradiation technique for left-sided breast cancer patients. Since the heart is displaced away from the left breast during deep inspiration in most patients, one approach to reducing incidental cardiac irradiation is to treat patients during this portion of the respiratory cycle, that is, using DIBH. In the image fusion shown in Figure 1a, the distance between the chest wall and the heart of the patient increased from 0.36 cm in supine free breathing (FB) to 1.30 cm in DIBH.
On the other hand, ipsilateral lung involvement might be increased by the deep breath hold. The prone (P) position is another heart-sparing technique. While the prone position can dramatically reduce the lung dose, its reduction of heart exposure is controversial.5,6 The image fusion in Figure 1b shows that the heart was situated farther from the chest wall in the supine position (heart-to-chest distance of 1.71 cm), whereas it lay closer to the chest wall in the prone position (heart-to-chest distance decreased to 0.56 cm).
Currently, for patients suitable for both techniques, clinicians select the one expected to result in better organs at risk (OAR) sparing. This decision is based mainly on experience and might not always yield the lowest dose. Our study aims to provide predictions and quantitative guidelines for this clinical decision. Nine statistical learning algorithms are investigated. To evaluate performance, the predictions of the models were compared to the ground-truth labels, which had been selected for these trial patients by physicists based on treatment planning.

2 | MATERIALS AND METHODS

2.A | Proposed procedures
F I G U R E 1 (a) Image fusion of the CT scans in the supine FB versus the DIBH position. The supine position is in pink, and the DIBH position is in blue. For the supine position, the heart-to-chest distance was 0.36 cm; this distance increased to 1.30 cm when the patient was positioned in DIBH. (b) Image fusion of the CT scans in the supine FB versus the prone position. The supine position is in pink, and the prone position is in blue. For the supine position, the heart-to-chest distance was 1.71 cm; this distance decreased to 0.56 cm when the patient was positioned prone. The supine scan was rotated 180 degrees to align with the prone scan.

When a new patient's FB CT comes in, the same ten features are extracted and employed as the input of the pre-trained model to predict which class the patient belongs to. The predicted class is taken as the optimal position for that patient.

Each of these steps is described in detail in the following sections.

2.B | Treatment planning and patient labeling

Standard tangent fields with compensator design8 were used for WBI to improve dose homogeneity. A typical WBI prescription (200 cGy × 25 fractions) was used. The normalization point was placed at the lung-chest wall interface, anterior to the rib. For each patient in each position, the plan that best covered the whole breast PTV (optimized not to exceed 110%) and minimized the OAR doses (volume of the heart receiving 25 Gy, V25 heart ≤ 5%, and volume of the ipsilateral lung receiving 20 Gy, V20 lung ≤ 20%) was selected as the optimal treatment plan. The dose distributions were reviewed in three dimensions. Isodose distributions and dose-volume histograms were used to analyze whole breast PTV coverage, dose homogeneity, and doses to the OAR. To evaluate the doses to the OAR, the mean doses, V25 heart, V20 ipsilateral lung, and V5 contralateral breast (CLB) were analyzed. Treatment plans were generated for all three positions of each patient according to our clinical guidelines. By comparing the three treatment plans of each patient, the position (supine FB, DIBH, or prone) that introduced the lowest OAR doses was selected as the patient label. In this study, we investigated a heart toxicity-based criterion and a weighted OAR toxicities-based criterion. The OAR include the heart, ipsilateral lung, and CLB, and the weighted toxicity was defined as 0.6 × V25 heart + 0.3 × V20 lung + 0.1 × V5 CLB. Different weights were assigned to the OARs to reflect their relative significance during left-sided breast treatment: the heart is given the highest weight, the ipsilateral lung the second, and then the CLB.
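The labeling rule above can be sketched in a few lines of Python. The dose-volume values below are illustrative placeholders, not actual patient data, and the helper names are our own.

```python
# Weighted OAR toxicity as defined above: 0.6*V25(heart) + 0.3*V20(lung) + 0.1*V5(CLB).
WEIGHTS = {"v25_heart": 0.6, "v20_lung": 0.3, "v5_clb": 0.1}

def weighted_oar_toxicity(plan):
    """Combine the per-OAR dose-volume metrics of one plan into a single score."""
    return sum(WEIGHTS[metric] * plan[metric] for metric in WEIGHTS)

def label_patient(plans):
    """Return the position whose plan yields the lowest weighted OAR toxicity."""
    return min(plans, key=lambda position: weighted_oar_toxicity(plans[position]))

# Hypothetical dose-volume metrics (%) for one patient's competing plans.
plans = {
    "DIBH":  {"v25_heart": 2.0, "v20_lung": 18.0, "v5_clb": 1.0},
    "prone": {"v25_heart": 4.0, "v20_lung": 6.0,  "v5_clb": 1.0},
}
label = label_patient(plans)  # prone wins here: 4.3 vs 6.7
```

The same rule with the heart-only criterion reduces to comparing `v25_heart` alone, which for this hypothetical patient would flip the label to DIBH.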

2.C | Features extraction and data preprocessing
To train the classifier and predict the optimal position, we extracted the anatomical features from the CT scan. Since each patient would have an FB scan, the feature extraction is done from the FB scans.
The following 10 clinically relevant features are extracted and used as the input for the statistical models, and the mean and standard deviation of each feature value are reported (See Table 1).
2.C.1 | Volumes of the breast, heart, and ipsilateral lung

Breast volume has long been used as an important indicator in selecting the optimal positioning for whole breast treatment.9-11 Heart and ipsilateral lung volumes were also selected: the larger the heart and ipsilateral lung volumes are, the more likely they are to be irradiated.
2.C.2 | The distance between heart and breast, and the distance between ipsilateral lung and breast

The distance between an OAR (heart or ipsilateral lung) and the breast is defined as the distance between the mass centers of the OAR and the PTV breast. All distances were extracted automatically for all patients using CERR.12 These two distance features are important because both DIBH and prone positioning can cause a demonstrable OAR shift, which, in some cases, would compromise optimal OAR sparing.
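As a minimal sketch (our own helper names, assuming NumPy and structure voxel coordinates already exported from the contours), the mass-center distance can be computed as:

```python
import numpy as np

def mass_center(points):
    """Mass center of a structure, given its voxel coordinates as an (N, 3) array."""
    return np.asarray(points, dtype=float).mean(axis=0)

def center_distance(oar_points, breast_points):
    """Euclidean distance between the mass centers of an OAR and the breast PTV."""
    return float(np.linalg.norm(mass_center(oar_points) - mass_center(breast_points)))

# Toy coordinates (cm): OAR centered at the origin, breast centroid at (3, 4, 0).
oar = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
breast = [(3.0, 4.0, 0.0)]
d = center_distance(oar, breast)  # 5.0
```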

2.C.3 | In-field heart and ipsilateral lung volumes
These two features are the portions of the heart and ipsilateral lung volumes that fall within the tangent treatment fields. Figure 3 shows an example.

2.C.4 | Laterality of the heart
As shown in Figure 4, the laterality of the heart is defined as the distance between the center of the heart and the center of the chest along the right-to-left direction. The further away the heart is from the center of the chest, the more likely it will be in the tangent fields.
T A B L E 1 Mean and standard deviation value (Mean ± SD) of each feature derived from the patient supine free breathing (FB) CT scans.

2.C.5 | The ratio of heart volume to ipsilateral lung volume

Inspired by Zhao et al.,13 this feature was chosen to address the concern that when both the heart and lung volumes are large, the heart volume alone might not be an effective feature, so we normalize the heart volume by the ipsilateral lung volume.

2.C.6 | Breath-hold motion
As shown in Figure 5, when the patient performed a breath hold, the motion of the anterior chest was 2.14 cm. This feature is usually correlated with how far the heart is moved away from the chest wall.

2.D | Statistical learning algorithms
The following nine statistical learning algorithms were used to develop the predictive models: nearest neighbors, support vector machine (SVM) with linear and radial basis function (RBF) kernels, decision tree, random forest, AdaBoost, naive Bayes, and linear and quadratic discriminant analysis (LDA and QDA).

2.D.1 | Nearest neighbors classification
The principle behind nearest neighbor methods is to find a predefined number of training samples closest in distance to the new point, and to predict its label from a majority vote among those neighbors.
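A minimal k-nearest-neighbors sketch with scikit-learn, on synthetic two-cluster data standing in for the anatomical feature matrix (the cluster parameters and the 0 = DIBH / 1 = prone coding are our own illustration, not the study's data):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for the feature matrix: 20 "DIBH" and 20 "prone" patients.
X = np.vstack([rng.normal(0.0, 1.0, (20, 3)), rng.normal(3.0, 1.0, (20, 3))])
y = np.array([0] * 20 + [1] * 20)  # 0 = DIBH, 1 = prone (illustrative coding)

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
pred_prone = knn.predict([[3.0, 3.0, 3.0]])[0]  # lands in the "prone" cluster
pred_dibh = knn.predict([[0.0, 0.0, 0.0]])[0]   # lands in the "DIBH" cluster
```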

2.D.2 | Support vector machine
SVM searches for the hyper-plane that separates the two classes optimally, that is, the hyper-plane producing the maximal margin between the classes. Given training vectors x_i ∈ ℝ^p, i = 1, 2, ..., n, in two classes, and a label vector y ∈ {1, −1}^n, SVM solves the following problem14:

min_{w, b, ζ} (1/2)‖w‖² + C Σ_{i=1}^{n} ζ_i

subject to y_i(w^T φ(x_i) + b) ≥ 1 − ζ_i and ζ_i ≥ 0 for i = 1, ..., n.

The SVM model can be applied to both linearly and nonlinearly separable data. For nonlinearly separable data, the SVM first maps the data with a kernel function φ and then searches for a linear optimally separating hyper-plane in the new space. The prediction is made according to which side of the hyper-plane a subject lies on. In this study, the SVM was implemented with linear and RBF kernels.
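A minimal RBF-kernel SVM sketch with scikit-learn, again on synthetic two-cluster data (the data, labels, and hyper-parameters here are illustrative, not the settings used in the study):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Two well-separated Gaussian clusters in a 2-D feature space.
X = np.vstack([rng.normal(0.0, 0.5, (25, 2)), rng.normal(2.0, 0.5, (25, 2))])
y = np.array([-1] * 25 + [1] * 25)

# The RBF kernel makes the separating hyper-plane linear only in the
# implicit feature space; C trades margin width against slack.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
p_pos = clf.predict([[2.0, 2.0]])[0]  # inside the +1 cluster
p_neg = clf.predict([[0.0, 0.0]])[0]  # inside the -1 cluster
```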

2.D.3 | Decision tree
Decision trees predict the value of a target variable by learning simple decision rules inferred from the data features. Input data are split into two or more subgroups according to the best split in the input variables, and the splitting continues until stop conditions are met. Given training vectors x_i ∈ ℝ^p, i = 1, 2, ..., l, and a class vector y ∈ ℝ^l, a decision tree is built using a recursive partitioning algorithm such that samples with the same labels are grouped together.14 For each candidate split h = (j, t_m), consisting of a feature j and a threshold t_m, the data Q at the node are split into

Q_left(h) = {(x, y) | x_j ≤ t_m} and Q_right(h) = Q \ Q_left(h).

The impurity at the node can be evaluated using an impurity function H; one typical choice is the cross-entropy,

H(Q_m) = −Σ_k p_mk log(p_mk),

where m refers to the current node and p_mk is the fraction of samples of class k in the child node that results from the split.

F I G U R E 3 An illustration of the lung and heart volumes in the treatment field. The green contour is the amount of the heart in the field, and magenta is the amount of the ipsilateral lung in the field.

F I G U R E 4 Illustration of the laterality (L in the figure) of the heart to the chest wall.

F I G U R E 5 An illustration of the breath hold motion (in cm) between the free breathing and DIBH positions of a patient. The pink body contour is FB, and the green is DIBH.
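The cross-entropy impurity described above can be checked with a short sketch (the helper name is ours, assuming NumPy):

```python
import numpy as np

def cross_entropy(labels):
    """H(Q_m) = -sum_k p_mk * log(p_mk), over the class fractions at a node."""
    _, counts = np.unique(np.asarray(labels), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

h_pure = cross_entropy([1, 1, 1, 1])   # a pure node has zero impurity
h_mixed = cross_entropy([0, 0, 1, 1])  # a 50/50 node has impurity ln(2)
```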

2.D.4 | Random forest
In random forests, multiple trees are built to classify an object based on its features. A sample of the training set, drawn at random with replacement, is used to build each tree. When growing a tree, the best split is chosen among a random subset of the input features. As a result of this randomness, the model selects the classification result that receives the most votes from the trees in the forest, which helps reduce the variance of the final model.

2.D.5 | AdaBoost
An AdaBoost classifier is an ensemble technique that fits a sequence of classifiers on repeatedly modified versions of the training data: after each round, the weights of incorrectly classified instances are increased so that subsequent classifiers concentrate on the difficult cases. The core principle of AdaBoost is that combining many such weak classifiers ultimately yields a strong classifier.

2.D.6 | Naive Bayes
Given a class variable y and a dependent feature vector x_1 through x_n, Bayes' theorem, together with the naive assumption of conditional independence between the features, states the following relationship14:

P(y | x_1, ..., x_n) = P(y) Π_{i=1}^{n} P(x_i | y) / P(x_1, ..., x_n).

The major difference between naive Bayes classifiers lies in the assumptions they make regarding the distribution of P(x_i | y). The naive Bayes classifier used in this study is Gaussian naive Bayes, where the likelihood of each feature is assumed to be Gaussian:

P(x_i | y) = (1 / √(2πσ_y²)) exp(−(x_i − μ_y)² / (2σ_y²)).
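The Gaussian likelihood above is easy to verify numerically (the helper name is ours, assuming NumPy):

```python
import numpy as np

def gaussian_likelihood(x, mu, sigma):
    """P(x_i | y) for Gaussian naive Bayes: the normal density N(x; mu_y, sigma_y^2)."""
    return float(np.exp(-((x - mu) ** 2) / (2.0 * sigma**2))
                 / np.sqrt(2.0 * np.pi * sigma**2))

# At x = mu with sigma = 1, the density peaks at 1/sqrt(2*pi) ~= 0.3989.
peak = gaussian_likelihood(0.0, 0.0, 1.0)
```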

2.D.7 | Discriminant analysis
The linear/quadratic decision boundary of the classifier is generated by fitting class-conditional densities to the data and applying Bayes' rule. The model fits a Gaussian density to each class; linear discriminant analysis additionally assumes that all classes share the same covariance matrix, whereas QDA estimates a separate covariance matrix per class.

2.E | Dimension reduction and feature selection

Feature selection is the process of automatically removing unnecessary features and selecting a subset of features to be used in predictive modeling. In this paper, we applied the sequential forward feature selection (SFFS) algorithm, which employs a greedy search to reduce the original n features to a subset of m features, where m < n.15 Given the whole n-dimensional feature set as input, the output feature subset is denoted Y_m. SFFS first initializes Y_m with the empty subset, Y_0 = {∅}. It then adds the feature y⁺ that maximizes the criterion function J:

y⁺ = argmax_{y ∉ Y_m} J(Y_m + y), Y_{m+1} = Y_m + y⁺.

This procedure is repeated until the termination criterion m = p is satisfied, where p is the number of desired features specified a priori. In this study, we set p = n so that SFFS goes through all the features; the best feature combination was discovered by iterating forward from the first feature to the last and determining which combination achieved the best performance under 5-fold cross-validation.

2.F | Model comparison and evaluation
In this study, stratified k-fold cross-validation was used to test the model performance as well as to select the optimal hyper-parameters. For a small training data size, stratified k-fold cross-validation is a widely accepted technique to evaluate the generalization capability of a model. The whole dataset is partitioned into k smaller subsets, each containing approximately the same percentage of samples of each target class. In each iteration, the model is trained on k − 1 folds, while the remaining fold is used to validate the model. This procedure repeats k times, and the results are combined to generate an estimate of the model performance.
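The evaluation scheme can be sketched with scikit-learn as follows (synthetic data and an RBF SVM as an illustrative estimator, not the study's actual dataset):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the patient feature matrix (larger, for stable folds).
X, y = make_classification(n_samples=60, n_features=5, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale")

acc = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")  # one score per fold
auc = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
mean_acc, mean_auc = float(acc.mean()), float(auc.mean())
```

Repeating this over several random seeds and averaging, as described above, smooths out the fold-assignment variance.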
In our experiments, we used k = 5, and each experiment was repeated for ten iterations using different random seeds. Prediction accuracy and the area under the receiver operating characteristic curve (AUROC) were used as the evaluation metrics.

3 | RESULTS

The results for heart toxicity-based classification are listed in Table 3. Upon feature selection, the model that yields the best predictive accuracy for heart toxicity-based classification is QDA, with an accuracy of 0.93. By counting the occurrences of each feature, we observe that for heart toxicity-based classification, breast volume appeared in the best feature combination of every statistical model. The next most frequently appearing feature was the distance between the heart and the breast. These two features, breast volume and the distance between the heart and the breast, are therefore suggested by our study as important indicators for heart toxicity-based optimal treatment position selection.

The results for OAR overall toxicities-based classification are listed in Table 4. Upon feature selection, the models that yield the best predictive accuracy for OAR overall toxicities-based classification are naive Bayes and QDA, each with an accuracy of 0.93. By counting the occurrences of each feature, we observe that the three most frequently selected features are: the volume of the heart (5 times), the volume of the breast (4 times), and the distance between the lung and the breast (4 times). These three features are therefore suggested as important indicators for OAR overall toxicities-based optimal treatment position selection.

4 | DISCUSSION
Several studies using statistical learning models to predict optimal positioning in breast cancer treatment have been published.16-18 Compared to these studies, which considered the supine FB and prone free breathing positions, our study is the first feasibility study to predict the optimal positioning between the DIBH and prone positions and to indicate the important features for OAR sparing. DIBH is a position that can efficiently reduce the cardiac dose in breast radiation therapy,19-21 and many centers have recently introduced DIBH into the clinic. Our study is timely, as it provides quantitative clinical guidance for selecting between the DIBH and prone positions.
T A B L E 3 The best feature combinations that yield the highest predictive accuracy of statistical models for heart toxicity-based classification. Features that are consistently selected by all the models are bold.

We applied two different dose criteria, heart toxicity and weighted OAR toxicities, to determine the patient positioning label.
As shown in Figure 6, if heart toxicity were the only factor influencing the decision, more patients would be classified as DIBH-treated rather than prone-treated. This is consistent with many previous clinical studies showing that DIBH is beneficial for heart dose reduction during left-sided breast treatment. However, if the weighted OAR toxicities (doses to the heart, ipsilateral lung, and CLB) are the decision factor, the classification result is the opposite. We believe the reason for this is that the dose to the ipsilateral lung is significantly lower in the prone position than in DIBH.5,22 By using our model, clinicians can also assign their own weighting factors to the OAR, which in turn can address their specific clinical interests or needs. In our current study, the largest weighting factor was assigned to the heart, followed by the ipsilateral lung and then the CLB. The availability of strong features is always the key to constructing better predictive models. For ongoing work, we are applying for clinical trials to produce more experimental data, and we are improving the predictive models by utilizing more powerful feature extraction techniques, such as convolutional neural networks and atlas-based organ segmentation.

CONFLICT OF INTEREST
The authors have no relevant conflicts of interest to disclose.
T A B L E 4 The best feature combinations that yield the highest predictive accuracy of statistical models for OAR overall toxicities-based classification. Features that are consistently selected by all the models are bold.