The Association of Gross Tumor Volume and Its Radiomics Features with Brain Metastases Development in Patients with Radically Treated Stage III Non-Small Cell Lung Cancer

Simple Summary Around 30% of patients with stage III non-small cell lung cancer (NSCLC) receiving radical chemoradiotherapy will develop symptomatic brain metastases (BM) during the disease course. Identifying risk factors for BM can help improve the management of these patients. The aim of this current study was to identify clinical risk factors, including gross tumor volume (GTV) radiomics features, for developing BM in patients with radically treated stage III NSCLC. We found that age, NSCLC subtype, and GTV of lymph nodes (GTVn) were significant factors for BM. GTVn radiomics features provided higher predictive value than GTV of primary tumor (GTVp) and GTV. Therefore, GTVp and GTVn should be separated in clinical and research practice. Abstract Purpose: To identify clinical risk factors, including gross tumor volume (GTV) and radiomics features, for developing brain metastases (BM) in patients with radically treated stage III non-small cell lung cancer (NSCLC). Methods: Clinical data and planning CT scans for thoracic radiotherapy were retrieved from patients with radically treated stage III NSCLC. Radiomics features were extracted from the GTV, primary lung tumor (GTVp), and involved lymph nodes (GTVn), separately. Competing risk analysis was used to develop models (clinical, radiomics, and combined model). LASSO regression was performed to select radiomics features and train models. Area under the receiver operating characteristic curves (AUC-ROC) and calibration were performed to assess the models’ performance. Results: Three-hundred-ten patients were eligible and 52 (16.8%) developed BM. Three clinical variables (age, NSCLC subtype, and GTVn) and five radiomics features from each radiomics model were significantly associated with BM. Radiomic features measuring tumor heterogeneity were the most relevant. The AUCs and calibration curves of the models showed that the GTVn radiomics model had the best performance (AUC: 0.74; 95% CI: 0.71–0.86; sensitivity: 84%; specificity: 61%; positive predictive value [PPV]: 29%; negative predictive value [NPV]: 95%; accuracy: 65%). Conclusion: Age, NSCLC subtype, and GTVn were significant risk factors for BM. GTVn radiomics features provided higher predictive value than GTVp and GTV for BM development. GTVp and GTVn should be separated in clinical and research practice.


Introduction
Up to 30% of patients with stage III non-small cell lung cancer (NSCLC) receiving radical chemoradiotherapy will develop symptomatic brain metastases (BM) during the course of their disease [1]. Identifying risk factors for BM can help improve the management of these patients. Known risk factors are female sex, adenocarcinoma or non-squamous cell carcinoma histology, and advanced tumor stage [2][3][4]. Whether a higher gross tumor volume (GTV) is a risk factor for subsequent BM development remains unclear. It has been reported that GTV is a prognostic factor for locoregional control [5], progression-free survival (PFS) [6], and OS [6][7][8][9] in patients with NSCLC. Ji et al. found that GTV is not a significant risk factor for BM in patients with NSCLC (n = 335, p = 0.687) [10], whereas a larger GTV was associated with an increased risk of BM in patients with small-cell lung cancer (SCLC) (HR = 1.37, 95% CI 1.09-1.73, p = 0.007) [11].
Radiomics is increasingly being used in cancer risk prediction [12,13] and is, based on the seed-and-soil hypothesis, also of interest to evaluate in stage III NSCLC (primary tumor and/or involved lymph nodes) and in the BM prediction setting [14]. It is a quantitative imaging analysis technique that provides image-derived metrics capable of quantifying textures at a higher granularity, beyond the ability of the naked eye. Four studies showed that radiomic models based on pretreatment computed tomography (CT) might predict BM in NSCLC, but the models did not always add value to clinical models [15][16][17][18][19]. Important limitations of these studies were small sample sizes, inadequate baseline staging, only evaluating the primary tumor and not the involved lymph nodes, and the use of heterogeneous CT protocols. Thus, it is difficult to draw firm conclusions.
According to the backgrounds and rationales above, we conducted the current study to explore risk factors and develop prognostic models for BM in radically treated stage III NSCLC. We used a large dataset with adequately staged patients (baseline 18 F-labeled fluorodeoxyglucose positron emission tomography-CT scan [ 18 FDG-PET-CT] and brain magnetic resonance imaging [MRI]) and evaluated clinical data together with radiomics features of GTVs on the uniformly scanned planning contrast-enhanced chest CT for thoracic radiotherapy.

Patients and Methods
Patients with stage III NSCLC were retrospectively screened in five hospitals in the Netherlands and Italy (MUMC, Zuyderland, Venlo, Roermond, Udine) from 1 March 2012 to 31 July 2021. AJCC 7th edition was used for staging [20]. Eligibility criteria for this study included pathologically confirmed NSCLC, 18 FDG-PET-CT and brain MRI performed at baseline (before antitumor therapy), treatment with chemoradiotherapy with radical intent (concurrent or sequential chemoradiotherapy, CCRT/SCRT). Exclusion criteria were participation in interventional clinical trials (NVALT-11 trial [because of prophylactic cranial irradiation [PCI] administration in one arm] [1], PET-boost trial [because of radiotherapy dose-escalation] [21], and NICOLAS trial [because of nivolumab] [22], other malignancy within 5 years before NSCLC diagnosis; surgery for NSCLC before chemoradiotherapy, and a total irradiation dose (TD) < 54Gy. Proton therapy, all types of platinum doublet chemotherapy, and adjuvant durvalumab were allowed. This study was conducted in accordance with the Helsinki Declaration of the World Medical Association and approved by the institutional review boards (W 22 01 00010). Informed consent from individuals for the use of their medical data was waived because no additional interventions were performed.

Acquisition of Images
The original planning CTs were retrieved from the clinical workstation database. All the images were acquired with contrast enhancement and with a slice thickness of 3 mm with consistent acquisition parameters on the two scanner manufacturers (Philips or SIEMENS). The pixel spacing varied from a minimum of 0.976 mm to a maximum of 1.52 mm in the X and Y directions.

Delineation of Regions of Interest (ROI)/GTV
The ROIs were the original GTVs obtained from the planning CT scan. The GTVs were delineated by a team of specialists in lung cancer radiotherapy in each slice of the planning CT based on the most recent 18 FDG-PET information using the ARIA workstation (Varian, Palo Alto, CA, USA). GTV of the primary tumor (GTVp) and lymph nodes (GTVn) were delineated separately if anatomically distinguishable. When it was difficult to distinguish the primary tumor or lymph nodes, the tumor was contoured as either GTVp or GTVn (the choice was left to experienced radiation oncologist). GTV was calculated as the morphological union of GTVp and GTVn. The lung window setting (W = 200 HU and L = −1000 HU [Hounsfield Units]) was used to contour tumors surrounded by lung tissue and the mediastinum window setting (W = 220 HU and L = −180 HU) was applied for the contouring of lymph nodes and primary tumors invading the mediastinum or chest wall [23]. The contouring of each patient was confirmed by a senior radiation oncologist (experience >10 years).

GTV Radiomics Features Extraction
The pipeline for radiomic feature extraction consisted of the following steps: data conversion and pre-processing, radiomic extraction configuration and feature extraction. The data conversion was performed using an in-house Python script that converts the original DICOM and RTSTRUCT images into the .nrrd format that is mineable by pyradiomics. The Python packages simpleITK v2.1 and pyplastimatch v1.9.3 were used to convert the original DICOM CT images and the contours GTV, GTVp, GTVn into .nrrd images and corresponding binary masks. The radiomic extraction configuration included the following operations: re-sampling of the original images to the same pixel spacing of [1,1,1] using B-spline interpolation, removal of outliers from the binary masks above 3σ from the distribution of intensity values for each patient, application of wavelet filtering in all the 13 directions to generate wavelet features. The following feature categories were extracted from both original and wavelet-filtered images: first-order statistical features, and texture feature matrixes (GLCM, GLSZM, GLRLM, NGTDM). Morphological features were extracted only from the original images. The fixed-bin width approach (n = 25) was chosen for the quantization of statistical and texture features. The details of the source code are published on a public repository (https://github.com/Maastro-CDS-Imaging-Group/GTVNSCLC; accessed on 27 March 2023). The features were normalized to Z-score, as is common practice in statistical analyses.

Clinical and Treatment-Related Variables
In addition to GTV, other potential factors for BM were also recorded and investigated, including age, sex, smoking history, body mass index (BMI), performance status (PS), histology type, TNM stage at diagnosis; chemoradiotherapy type (concurrent or sequential chemoradiotherapy, CCRT/SCRT), total dose of radiotherapy, type of radiotherapy (once-daily radiotherapy, ODRT; twice-daily radiotherapy, TDRT/mix [TDRT + ODRT]), and immunotherapy.

Statistics
Missing data of the clinical variables were imputed by multiple imputation. Then, GTV, GTVn, and GTVp were divided into three categories by interquartile range (IQR) for risk analysis. The primary endpoint was BM confirmed by cranial imaging at any time regardless of presence of neurologic symptoms (e.g., headache or vomiting). The secondary endpoints were PFS (progression of disease at the first time in any sites confirmed by imaging or death) and OS. All endpoints were analyzed as time-to-event data from the pathological diagnosis to the respective events, which were subject to censoring at the last follow-up if no events were observed. The BM was analyzed using competing risk analysis, in which death without BM was treated as a competing event. The significant clinical risk factors (including volume of GTVs, which was excluded from the radiomic features analysis) for BM were identified using the multivariate Fine-Gray model with backward stepwise elimination [24,25].
Radiomic feature selection was performed using 1000 bootstrap resamples [26][27][28]. Within each of the 1000 bootstrap resamples, Spearman's correlation was performed to identify and eliminate the highly correlated features (|r| >0.9). Then, the least absolute shrinkage and selection operator (LASSO) embedded with the Fine-Gray model was used to select features (lambda = 0.01). The features were sorted according to how frequently they were retained by LASSO in 1000 bootstrap resamples. We arbitrarily selected the top 13 features as the input for the backward stepwise competing risk model on the same 1000 bootstrap resamples. Then, we arbitrarily selected the top signature (more than one radiomic feature) to build the radiomic models. The coefficients were fitted using the original sample. A maximum number of five predictors was considered for each model to reduce the risk of overfitting. To build the combined model, one feature with the highest effect estimate (according to subdistribution hazard ratio [sHR]) from each model (Clinic + GTV + GTVp + GTVn) was selected.
The performance of competing-risk models at the 24 months' time point was evaluated by area under the receiver operating characteristic curves (AUC-ROC) and calibration. The sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV), and accuracy of each model were also reported. The net-benefit decision-curve analysis (DCA) was performed to compare the models' utility/application value [26]. A nomogram would be developed for the clinical model. If the radiomics model or combined model performed better, a nomogram would also be developed for the best one [28] (Figure 2.5). The time point of 24 months was chosen because most of BM develop within 2 years [1]. The effect of significant BM risk factors (features) on PFS and OS were investigated by Cox proportional hazards regression models. All tests were 2-sided, and p < 0.05 were considered as statistically significant. Statistical analyses were performed using R version 4.2.2 (R Project for Statistical Computing). associations with BM using backward stepwise competing risk model; VI, select the top signatures to build the radiomic models using the original sample; VII, evaluate the performance of the models by AUC and calibration curve; VIII, evaluate the utility of the models by the net−benefit decision−curve analysis; IX, develop a nomogram for the best model. Abbreviations: AUC, area under the receiver operating characteristic curves; LASSO, least absolute shrinkage and selection operator; NSCLC, non-small cell lung cancer.
This type 2A study was conducted according to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guideline [29]. The TRIPOD checklist was reported in Appendix A Table A1.
We extracted 861 radiomic features in total and identified five GTV features, five GTVn features, and five GTVp features that were associated with BM ( Table 2). The combined model of the first top feature from each model showed that the GTVn radiomics feature (LLH glrlm Run Variance: sHR 1.53, 95% CI 1.05-2.24, p = 0.028) and the GTVp radiomics feature (HLH glszm Grey Level Non Uniformity: sHR 1.52, 95% CI 1.29-1.79, p < 0.001) were significantly associated with an increased risk of BM development, while the clinical variable (volume of GTVn) and GTV radiomics feature were not (   Abbreviations: BM, brain metastases; CI, confidence interval; GTV, gross tumor volume; GTVp, gross tumor volumeprimary lung tumor; GTVn, gross tumor volume-metastatic lymph nodes; sHR, subdistribution hazard ratio.

Evaluation of the Models' Performance
The AUCs at each time point showed that the GTVn radiomics model performed best to discriminate the patients that did and did not develop BM (AUC range: 0.71-0.74) ( Figure 3A).

OS, PFS
The above factors and radiomics features were checked for their impact on PFS and OS. A larger GTVn was significantly associated with decreased OS (median IQR: HR  (Tables A2 and A3).

Discussion
Although it has been reported that the GTV volume is associated with OS and PFS, the association of the GTV volume and the subsequent risk of BM development is unclear in patients with radically treated stage III NSCLC. Our study indicated that a larger GTVn was a risk factor for BM, OS, and PFS in patients with stage III NSCLC, but GTV and GTVp were not significantly associated with BM. In contrast, Ji et al. reported that GTV was not a significant risk factor for BM (p = 0.687) [10]. A possible explanation could be that they analyzed the BM risk factors using Cox regression, which does not consider the competing event of death. Additionally, GTVp and GTVn were not specified. To the best of our knowledge, this is the first work reporting that GTVn is associated with subsequent BM development, while GTVp and GTV are not. The finding that lymph nodes involvement is more prognostic than the primary tumor volume warrants validation in further studies. A biological explanation could be that lung cancer cells are already more aggressive when migrating to lymph nodes and that the volume of GTVn correlates with the aggressiveness.
Interestingly, colleagues in Denmark investigated GTVp and GTVn for the first failure site in patients with locally advanced NSCLC. They found that neither GTVp nor GTVn was significantly correlated with first failure site (either locoregional failure or distant metastases) [30]. In this study, the main idea of separating GTVn from GTVp was similar to ours. However, patients with stage I-II or stage IV (20.5%) were also included, and they only explored the associations with the first failure site without specifying BM or metastases to other organs. We focused on BM, regardless of whether the brain was the first site of failure and we also evaluated the association with PFS and OS. This approach is more inclusive and therefore of greater clinical practice value, as patients can still develop BM after extracranial progression.
In line with other studies [2][3][4]31], we also found that higher age (p = 0.05) and squamous cell carcinoma (p = 0.009) were independent protective factors for developing BM, while smoking history, thoracic radiotherapy dose, and the use of adjuvant durvalumab were not significantly associated with the development of BM (p > 0.05). The latter was in contrast to the PACIFIC trial, in which the percentage of patients with BM halved in the durvalumab arm compared with placebo (31/476 [6.5%] vs. 28/237 [11.8%], p = 0.015) [32]. A possible explanation could be that in the PACIFIC study only fit patients without disease progression after CCRT were selected [33], while we included NSCLC patients who had stage III at the initial diagnosis, patients who progressed after CCRT were not excluded. In addition, in the PACIFIC trial, brain MRI was not required (brain CT was allowed) and PET-CT was not mandatary, while in this current study, only patients who underwent baseline PET-CT and brain MRI to fully stage and exclude occult BM were included.
As far as we are aware, this is the first study which found that the GTVn radiomics model performed better than the clinical, GTVp radiomics, GTV radiomics, and combined model, with the highest AUC (0.74), sensitivity (84%), and NPV (95%). Consistent with earlier studies, the GTVp radiomics model was not as good as the clinical model [18]. Generally, a high sensitivity leads to a better ability of ruling a disease out, and a high specificity leads to a better ability of ruling in a disease [34]. As our aim of this study is to identify patients who are at higher risk of developing BM, it is better in ruling out BM (higher sensitivity) than ruling in BM, which indicates that management to prevent or detect BM (such as prophylactic cranial irradiation [PCI] and regular brain MRI surveillance) is needed for these patients. Therefore, although the GTVp radiomics model had a better specificity, we considered the GTVn radiomics model as a better one. Furthermore, an NPV of 95 % is very good. Although the PPV and overall accuracy of the GTVn model were not great, the model may still be very valuable in clinical practice, as it can predict with a high likelihood that a patient will not develop BM and hence should not be considered for PCI (currently only within a clinical trial) nor for brain image follow-up. In addition, the specificity and sensitivity do not change when the incidence/prevalence changes, while the PPV and NPV are dependent on the prevalence/incidence of the outcome [34]. The relatively low PPV of all the models is mainly because of the relatively low incidence of BM (14.5% at 2 years) in this cohort, which was probably due to better staging (PET-CT and MRI were performed at diagnosis of stage III NSCLC). Therefore, the GTVp and GTVn should be separately delineated and analyzed in clinical practice and related studies.
Our results showed that wavelet features were the most prominent class associated with BM development as well as OS and PFS, independently from the region of interest of choice (GTV, GTVp, GTVn). Wavelet features decompose the original CT scan into a frequency space (like an "MR-like" image) and they are able to quantify granular textures based on the differences among harder and softer tissues. Our results showed that a combination of high-pass (HLH) and low-pass filtered (LLH) wavelet features are capable of quantifying tumor heterogeneity, which is a potential surrogate for a higher tumor aggressiveness. Although in the literature there is a lack of understanding about the biological meaning of radiomic features, our results suggest that tumors (and more specifically lymph nodes) that have more enhanced textures (GLSZM features) are more likely to spread to the brain, probably suggesting a higher proliferation of aggressive cells. Additionally, there are few studies that focused on extracting the radiomic features from both the GTVp and GTVn. However, previous radiomic studies [35,36] have shown that for the prediction of distant metastases it is better to focus on a larger region than just the GTVp, the so-called peri-tumoral ring.
In addition, we provided nomograms for the clinical model and the GTVn radiomics model, together with the codes for extracting radiomics features from GTVs on the planning CT scan, which clinicians and researchers could use for conducting future studies.
Strengths of this study are the relatively large dataset, the gold standard staging (baseline PET-CT and brain MRI), the administration of radical treatment to every patient, the inclusion of immunotherapy data in a real-world setting, and the long follow-up. All GTVs were rigorously contoured and evaluated by a team of specialists in lung cancer radiotherapy. Planning CTs were homogeneous regarding the scanning protocol. One limitation lies in the fact that we used the planning CT rather than the staging CT before anti-tumor treatment. In CCRT, most patients already had received one chemotherapy administration before having the planning CT. In SCRT, only patients with a reasonable performance status and without progression after chemotherapy were sent for thoracic radiotherapy. One can question the necessity of predicting the risk of BM in patients intended to undergo SCRT but not eligible (progression, poor PS) to undergo thoracic radiotherapy. On the other hand, the use of expertly contoured GTVs is reliable as well as convenient because no additional contouring is necessary, and therefore extracting radiomics features from GTVs is feasible in clinical application for all patients with a radiotherapy treatment plan. Another limitation is the lack of external validation. To overcome this limitation, we performed bootstrapping 1000 times and LASSO regression to develop the radiomics models. Our results can be further tested in future external validation studies (evaluating our model on a separate dataset, TRIPOD type 4 studies), or further confirmed using the same methods with a larger sample size training dataset and an independent validation dataset (TRIPOD type 3 studies) [29].

Conclusions
To our knowledge, this is the first study that demonstrates the prognostic value of the GTVn volume on BM development in patients with stage III NSCLC. Younger patients and those with non-squamous cell carcinoma are at higher risk of developing BM. Radiomics features of GTVn have greater prognostic value than GTVp and GTV for BM development. Therefore, the GTVp and GTVn will be contoured and analyzed separately in clinical practice and future studies. Explain the medical context (including whether diagnostic or prognostic) and rationale for developing or validating the multivariable prediction model, including references to existing models.

2-3 3b
Specify the objectives, including whether the study describes the development or validation of the model or both. 3

Source of data 4a
Describe the study design or source of data (e.g., randomized trial, cohort, or registry data), separately for the development and validation datasets, if applicable.

4b
Specify the key study dates, including start of accrual; end of accrual; and, if applicable, end of follow-up. 3 Participants 5a Specify key elements of the study setting (e.g., primary care, secondary care, general population) including number and location of centres. Clearly define the outcome that is predicted by the prediction model, including how and when assessed. 5 6b Report any actions to blind assessment of the outcome to be predicted. NA

Predictors 7a
Clearly define all predictors used in developing or validating the multivariable prediction model, including how and when they were measured.

3-5 7b
Report any actions to blind assessment of predictors for the outcome and other predictors. NA Sample size 8 Explain how the study size was arrived at. 3

Missing data 9
Describe how missing data were handled (e.g., complete-case analysis, single imputation, multiple imputation) with details of any imputation method. Risk groups 11 Provide details on how risk groups were created, if done. NA

Participants 13a
Describe the flow of participants through the study, including the number of participants with and without the outcome and, if applicable, a summary of the follow-up time. A diagram may be helpful. 6, Figure 2 13b Describe the characteristics of the participants (basic demographics, clinical features, available predictors), including the number of participants with missing data for predictors and outcome.  Tables 2 and 3 15b Explain how to the use the prediction model. 7, Figure 3 Model performance 16 Report performance measures (with CIs) for the prediction model. 7

Limitations 18
Discuss any limitations of the study (such as nonrepresentative sample, few events per predictor, missing data).

10-11
Interpretation 19b Give an overall interpretation of the results, considering objectives, limitations, and results from similar studies, and other relevant evidence.

8-11
Implications 20 Discuss the potential clinical use of the model and implications for future research. 10

Supplementary information 21
Provide information about the availability of supplementary resources, such as study protocol, Web calculator, and datasets. 5

Funding 22
Give the source of funding and the role of the funders for the present study. Title page