Radiomics-based prediction of survival in patients with head and neck squamous cell carcinoma based on pre- and post-treatment 18F-PET/CT

Background: 18F-fluorodeoxyglucose positron emission tomography/computed tomography (18F-PET/CT) has been widely applied for the imaging of head and neck squamous cell carcinoma (HNSCC). This study examined whether pre- and post-treatment 18F-PET/CT features can help predict the survival of HNSCC patients. Results: Three radiomics features were identified as prognostic factors. A radiomics score calculated from these features significantly predicted overall survival (OS) and disease-free survival (DFS). Nomograms combining clinicopathological characteristics with pre- or post-treatment PET/CT findings showed better ROC curves and decision curves than the nomogram based only on clinicopathological characteristics. Conclusions: Combining clinicopathological characteristics with radiomics features of pre-treatment PET/CT or post-treatment PET/CT assessment of primary tumor sites as positive or negative may substantially improve prediction of OS and DFS of HNSCC patients. Methods: We included 171 patients with HNSCC who received pre-treatment 18F-PET/CT scans and 154 who received post-treatment 18F-PET/CT scans from The Cancer Imaging Archive (TCIA). Nomograms that combined clinicopathological features with either pre-treatment PET/CT radiomics features or post-treatment assessment of primary tumor sites were constructed using data from 154 HNSCC patients. Receiver operating characteristic (ROC) curves and decision curves were used to compare the predictions of these models with those of a model incorporating only clinicopathological features.

In the present study, we used a quantitative radiomics approach to extract imaging features from pre-treatment 18F-FDG PET/CT scans of patients with HNSCC and a conventional approach to extract positive/negative findings from post-treatment scans. We then combined each of these types of data with clinicopathological characteristics to generate models to predict survival. The predictive performance of these models was compared to that of a model based only on clinicopathological characteristics.

Patient characteristics and radiomic signatures
A total of 171 patients (training cohort = 115, validation cohort = 56) were analyzed for the construction of a radiomics score (Rad-score) model based on pre-treatment PET/CT, and 154 patients were analyzed for the development of nomograms based on pre- or post-treatment PET/CT. The clinical characteristics of patients in the training and validation cohorts are summarized in Table 1. The correlations between extracted radiomics features were calculated and visualized as a correlation matrix (Figure 1). LASSO Cox regression was used to select potential prognostic predictors from the 56 radiomics features in the training cohort (Figure 2). Three radiomics features were identified, and both univariate and multivariate analyses of the selected features were performed to show their association with patient survival (Supplementary Table 1). The optimal cut-off value of the Rad-score was 0.01187901, and patients in the training and validation cohorts were accordingly classified as low- or high-risk. Supplementary Table 2 compares the clinicopathological characteristics of low- and high-risk patients. In the pre-treatment Rad-score model, the Kaplan-Meier analysis showed that high risk was associated with significantly worse overall survival (OS) in the training cohort (HR 5.  (Figure 4). Cox regression showed that both the pre-treatment Rad-score and post-treatment outcomes were significant independent predictors of both OS and DFS (Supplementary Table 3). In addition, we compared the concordance index (C-index; higher values indicate better survival prediction) between the Rad-score and four conventional PET features (TLG, MTV, SUVmean, and SUVmax). The Rad-score predicted survival much better than each conventional PET feature alone and than the combination of all four (Supplementary Table 4).

Prediction of OS and DFS using models based on radiomic signatures
As a first step in constructing predictive models based on radiomic signatures, we created a conventional prediction model based only on clinical characteristics of 154 HNSCC patients according to inclusion and exclusion criteria. This conventional clinical model also served as a benchmark for assessing the prognostic performance of the radiomic models.     Figure 1).
Radiomics signatures from the pre-treatment PET/CT scans were added to this conventional clinical model, and the resulting model was used to generate OS and DFS nomograms (Table 2 and Figure 5). The C-index indicated good discrimination for OS (C-index 0.77, 95% CI 0.70-0.84) and DFS (C-index 0.77, 95% CI 0.70-0.83). Calibration curves calculated for 3, 5, or 7 years showed good agreement between the nomogram predictions and observed OS and DFS.
Good results were also obtained when positive/negative findings based on post-treatment PET/CT were added to the conventional clinical model (Table 2 and Figure 6). The corresponding nomograms showed excellent accuracy and discrimination for OS (C-index 0.822, 95% CI 0.767-0.877) and DFS (C-index 0.832, 95% CI 0.781-0.883).

Comparison of models
Comparison of ROC curves at 3, 5, and 7 years showed that the pre-treatment model predicted OS and DFS better than the conventional clinical model. The post-treatment model, in turn, performed significantly better than the pre-treatment model. Similarly, decision curves showed that the post-treatment model maximized clinical benefit for patients in the prediction of OS and DFS at 3, 5, and 7 years (Figures 7 and 8).
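To make the decision-curve comparison concrete, the sketch below computes the standard net benefit used in decision curve analysis, TP/n - (FP/n) x pt/(1 - pt), at a single threshold probability pt. It is an illustration only: it assumes a binary outcome at a fixed time horizon with no censoring, which is simpler than the survival setting analyzed here, and the function name and the example risks and outcomes are hypothetical.

```python
def net_benefit(predicted_risks, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    Assumes binary outcomes (1 = event, 0 = no event) without censoring.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    pairs = list(zip(predicted_risks, outcomes))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


# Hypothetical predicted risks and observed outcomes for five patients.
risks = [0.9, 0.8, 0.3, 0.6, 0.1]
events = [1, 1, 0, 0, 0]
nb_model = net_benefit(risks, events, threshold=0.5)
nb_treat_all = net_benefit([1.0] * 5, events, threshold=0.5)
```

A decision curve repeats this calculation over a range of thresholds; a model is preferred where its net benefit exceeds both the treat-all and treat-none (net benefit 0) strategies.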

DISCUSSION
18F-FDG-PET/CT radiomics signatures, which can capture spatial heterogeneity in tumors, have been applied as potential prognostic markers in many cancers, including gastric cancer [19], nasopharyngeal carcinoma [20], NSCLC [21], and HNSCC. HNSCC is a clinically heterogeneous disease, and few biomarkers are available for predicting tumor response to treatment or prognosis [22]. The present study extracted 56 radiomics features from PET/CT scans of patients with HNSCC and used machine learning to select a subset of three, whose combined Rad-score was significantly associated with OS and DFS. Combining the selected features with patients' clinicopathological characteristics allowed reliable and accurate predictions of OS and DFS, which were substantially better than those obtained based on clinicopathological characteristics alone. The models described here may help improve the design of treatment strategies in HNSCC and thereby lead to better patient prognosis.
18F-FDG-PET/CT has also been widely applied to predict survival in cancer patients because of its ability to provide information on tumor burden and aggressiveness. Bogowicz et al. [22] compared PET and CT radiomics for prediction of local tumor control in HNSCC, and they found PET to be more accurate than CT in predicting local tumor control. Those authors highlighted the need to pay more attention to PET-based radiomic analysis for predicting prognosis. Kim et al. [25] examined the ability of PET/CT to predict treatment failure and guide clinical decision-making about salvage surgery. Despite their relatively small sample, they were able to predict OS and PFS reasonably well based on post-treatment PET findings. The optimal time to perform PET/CT on HNSCC patients and the optimal prognostic model for predicting survival remain unclear. Our study identified a pre-treatment Rad-score comprising SHAPE_Sphericity, NGLDM_Coarseness, and standardized MTV (SMTV). This integrated PET/CT signature, when combined with clinicopathological characteristics, shows promise for predicting OS and DFS of HNSCC patients. Previous studies have demonstrated the prognostic significance of traditional PET quantitative parameters such as SUV [26], MTV, and TLG [27]. We found, however, that these parameters did not predict OS or DFS as well as the combination of our PET/CT radiomic signatures with a subset of clinicopathological characteristics. These findings highlight the potential role that PET/CT radiomic signatures could play in the era of high-throughput machine learning. At the same time, our study suggests that post-treatment positive/negative findings may have even more prognostic potential than the pre-treatment Rad-score when combined with clinicopathological characteristics.
Our findings suggest that PET/CT radiomic signatures have the potential to reliably predict the prognosis of patients with HNSCC. These promising results may partly reflect our efforts to control for heterogeneity in the patient population, which came from a single center using the same scanner. While this approach allows us to reduce potential confounding due to heterogeneity of patient characteristics and hospital practices, it also threatens the external validity of our results. Therefore, our findings should be verified and extended in larger, preferably multi-site patient populations.

CONCLUSIONS
The present study using publicly available 18F-PET/CT images suggests that combining clinicopathological characteristics with specific radiomic signatures from pre-treatment scans or with post-treatment assessment of primary tumor sites as positive or negative can predict OS and DFS of patients with HNSCC significantly better than clinicopathological characteristics alone.

Patient population
We extracted 18F-FDG-PET/CT scans from the publicly available HNSCC dataset of the University of Texas MD Anderson Cancer Center on The Cancer Imaging Archive (TCIA) platform [28] (http://www.cancerimagingarchive.net/). From the total set of 2,840 consecutive patients with HNSCC treated with curative radiotherapy at the MD Anderson Cancer Center between 1 October 2003 and 31 August 2013 [29], 215 patients overlapping in the TCGA and TCIA databases were initially selected. Of these, 203 patients were included because they did not have a primary diagnosis of nasopharyngeal carcinoma, cancer of unknown primary site, or recurrent HNSCC. For the identification of pre-treatment radiomics signatures, patients were excluded from the analysis if their pre-treatment PET/CT images were unavailable or the region of interest on their scans was too small to extract features. The remaining patients were randomly divided into a training cohort and a validation cohort using the caret package in R 3.6.1 [30]. Finally, 171 patients with available pre-treatment PET scans and 154 patients with available post-treatment PET/CT scans were included in our study, according to the Data Descriptor [28]. For further identification of post-treatment signatures and model construction, patients from the original cohort were included except for those who lacked paired pre- and post-treatment PET/CT images (Supplementary Figure 2).
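The random division into training and validation cohorts can be illustrated with a plain shuffled split. This is a sketch only: it uses Python's standard library rather than the caret package, makes no attempt to reproduce whichever caret routine or random seed was used, and the patient identifiers and function name are hypothetical. The cohort sizes (115 and 56 of 171) are those reported in the Results.

```python
import random


def split_cohort(patient_ids, n_train, seed=0):
    """Randomly split patient IDs into training and validation lists."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    return sorted(ids[:n_train]), sorted(ids[n_train:])


# 171 hypothetical patient identifiers (not the TCIA identifiers).
patients = [f"P{i:03d}" for i in range(1, 172)]
train, valid = split_cohort(patients, n_train=115)
```

Fixing the seed matters in practice, because every downstream result (feature selection, cut-off, validation performance) depends on which patients land in each cohort.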

Pre-treatment PET/CT image analysis and feature extraction
The pre-treatment PET/CT images were segmented and features were extracted using LIFEx 4.0 (http://www.lifexsoft.org) [31]. The primary tumor, excluding lymph nodes, was segmented by two specialists in nuclear medicine (Y.C. and W.D.), who delineated a computer-generated volume of interest around voxels with uptake equal to or greater than 40% of SUVmax [32]. Noise in the images was reduced by resampling FDG uptake values into 64 discrete values between SUV bounds of 0 and 30, giving a bin width of 0.47, based on typical SUVs for HNSCC tumors [33]. A total of 56 features were extracted, comprising quantitative PET parameters, first-order intensity features, shape features, and texture indices (Supplementary Table 5 and Supplementary Material). Finally, texture features were investigated based on gray-level co-occurrence matrices, gray-level run-length matrices, neighborhood gray-tone difference matrix wavelet decompositions, and gray-level size zone matrices.
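The thresholding and resampling steps described above can be sketched as follows. This is a simplified illustration, not the LIFEx implementation: it treats the volume of interest as a flat list of voxel SUVs, applies the 40% of SUVmax threshold, and maps each SUV to one of 64 fixed-width bins between 0 and 30 (bin width 30/64, about 0.47). The function names and voxel values are hypothetical, and the exact rounding rule used by LIFEx may differ.

```python
def segment_by_suvmax(suv_values, fraction=0.40):
    """Keep voxels whose SUV is at least `fraction` of the maximum SUV."""
    suv_max = max(suv_values)
    return [v for v in suv_values if v >= fraction * suv_max]


def discretize_suv(suv, lower=0.0, upper=30.0, n_bins=64):
    """Map an SUV to a bin index in 0..n_bins-1 using fixed-width bins."""
    width = (upper - lower) / n_bins  # 30 / 64 = 0.46875
    clipped = min(max(suv, lower), upper)  # values outside the bounds are clipped
    return min(int((clipped - lower) / width), n_bins - 1)


# Hypothetical voxel SUVs inside a candidate region.
voxels = [1.2, 3.9, 4.5, 7.5, 10.0]
roi = segment_by_suvmax(voxels)  # threshold is 0.40 * 10.0 = 4.0
bins = [discretize_suv(v) for v in roi]
```

Using absolute bounds (0 to 30) rather than each tumor's own minimum and maximum keeps bin indices comparable across patients, which is the usual motivation for fixed-width SUV resampling.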

Post-treatment PET/CT image interpretation
The post-treatment PET/CT scans were reviewed independently by two specialists in nuclear medicine (Y.C. and W.D.), who determined whether residual or recurrent disease was present. Scans were judged negative if no focal increase in FDG uptake was evident, or if an increase in FDG uptake was apparent but could be attributed to physiological causes or the treatment [5]. Discrepancies between the independent assessments were resolved in consultation with a senior specialist in nuclear medicine (Z.Y.J.) and a radiation oncologist (X.C.P.). Pearson correlation analysis was performed to show the correlations between extracted radiomics features.

Feature selection and integration into a single Rad-score
After normalization, the 56 radiomics features were entered into a least absolute shrinkage and selection operator (LASSO) algorithm [34] within a Cox regression model based on penalized maximum likelihood, which shrinks the regression coefficients of most radiomics variables to zero. The penalty parameter λ varies at each step of model fitting. The built model was cross-validated 1000 times with bootstrapping to select the variables most relevant to overall survival (OS) in the training cohort at an optimal λ. The value of λ giving the minimum mean cross-validated error (λmin) was determined, and the coefficients of the selected variables were identified at this λmin. A Rad-score for each patient was then computed from all LASSO-selected features using the following formula:
$$\text{Rad-score} = \sum_{i} \text{coefficient}_i \times \text{feature}_i$$
where $\text{coefficient}_i$ is the coefficient of radiomics feature $i$ determined in the regression model and $\text{feature}_i$ is the value of that feature for the patient.
Data in the training set were used to generate a time-dependent receiver operating characteristic (ROC) curve with the survivalROC package in R to describe the ability of the Rad-score to predict OS, which was defined as the period from the first diagnosis to death. This curve was used to identify the optimal cut-off for the Rad-score, and patients whose Rad-scores were higher than this threshold were classified as "high risk," while those with Rad-scores equal to or lower than the threshold were classified as "low risk".
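The scoring and risk-stratification rule can be sketched in a few lines. The feature names and the cut-off (0.01187901) are taken from this article, but the coefficients and the example feature values below are hypothetical placeholders, since the fitted LASSO coefficients are not given in this text; the function names are likewise illustrative.

```python
CUTOFF = 0.01187901  # optimal Rad-score cut-off reported in the Results

# Hypothetical coefficients: placeholders, NOT the fitted LASSO values.
COEFFICIENTS = {
    "SHAPE_Sphericity": -0.5,
    "NGLDM_Coarseness": -0.2,
    "SMTV": 0.3,
}


def rad_score(features, coefficients=COEFFICIENTS):
    """Linear combination of selected (normalized) feature values."""
    return sum(coefficients[name] * features[name] for name in coefficients)


def risk_group(score, cutoff=CUTOFF):
    """Scores above the cut-off are high risk; equal or lower are low risk."""
    return "high risk" if score > cutoff else "low risk"


# Hypothetical normalized feature values for one patient.
patient = {"SHAPE_Sphericity": 0.4, "NGLDM_Coarseness": -1.0, "SMTV": 1.5}
score = rad_score(patient)
group = risk_group(score)
```

Note that a score exactly equal to the cut-off falls in the low-risk group, matching the rule stated above.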

Model construction and evaluation
The following three models were used to predict OS and disease-free survival (DFS), defined as the period from the first diagnosis to death due to HNSCC: a conventional clinical model, a pre-treatment PET/CT model, and a post-treatment PET/CT model. The conventional clinical model contained several pre-treatment clinical characteristics that have been linked to the survival of HNSCC patients [35]: body mass index, age, T stage, N stage, stage according to the 7th edition of the American Joint Committee on Cancer (AJCC) guidelines, tumor location, histology grade, and smoking history. The model was optimized in a stepwise manner based on the Akaike information criterion, after which variables that violated the proportional hazards assumption were excluded. The pre-treatment model was generated by adding the Rad-score to this conventional clinical model. The post-treatment model was created by adding positive/negative findings (based on post-treatment PET/CT scans) to the conventional clinical model. Three nomograms were constructed based on the three models.
The various models were assessed for their ability to predict OS or DFS at 3, 5, or 7 years using calibration curves and Harrell's concordance index (C-index). The "bootstrap split" method [36] was applied with 1000 iterations. Models were also assessed and compared using ROC curves, and the overfitting risk was evaluated using the Akaike information criterion. A decision curve analysis (DCA) was conducted to help determine which model is most useful clinically by comparing the benefits and the harms of false-positive and false-negative predictions on the same scale [37,38].
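For readers unfamiliar with Harrell's C-index, the following minimal sketch shows how it is computed from follow-up times, event indicators, and model risk scores. It is a didactic O(n^2) version, not the routine used in the R packages cited in this article; it ignores pairs with tied follow-up times, and the example data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the model orders correctly.

    A pair is comparable when the patient with the shorter follow-up time
    had an event (events[i] == 1). The pair is concordant when that patient
    also has the higher risk score; tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up (years), event flags (1 = death), and risk scores.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
c_index = harrell_c_index(times, events, risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the C-index values reported in the Results should be read.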

Statistical analysis
Data were analyzed statistically using R 3.6.1 [30] and a significance threshold of p = 0.05. LASSO-based Cox regression was conducted using the glmnet package, while ROC curves and optimal cut-offs were generated using the survivalROC and tdROC packages [39,40]. Pearson's correlation analysis was conducted and visualized using the rattle package. OS and DFS were calculated, and survival curves were plotted using Kaplan-Meier analysis; statistical inference about the survival difference between high- and low-risk patients was accomplished using the Cox regression statistic, and the analyses were performed using the survival package [41,42]. Multivariate Cox models were constructed and evaluated using the survival and pec packages [43][44][45], while decision curves were analyzed using the DCA package. When appropriate, results were reported as hazard ratios (HRs) with associated 95% confidence intervals (CIs). Collinearity diagnostics were run using SPSS software, version 25.0 (IBM Corporation, Armonk, NY, USA) to ensure partial regression coefficients derived from regression analyses were estimated precisely and that the relative importance of each predictor for OS and DFS could be assessed reliably.
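As an illustration of the Kaplan-Meier analysis mentioned above, the sketch below implements the product-limit estimator in plain Python. It is not the R survival package used in this study; the function name and the example follow-up data are hypothetical, and confidence intervals are omitted.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    `events` holds 1 for an observed event and 0 for a censored patient.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # both events and censored patients leave the risk set
    return curve


# Hypothetical follow-up times (years) and event flags for five patients.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 0, 1])
```

Censored patients contribute to the risk set until their last follow-up but never lower the curve themselves, which is why the estimator only steps down at observed event times.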

AUTHOR CONTRIBUTIONS
In the present study, X.P. and Z.J. were responsible for the study design and participated in the evaluation of results. Y.Chen, D.W., Z.L., and Y.Cao participated in the collection of study materials or patients. D.W., Z.L., and Y.Cao participated in the collection and assembly of data. Z.L. and Y.Cao performed the data analysis and interpretation. Z.L. and Y.Cao drafted the manuscript. X.P., Z.J., Z.L., and Y.Cao proofread the manuscript for important intellectual content. All authors contributed to manuscript preparation. All authors reviewed the report and approved the final version.

ACKNOWLEDGMENTS
The authors thank Prof. Peng Huang of the Evidence Medical Center of Nanchang University for his help with the statistics.

CONFLICTS OF INTEREST
The authors declare that they have no conflicts of interest.