Radiomics based analysis to predict local control and survival in hepatocellular carcinoma patients treated with volumetric modulated arc therapy

The aim of this study was to appraise the ability of a radiomics-based analysis to predict local response and overall survival for patients with hepatocellular carcinoma. A set of 138 consecutive patients (112 males and 26 females, median age 66 years) presenting with Barcelona Clinic Liver Cancer (BCLC) stage A to C disease was retrospectively studied. For a subset of these patients (106), complete information about treatment outcome, namely local control (LC), was available. Radiomic features were computed for the clinical target volume. A total of 35 features were extracted and analyzed. Univariate analysis was used to identify significant clinical and radiomics features. Multivariate models based on Cox regression were built for local control and overall survival (OS) outcomes. Models were evaluated by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. For the LC analysis, two models selecting two groups of uncorrelated features were analyzed, while one single model was built for the OS analysis. The univariate analysis led to the identification of 15 significant radiomics features, but the analysis of cross-correlation showed several cross-related covariates. The uncorrelated variables were used to build two separate models; both resulted in a single significant radiomic covariate: model-1: energy p < 0.05, AUC of ROC 0.6659, C.I.: 0.5585–0.7732; model-2: GLNU p < 0.05, AUC 0.6396, C.I.: 0.5266–0.7526. The univariate analysis for covariates significant with respect to local control resulted in 9 clinical and 13 radiomics features with multiple and complex cross-correlations. After elastic net regularization, the most significant covariates were compacity and BCLC stage, with only compacity significant in the Cox model fitting (Cox model likelihood ratio test p < 0.0001, compacity p < 0.00001; AUC of the model 0.8014, C.I.: 0.7232–0.8797). A robust radiomic signature, made of one single feature, was finally identified.
A validation phase, based on an independent set of patients, is scheduled to confirm the results.


Background
Hepatocellular carcinoma (HCC) is the third leading cause of cancer death and one of the most challenging oncological problems [1]. Surgery, although providing survival rates up to 70% at 5 years [2], is viable in only a small fraction of patients (less than 1/3) because of advanced stage at diagnosis. In this clinical setting, the use of radiotherapy was limited by severe radiation induced liver disease (RILD) [3][4][5][6][7]. After the introduction of intensity modulated radiotherapy (IMRT) and volumetric modulated arc therapy (VMAT), a new hope emerged for radiotherapy in HCC patients [8][9][10]. Valuable preliminary data resulting from the use of VMAT, also in association with stereotactic body radiation therapy (SBRT), have been reported [11][12][13][14]. In this context, it would be important to develop and validate tools capable of predicting, for individual patients, the likelihood of tumor control and possibly of survival, in order to better personalize the treatment offered. Textural analysis of diagnostic images is a very broad area of research which might lead to the definition of such tools. In particular, radiomics is an emerging field that converts imaging data into a high-dimensional mineable feature space using a large number of automatically extracted data-characterization algorithms [15,16]. In oncology, radiomics has also been evaluated as a potential prognostic indicator, useful for classifying patients and assigning them to risk categories in order to tailor the prescribed oncological treatments [17][18][19]. While several investigations have been published on the use of radiomics in many cancer models [20][21][22] and on the correlation between radiomics signatures and radiation treatment outcome, little is available for liver cancer.
In general, some studies have been published on the use of texture analysis in the liver (primary hepatocellular carcinoma or metastatic disease), either to classify the lesion type or to facilitate the therapeutic decision. Echegaray [23] investigated, retrospectively on 29 patients with HCC, the possibility of identifying radiomics features in CT image datasets that are insensitive to the segmentation process, and identified them in the intensity and texture families. The study tested multiple manual contours drawn by different radiologists and identified automatic "core sample" regions of interest for the textural analysis. Chen [24] analyzed the prognostic value of texture features for hepatocellular carcinoma in a cohort of 61 patients who underwent hepatectomy. CT textural characteristics allowed the identification of higher order features with potential prognostic value, outperforming more traditional predictors such as the Barcelona Clinic Liver Cancer (BCLC) stage. Li [25] explored the potential of CT textural analysis to stratify patients with HCC and to help determine the optimal therapeutic procedure between resection and arterial chemoembolization. The authors claimed that wavelet decomposition allowed a successful stratification of the patients, although further validation was required. Raman [26] used radiomics analysis of CT data to classify different liver lesion types, with specific regard to hypervascularization. The predictive model they trained and validated (on a retrospective cohort) correctly classified adenomas, focal nodular hyperplasia and hepatocellular carcinoma with an accuracy in the range of 91-99%, while human observers had a corresponding accuracy in the range of 66-72%.
Lubner [27] appraised the role of radiomics analysis of CT images for hepatic metastatic colorectal cancer, finding that primarily histogram-based features were significantly associated with tumor grade in untreated liver metastases and suggesting that two-dimensional (2D) texture analysis on single slices might be adequate. Similarly, Simpson [28] studied the correlation between texture analysis of CT datasets and the risk of hepatic recurrence after resection of liver metastases in colorectal cancer patients. The hypothesis was that radiomics features could predict the risk of future recurrence. The results confirmed that quantitative imaging features of the future liver remnant (after first resection) were predictive of hepatic disease-free survival (as well as of overall survival).
A literature search with various combinations of keywords such as "radiomics" or "texture analysis" (and variants) in relation to "liver" and "radiotherapy" (and variants) did not return any results, suggesting that no published data may exist on the role of radiomics in the assessment and prediction of radiation treatment outcome for HCC patients.
In this study we present the results of a feasibility investigation aiming to identify a possible radiomics signature for the prognostic classification of HCC patients. The endpoints of the study were overall survival and local control of the tumor after radiation treatment administered with volumetric modulated arc therapy.

Patients and treatment
One hundred thirty-eight consecutive HCC patients presenting with BCLC stage A to C disease and eligible for curative or palliative radiotherapy were treated with VMAT, as previously detailed in the retrospective analysis [29,30]. All selected patients in the original retrospective study were either inoperable or not eligible for trans-arterial chemoembolization (TACE) and received radiotherapy as primary treatment. In brief, patients with BCLC stages A to C and Child-Pugh stages A-B, with single lesions larger than 5 cm or multi-nodular lesions larger than 3 cm, were considered eligible for radiotherapy. Portal vein thrombosis was present in 53.6% of the cases. Dose prescription ranged from 45 Gy to 66 Gy, depending on the stage, the location and size of the target, and the general conditions of the patient.
All patients were included in this new retrospective analysis and two cohorts (full or restricted) were defined according to the availability of survival data (available for all patients) and of objective response (for local control, available in a subset of patients).
All patients were treated between February 2009 and December 2010 according to the Helsinki declaration; ethical approval for retrospective analysis of data was provided by the institutional ethical review board.
Clinical evaluation was performed with reference to baseline conditions. Basic treatment outcome was measured in terms of in-field local control and patient overall survival. Follow-up visits included laboratory assessment and CT and MRI imaging, at 2 to 3 month intervals for at least 2 years and at 6 month intervals thereafter. Outcome was scored continuously, with a median follow-up of 9 months (minimum 1 month, maximum 28 months). Tumor response was assessed using the Response Evaluation Criteria in Solid Tumors (RECIST). Local in-field recurrence was defined as new enhancement or progressive disease on CT or MRI imaging during follow-up.

Radiomics image analysis
The entire dataset of treatment planning non-contrast-enhanced CT images, all acquired with 3 mm slice thickness and an in-plane resolution of 0.8 mm, was analyzed to extract a number of textural features. The volumes subject to the textural analysis were the clinical target volumes (CTV) manually contoured for the radiotherapy plans. Feature extraction was performed by means of the LifeX package [31,32]. A total of 35 features were extracted from the volumes inspected. These indices included conventional parameters, shape and size features, histogram-based features, and second and higher order features. The gray-level co-occurrence matrix (GLCM) [33], the neighborhood gray-level different matrix (NGLDM) [34], the gray-level run length matrix (GLRLM) [35] and the gray-level zone length matrix (GLZLM) [36] were computed for each patient. The list of the corresponding features is provided in Table 1, while a detailed description of all the features can be found in [37].
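To make the matrix-based features more concrete, the following is a minimal sketch of a gray-level co-occurrence matrix and three indices commonly derived from it, written in plain Python. It is an illustration under simplifying assumptions and not the LifeX implementation: it works on a small two-dimensional toy image with a single horizontal offset, whereas the features in this study were computed on the contoured CTV. The function names `glcm` and `glcm_features` and the toy image are hypothetical.

```python
from collections import Counter


def glcm(image, dr=0, dc=1, symmetric=True):
    """Normalized co-occurrence probabilities {(i, j): p} for one pixel offset.

    image is a list of rows of integer gray levels; (dr, dc) is the offset
    between the reference pixel and its neighbor (assumed: one offset only).
    """
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[(i, j)] += 1
                if symmetric:
                    # Count the pair in both directions so the matrix is symmetric.
                    counts[(j, i)] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def glcm_features(p):
    """Contrast, homogeneity and energy of a normalized co-occurrence matrix."""
    contrast = sum(prob * (i - j) ** 2 for (i, j), prob in p.items())
    homogeneity = sum(prob / (1 + abs(i - j)) for (i, j), prob in p.items())
    energy = sum(prob ** 2 for prob in p.values())
    return {"contrast": contrast, "homogeneity": homogeneity, "energy": energy}


# Hypothetical 4x4 image with three gray levels (0, 1, 2).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 2, 2],
]
p = glcm(image)
features = glcm_features(p)
print(features)
```

A volumetric implementation would repeat the same counting over several three-dimensional offsets and average the resulting indices; gray levels would also normally be resampled to a fixed number of bins before counting.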
In addition to these groups, other parameters were derived for each volume: the sphericity and the compacity, which measure the shape of the volume in terms of its regularity and compactness. From the histogram of the gray level distribution in the volume, a further set of parameters was obtained: the skewness (a measure of the asymmetry of the distribution), the kurtosis (measuring whether the distribution is peaked or flat relative to a normal distribution), the entropy (randomness of the distribution) and the energy (uniformity of the distribution).
In the survival analysis, the selection of covariates was obtained by an elastic net regularization process in order to deal with multiple cross-related covariates and reduce the risk of overfitting the data. Elastic net regularization was introduced by Zou and Hastie [39] with the aim of improving both the accuracy of the prediction and the interpretability of the models. It performs automatic variable selection and continuous shrinkage, and can select groups of correlated variables, making it possible to identify the best predictors even when the number of predictors is much larger than the number of cases. Overall survival analysis was performed on the unrestricted dataset and local control analysis on the restricted dataset.
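The shrinkage and selection behaviour of the elastic net can be illustrated with a small sketch. The study applied the penalty within the survival analysis; the example below instead uses an ordinary squared-error loss solved by coordinate descent, an assumption made purely to keep the code short. The objective minimized is (1/2n) Σ(y − Xb)² + λ[α‖b‖₁ + (1 − α)/2 ‖b‖₂²]. The function names and the toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 inside the dead zone."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lam, alpha=0.5, n_iter=200):
    """Coordinate descent for least squares with an elastic net penalty.

    Assumes the columns of X and the response y are centered (no intercept).
    lam is the overall penalty strength; alpha mixes L1 (alpha=1) and L2 (alpha=0).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_others)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
    return beta


# Hypothetical toy data: two centered, orthogonal predictors; y depends on the first only.
X = [[-1.0, 1.0], [0.0, -2.0], [1.0, 1.0]]
y = [-2.0, 0.0, 2.0]

beta_unpenalized = elastic_net(X, y, lam=0.0)           # ordinary least squares
beta_lasso = elastic_net(X, y, lam=0.5, alpha=1.0)      # shrunk first coefficient
beta_strong = elastic_net(X, y, lam=10.0, alpha=0.5)    # everything removed
print(beta_unpenalized, beta_lasso, beta_strong)
```

With λ = 0 the fit reduces to ordinary least squares; increasing λ shrinks the coefficients and eventually sets them to zero, which is the variable selection property referred to in the text.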

Results
A total of 138 patients were enrolled in the study (full dataset, FD). Patient characteristics are summarized in Table 2. For all of them survival was  Table 3. Analysis of cross-correlation (Fig. 1) showed several cross-related covariates, so only covariates with Pearson's correlation test p > 0.05 were used for multivariate analysis in two different logistic models (

Survival data analysis
Using the OS outcome in the full dataset, the univariate log-rank test showed several significant covariates. For this test, continuous numerical covariates were dichotomized with a moving threshold in order to better distinguish two categories of patients with respect to the outcome. Table 5 summarizes the results of the univariate log-rank test. Both clinical and radiomics covariates were included and found significant. Figure 2 shows the cross-correlation matrix, indicating multiple and complex cross-correlations among different covariates. For this reason, a different approach, elastic net regularization, was used to select the significant covariates. This analysis showed that the most significant covariates within 1 standard deviation of the partial likelihood deviance were compacity and BCLC stage, but Cox model fitting with stepwise regression returned only compacity as a significant covariate (Cox model likelihood ratio test p < 0.001, compacity p < 0.0001, Fig. 3). The AUC of ROC of the model is 0.8014 (C.I. = 0.7232-0.8797). The calibration plots of the Cox model are shown in Fig. 4.
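The univariate step can be sketched as follows: a continuous covariate is dichotomized at each candidate cut-off, a two-group log-rank test is computed, and the cut-off with the smallest p-value is retained. This is only one plausible reading of the threshold procedure, since its exact implementation is not described here; the function names and the toy data are hypothetical.

```python
import math


def logrank_p(times, events, groups):
    """Two-group log-rank test. Returns (chi-square statistic, p-value).

    times: follow-up times; events: 1 if death observed, 0 if censored;
    groups: booleans, True for the group whose events are compared to expectation.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed = expected = variance = 0.0
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)  # at risk, both groups
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed - expected) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def best_threshold(values, times, events, min_group=2):
    """Scan candidate cut-offs; return (cut-off, p) with the smallest p-value."""
    best = None
    for cut in sorted(set(values))[:-1]:
        groups = [v > cut for v in values]
        if min(sum(groups), len(groups) - sum(groups)) < min_group:
            continue  # skip splits that leave a group too small
        _, p = logrank_p(times, events, groups)
        if best is None or p < best[1]:
            best = (cut, p)
    return best


# Hypothetical cohort: higher covariate values go with shorter survival (months).
values = [1, 2, 3, 4, 5, 6, 7, 8]
times = [24, 20, 22, 18, 6, 4, 8, 3]
events = [0, 1, 0, 1, 1, 1, 1, 1]

best_cut, best_p = best_threshold(values, times, events)
print(best_cut, best_p)
```

Because the cut-off is chosen to minimize the p-value, the resulting significance is optimistic; a threshold obtained this way needs confirmation on independent data.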

Discussion
The scope of our investigation was to perform a feasibility study correlating radiomics signatures with clinical outcome in a retrospective analysis of a large cohort of patients already investigated and reported [29,30]. Texture analysis has rarely been applied to primary liver cancer, and the available studies mostly focused on classification issues or on the development of decision-aiding tools [23][24][25][26]. Some efforts have also been reported on the use of radiomics for the study of metastatic disease, particularly from colorectal cancer [27,28]. In our study, CT scans provided images whose features could be modeled against clinical and survival outcomes. The methodology implemented in this study is simple and easy to reproduce, and the generation of the features was based on a validated package, freely available from its authors [31]; these are all positive aspects for a feasibility investigation. At the same time, not having introduced higher order texture analysis methods is a limitation of the project. Nevertheless, we hypothesized that, if a radiomics signature was to be found, possibly used in practice and eventually shared, it should be identified among the most robust and easy to implement categories.
The use of non-contrast-enhanced treatment planning CT datasets, and the possibility of analyzing the regions of interest identified as the clinical target volume in the radiotherapy planning process, is another point of interest of the study, since it enables an easy procedure and makes the process potentially available to all patients scheduled for RT treatments. A limit of this approach, not appraised in our study, is the sensitivity and robustness of the radiomics features to the segmentation process, the inter-observer variability (how different CTVs would be contoured by different radiation oncologists) and the possible presence of artifacts in the images (e.g. markers for positioning purposes). Beyond recognizing this limit, it should be considered that, unfortunately, this is a key problem for all kinds of radiomics investigations. It is our opinion that predictive models will have to be built on large-scale population datasets, from multiple institutions and from different scanning devices, in order to encompass as much as possible of the inherent variance of the input data. In this respect, our pilot study cannot solve the problem, but the further validation steps will try to appraise some of these points. A second important factor to consider is the size of the patient cohort. In this study we focused on survival and on local control as direct measures of the efficacy of the delivered treatment. A large cohort was available from an earlier retrospective study and was used to compute all the radiomics features and to investigate the general aspects and OS. Unfortunately, the availability of clinical response outcome (LC) restricted the number of patients analyzed in the multivariate logistic regression models for that endpoint, and this might have led to models with lower discrimination power than the one achieved for the overall survival outcome.
Concerning the whole dataset, despite the presence of a single significant covariate (compacity), the OS model is able to fit the outcome with a fair discrimination performance (AUC of ROC of the model 0.801). Looking at the calibration plots, the best survival estimate is given for 12-month survival (Fig. 4), while the calibration at 24 months returned a lack of fit for the first group of patients and a general underestimate of survival prediction for the other two groups. This could be related to the median follow-up (FUP) time (16.6 months) being shorter than this endpoint, so longer-term prediction could be achieved by increasing the FUP time and the number of observations.
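For reference, the discrimination measure quoted above can be computed for a binary outcome as in the sketch below: the AUC equals the probability that a randomly chosen case with the event receives a higher score than a randomly chosen case without it. The confidence interval uses the Hanley and McNeil approximation, which is an assumption, since the method used to obtain the intervals in this study is not stated; the function names and the data are hypothetical.

```python
import math


def roc_auc(scores, labels):
    """AUC as the fraction of (event, non-event) pairs ranked correctly.

    Ties between an event score and a non-event score count as half a win.
    Returns (auc, number of events, number of non-events).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg)), len(pos), len(neg)


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate confidence interval for an AUC (Hanley and McNeil formula)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    se = math.sqrt(var)
    # Clip to the valid range of an AUC.
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Hypothetical model scores and observed outcomes (1 = event, 0 = no event).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

auc, n_pos, n_neg = roc_auc(scores, labels)
ci_low, ci_high = hanley_mcneil_ci(auc, n_pos, n_neg)
print(auc, ci_low, ci_high)
```

With so few cases the interval is very wide, which illustrates why the width of the reported intervals depends strongly on the number of patients available for each endpoint.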
An obvious limitation of this feasibility study, due to the issue mentioned above, is the lack of validation on an independent dataset. The limited size of the investigated cohort prevented its separation into training and testing subgroups; for this reason, a separate validation study is scheduled to be performed on a multicentric basis and with the (retrospective) inclusion of patients treated with either conventional or hypofractionated regimens. As a specific instance of this limitation, since the full range of covariates did not return significant results, we applied thresholding methods to the covariates. The cut-offs were identified based on the p-value analysis. The absence of an external validation might therefore question the robustness of these thresholds. This could potentially cause a bias or a false positive, because the explicit values might not be suitable for other populations or situations. All this points to the necessity of a further set of investigations in this area, looking for an independent validation of the models.

Conclusions
A radiomics signature made of a single textural feature allowed a predictive model with fair discrimination performance to be fitted in HCC patients treated with volumetric modulated arc therapy. Further validation studies, at mono- and multi-centric level, are mandatory and are scheduled to confirm these findings.