Predicting the outcomes of hepatocellular carcinoma downstaging with the use of clinical and radiomics features

Background Downstaging of hepatocellular carcinoma (HCC) makes it possible for patients beyond the criteria to have the chance of liver transplantation (LT) and improved outcomes. Thus, a procedure to predict the prognosis of the treatment is an urgent requisite. The present study aimed to construct a comprehensive framework with clinical information and radiomics features to accurately predict the prognosis of downstaging treatment. Methods Specifically, three-dimensional (3D) tumor segmentation from contrast-enhanced computed tomography (CT) is employed to extract spatial information of the lesions. Then, the radiomics features within the segmented region are calculated. Combining radiomics features and clinical data prompts the development of feature selection to enhance the robustness and generalizability of the model. Finally, we adopt the support vector machine (SVM) algorithm to establish a classification model for predicting HCC downstaging outcomes. Results Herein, a comparative study was conducted on three different models: a radiomics features-based model (R model), a clinical features-based model (C model), and a joint radiomics clinical features-based model (R-C model). The average accuracy of the three models was 0.712, 0.792, and 0.844, and the average area under the receiver-operating characteristic (AUROC) of the three models was 0.775, 0.804, and 0.877, respectively. Conclusions The novel and practical R-C model accurately predicted the downstaging outcomes, which could be utilized to guide the HCC downstaging toward LT treatment. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-023-11386-0.


Background
Liver transplantation (LT) is a potentially curative treatment in patients with hepatocellular carcinoma (HCC), and the 5-year overall survival (OS) after LT was > 70% [1][2][3][4].However, two-thirds of HCC patients did not meet the criteria for diagnosis [5,6].For advanced HCC patients, the median survival was 6-7 months if untreated, and the 5-year survival was 18-32% post-LT [3,[6][7][8][9].Downstaging was defined as reducing the tumor burden to meet the criteria for LT, making it possible for patients beyond the criteria to have the chance of LT [4,10,11].Recent studies have shown that 5-year OS of HCC patients after successful downstaging to Milan criteria was 73.2-93.8%,which was similar to patients initially meeting the Milan criteria and better than that of no transplantation (31.2%) [4,12,13].The definition of successful downstaging was controversial, however, recent studies commended that the OS of HCC patients after successful downstaging to the setting criteria was similar to patients initially meeting the criteria [14][15][16].
Predicting the outcomes of downstaging could provide individualized treatment, reduce unnecessary interventions, and provide healthcare to HCC patients.Due to the wide range of downstaging failure rates 23-89%, there is an urgent requirement for an objective and accurate downstaging prediction model [16][17][18].Some studies speculated that downstaging-failed patients had more frequent MVI and worse tumor grades than successful patients [14].In the study by Barakat et al., the noninfiltrative expanding tumor type was the sole predictor of successful downstaging; however, noninfiltrative evaluation was difficult, especially in small tumors [3].Overall, no specific objective models are available to predict the downstaging response precisely.Radiomics can deduce more information than human eyes and achieve brilliant prognostic accuracy in various clinical tasks, such as the prediction of microvascular invasion [19,20].Machine learning (ML) was used successfully in many applications related to classification (diagnosis), such as lung cancer and response to treatment.To the best of our knowledge, no previous studies have predicted the downstaging outcomes using ML in HCC patients.
In the current study, we developed a model based on clinical data and radiomics features to predict the downstaging success in HCC undergoing local regional or systemic therapy, which was validated in an independent test cohort.The present study aimed to establish an HCC downstaging prediction model based on clinical data and radiomics features through ML.

Patients
Patients diagnosed with HCC who received locoregional therapy or systemic therapy for downstaging were considered for this study from March 2015 to December 2021 in Beijing Tsinghua Chunggung Hospital (Beijing, China) and from January 2019 to December 2021 in The First Hospital of Jilin University (Jilin, China).The diagnosis of HCC was based on radiographic imaging (Liver Imaging Reporting and Data System, LI-RADS) or biopsy.The inclusion criteria were as follows: (1) patients with baseline enhanced computed tomography (CT) and follow-up imaging 4-12 weeks after the first downstaging and complete clinical data; (2) the tumor was beyond the up-to-seven criteria.The exclusion criteria were as follows: (1) patients with cholangiocarcinoma and mixed hepatocellular-cholangiocarcinoma (diagnosed by pathology); (2) patients with metastasis.The successful downstaging in the current study was to facilitate LT.Based on the modified Response Evaluation Criteria in Solid Tumors (mRECIST) assessment, the endpoint of successful downstaging is that the patients meet the upto-seven criteria for LT.
The discovery cohort was randomly divided into two cohorts: the training set (n = 74, 70%) and the test set (n = 32, 30%) (see Fig. 1).The study was conducted according to the guidelines of the Declaration of Helsinki and approved by Beijing Tsinghua Changgung Hospital Ethics Committee (Approval No. 21269-4-04).

Radiomics analysis Tumor segmentation
Tumor segmentation was performed by two experienced radiologists using ITK-SNAP (version 3.6.0)during the portal-venous phase of enhanced CT and reviewed by a senior radiologist, as shown in Fig. 2A.

Data pre-processing
The homogenous feature was calculated using default parameters.Since the CT images were heterogeneous, we performed image normalization to resample the CT volume to the same target spacing (1,1,1).Specifically, third-order spline interpolation was utilized for in-plane resampling, whereas nearest neighbor interpolation was performed for out-of-plane interpolation.
All features (including radiomics features and clinical information) were normalized to a standard scale based on z-scoring normalization.

X = (x − µ) σ
Where µ is the average value of all features, σ is the standard deviation of all features.The least absolute shrinkage and selection operator (LASSO) regression model was used to select imaging features in high-dimensional data for the predictive model, as shown in Fig. 2B.
Next, we developed three models using only radiomics features (R), only clinical features (C), and both (R-C), respectively (see Fig. 2C).The classification prediction models were constructed based on a selection support vector machine (SVM) using the scikit-learn package.
Univariate analyses were performed to determine the clinical features.Clinical data that reached statistical significance in univariate analysis were included in the ML.In the model, we input missing values by the median of non-missing entries in the specific feature column.Finally, the candidate clinical variables and radiomics features were utilized to investigate the prediction model.
SHapley Additive exPlanations (SHAP) value was utilized to compute the distribution of features in the prediction model [21].To enhance the interpretability of the model, we employed the SHAP value to express the importance of features in the established SVM classification model.

Statistical analysis
Results were expressed as mean ± standard deviation and medians with interquartile range for continuous In addition, the area under the receiver-operating characteristic curve (AUROC), the area under the precisionrecall curve (AUPRC) and F1-score were utilized as evaluation indexes.F1 score was defined as: The clinical utility of the models was evaluated using decision curve analysis (DCA).The statistical comparison of AUROC was evaluated using Delong test.Statistical analyses were performed using SPSS v25.0 and R 4.3.2.Statistical significance was set at p < 0.05.

Baseline clinical characteristics
A total of 106 HCC patients who underwent downstaging in Beijing Tsinghua Changgung Hospital from March 2015 to November 2021 and in The First Hospital of Jilin University from January 2019 to December 2021 were collected.The patients' clinical features are listed in Table 1.This retrospective cohort comprised 95 males and 11 females with a mean age 56.36 ± 1.07 years.Also, 54 patients had successful downstaging, while the other 52 presented failed downstaging.The detailed downstaging treatments are listed in Table S1.
The two groups were similar in their distribution of age, sex, BMI, hepatic virus infection, cirrhosis, ascites, chronic disease (hypertension and diabetes mellitus), and type of treatment (Table 2).Among all factors, the following clinical features were related to the downstaging outcomes in univariate analyses: Child-Pugh class, AST, GGT, AFP, Portal Vein Tumor Thrombus (PVTT), and the number of tumors (p < 0.05).

Feature extraction
For R model features selection, a total of 112 radiomics features were extracted and normalized.A total of 18 features with nonzero coefficients in the LASSO regression model were selected, including two image-original related features, six shape-related features, two first-order features, and eight textural features.For the C model's features, four features were selected: PVTT, tumor number, AFP, and GGT.
The R-C model consisted of 13 features with nonzero coefficients in LASSO regression model, including 10 radiomics features and three clinical features.The clinical features were as follows: PVTT, tumor number, and AFP.The radiomics features included one image-related feature, three shape-related features, one first-order feature, and five textural features.Moreover, 8 radiomics features comprised the R and R-C models.
The importance of selected features was showed in Figure S2.The detailed features of three models were listed in Table S2.

Predictive Model Development and Validation
We adopted the SVM algorithm to establish three classification models for predicting HCC downstaging outcomes with the features selected above: R, C, and R-C.Next, we tested the performance of different models in the same training and test cohorts.The training cohort consisted of 74 (70%) cases, and the test cohort consisted of 32 (30%) cases.In the test cohort, the accuracy of the R model, C model, and R-C model was 0.656, 0.781, and 0.875, and the AUROC of the three models was 0.827, 0.816, and 0.933, respectively.The ROC, P-R curve, and confusion matrix of the test and training cohorts of the different models are shown in Fig. 3. Precision, recall, and F1-score are shown in Table 3.While the Delong test indicated that there was no significant difference in AUROC between each pair of models in the test cohort (Table S3).It is noteworthy that the R-C model demonstrated notably improved accuracy compared to the R model (p = 0.039, Table S4).The R model, C model, and R-C model showed good performance; however, the R-C model had better prognostic ability than the others.
Herein, we adopted k-fold cross-validation to test the stability of the three models.After k-fold cross-validation (k = 3, repeat = 2), all three models showed stable performance in the test cohort and training cohorts (see Fig. 4A  and B).In the test cohort, the average accuracy of the R, C, and R-C models was 0.712, 0.792, and 0.844, the average AUROC of the three models was 0.775, 0.804, and 0.877, and the average AUPRC of the three models was 0.785, 0.760, and 0.859, respectively.Figure 4C showed the DCA of the three models in the test cohort.

Inspection of model features
Feature importance was also assessed using SHAP values in the trained R-C SVM model (see Fig. 5).The impact of each key feature on the model prediction was shown in SHAP values.For different trained models, the SHAP values of features may not be consistent.However, the importance of tumor burden (such as tumor diameter, tumor number, and PVTT) was stable in different cohorts.However, if we only used tumor burden to build the predictive model, accuracy and AUROC were poorer  than the R-C model (Figure S1).The average accuracy and AUROC were 0.769 and 0.792, respectively, when the tumor burden parameter was used to build the model; in the same test cohort, the average accuracy and AUROC of the R-C model were 0.788 and 0.814, respectively.

Discussion
In this study, the R-C model was proposed for the first time, which accurately predicted the downstaging outcomes of HCC patients in LT.Previous clinical studies focused on the survival after LT within successful downstaging; however, only a few studies concentrated on predicting the downstaging response in HCC patients [4,13,14,22].Since the reported downstaging failure rates varied, predicting the outcomes of pre-treatment in HCC patients provided individual treatment strategies.
The biological nature of a tumor involves multiple interacting components, which might be reflected when considering various features [23].In the current study, we constructed the R model, C model, and R-C model with k-fold cross-validation.The average accuracy of the three models was 0.712, 0.792, and 0.844, the average AUROC of the three models was 0.775, 0.804 and 0.877, and the average AUPRC of the three models was 0.785, 0.760 and 0.859, respectively.In terms of incorporating wavelet transform features, we performed a comprehensive comparative analysis of R model and original radiomics features with wavelet transform features model (R_w model).We filtered to obtain 18 and 22 features from R features and R_w features, respectively (Table S5).Upon meticulous evaluation of the results presented in Table S6 and Figure S3, it was evident that the R_w model did not surpass the performance of the R model on the ROC (p = 0.2291) and P-R curves.To strike a balance between interpretability and effectiveness in the model, the focus of this study was on choosing the most representative R model features instead of R_w ones.The R-C model, with better accuracy, AUROC and AUPRC, showed more stability than the other models.The R-C model consisted of clinical data, and radiomics features performed better than previous studies.Based on objective  Nowadays, radiomics revealed tumor heterogeneities, which made it possible to study the correlation between radiomics features and downstaging outcomes [20,[24][25][26].In our R-C model, we identified 10 predictors from 112 radiomics features.Four features were tumor morphological-related, including diameter, axis length (least and major), and sphericity.In addition to tumor sizerelated features, we also found that shape_sphericity was positively correlated to the downstaging outcomes, and the regular shape indicated good tumor biological behavior.Besides these, the other eight features were image original features, first order features and textural features.Some radiomics features, which had predictive implications that were not reported before, implied tumor heterogeneity and were difficult to identify by radiologists and physicians [20,27].To the best of our knowledge, although studies speculated that morphological features might be predictors of HCC downstaging outcomes, no study has demonstrated the correlation between radiomics and downstaging outcomes.However, the performance of the R model was not as stable as the R-C model, which might be because radiomics features could reveal tumor heterogeneities but failed to reflect liver function and macrovascular invasion, which is essential for outcomes [28,29].Thus, in the R-C model, we used clinical features to improve the accuracy of the prediction model.
In a review of downstaging for LT, most studies showed that Child-Pugh class and tumor burden were associated with the outcome of downstaging [2,14,16,[30][31][32][33]. Such studies focused on the differences between successful and failed downstaging groups but lacked a test cohort.One liver function-related feature (GGT) in our C model was considered a predictive feature.Based on the current study, the Child-Pugh class was not the predictive feature, which might be because the laboratory parameters were more objective in assessing liver function compared to Child-Pugh class [2,31,32].Previous studies demonstrated that when used alone, AFP has low sensitivity in HCC surveillance; however, imaging in combination with AFP reached optimal sensitivity [34].In the current study, the AFP was critical in different outcomes of downstaging, and AFP was one of the predictors in the R-C model.Previous studies recommended that tumor burden (such as tumor size, tumor number, tumor volume, or macrovascular invasion) was related to HCC survival, while some studies used tumor burden as a predictor of survival in patients who underwent transcatheter arterial chemoembolization [14,[30][31][32][33]35].However, the tumor size was controversial, and some studies found that necrosis was high in large tumors [36,37].In our study, the tumor burden-related features were critical in the R-C model, especially tumor diameter, tumor number, and PVTT.This finding was in agreement with previous studies.However, when we constructed a predictive model, only tumor diameter, tumor number, and PVTT were used, the predictive ability was limited.This indicated that the radiomics features were necessary for enhanced performance.
Nevertheless, the present study had some limitations.Firstly, it was a small patient cohort encompassing two centers, and the predictability of the tumor downstaging treatment was beyond the scope of our study.Secondly, it was a retrospective study but exhibited a potential predicted value.Further, a multicenter clinical trial should be designed to examine the R-C model and focus on finding the standardized downstaging protocol.
In conclusion, we investigated the downstaging outcomes of HCC patients for LT by analyzing the clinical data and radiomics features.The novel and practical R-C model accurately predicted the downstaging outcomes and could be applied as guidance for the downstaging treatment in the future.

Fig. 2
Fig. 2 Model development overview (A) Data preparation (B) Feature extraction (C) Model validation

Fig. 3
Fig. 3 Performance of the R model, C model, and R-C model (A) ROC and P-R curves of the three models in test and training cohorts (B) Confusion matrix of the three models in the test and training cohorts

Fig. 5
Fig. 5 Feature inspection (A) Feature inspection in specific R-C model (Train cohort) (B) Feature inspection in specific R-C model (Test cohort) (C-D) Feature inspection in specific patient

Table 1
Demographic and baseline data

Table 2
Univariate analysis for different outcomes of downstaging

Table 3
Performance in test and train cohorts of three models