Enhancing prognostic prediction in hepatocellular carcinoma post-TACE: a machine learning approach integrating radiomics and clinical features

Objective This study aimed to investigate the use of radiomics features and clinical information by four machine learning algorithms to predict the prognosis of patients with hepatocellular carcinoma (HCC) treated with transarterial chemoembolization (TACE).
Methods A total of 105 patients with HCC treated with TACE from 2002 to 2012 were enrolled retrospectively and randomly divided into a training cohort (n = 74) and a testing cohort (n = 31) at a ratio of 7:3. Spearman rank correlation, random forest, and univariate Cox regression were used to select the optimal radiomics features, and univariate Cox regression was used to select clinical features. Four machine learning algorithms were used to develop the models: random survival forest, eXtreme gradient boosting (XGBoost), gradient boosting, and the Cox proportional hazards regression model. The area under the curve (AUC) and the C-index were used to assess the performance of the models in predicting HCC prognosis.
Results A total of 1,834 radiomics features were extracted from the computed tomography images of each patient. Univariate Cox regression identified age at diagnosis, TNM stage, and metastasis as clinical risk factors for HCC prognosis. Across algorithms, the combined models generally outperformed the radiomics and clinical models. Among the four machine learning algorithms, XGBoost performed best in the combined models, achieving an AUC of 0.979 in the training set and 0.750 in the testing set, demonstrating strong prognostic prediction capability.
Conclusion The superior performance of the XGBoost-based combined model underscores its potential as a powerful tool for enhancing the precision of prognostic assessments for patients with HCC.


Introduction
Primary liver cancer is the sixth most common cancer globally and the third leading cause of cancer-related deaths worldwide. Hepatocellular carcinoma (HCC) constitutes approximately 75-85% of all primary liver cancer cases (1). Treatment options for HCC include local ablation therapy, liver transplantation, liver resection, transarterial chemoembolization (TACE), radiation therapy, and systemic treatment. Surgical resection is currently the principal treatment approach for HCC. However, because HCC often lacks distinct early symptoms, many patients are already at the intermediate stage [Barcelona Clinic Liver Cancer (BCLC) stage B] of the BCLC staging system at the time of diagnosis and have missed the optimal window for treatment (2). This limits the available treatment options and adversely affects the patient's prognosis.
TACE is a minimally invasive, image-guided technique that reduces the blood supply to a tumor. Through a catheter advanced into the arteries feeding the tumor, chemotherapeutic agents and embolic materials are delivered to occlude the tumor's blood vessels, causing ischemic death of tumor cells. According to European and American HCC management guidelines, TACE is a commonly used interventional treatment for patients with intermediate to advanced HCC, effectively delaying disease progression and providing a chance of survival for some patients (3)(4)(5)(6)(7).
Owing to considerable inter-patient variability, the effectiveness and safety of TACE for individuals with intermediate to advanced HCC can differ (8). An objective method is therefore needed to accurately predict, before treatment begins, the prognosis of patients with HCC treated with TACE. For patients who are not expected to benefit from TACE, alternative treatments such as sorafenib or lenvatinib should be considered, while preserving liver function as much as possible to extend overall survival (OS) (9)(10)(11).
In recent years, advances in imaging technology and big data analysis have given rise to radiomics as a new research field. Radiomics transforms medical images into high-dimensional, quantitative, minable data through feature extraction and data analysis, quantifying tumor phenotype and heterogeneity, and is considered a potential biomarker for personalized cancer treatment (12). Moreover, machine learning methods excel at modeling intricate interactions among many variables, which is difficult for traditional models (13). Several studies have reported substantial progress in the diagnosis, treatment-response assessment, and prognosis prediction of HCC by combining radiomics features, clinical information, and computational methods, especially for patients with HCC treated with TACE (14-16). However, prognosis prediction for patients with HCC using radiomics or clinical information with different machine learning algorithms has not been fully explored, and the performance of these algorithms may vary across scenarios.
Therefore, this study aimed to develop and validate prediction models using four machine learning algorithms, comprising radiomics models, clinical models, and combined models that incorporate both clinical information and radiomics features. These models are intended to predict the OS of patients with HCC after TACE and to inform the selection of treatment options.

Patients
Data were obtained from the public repository The Cancer Imaging Archive (TCIA, https://www.cancerimagingarchive.net/). We collected data from 105 patients with HCC treated with TACE between 2002 and 2012. The inclusion criteria required TACE to be the sole first-line or initial bridging therapy, and baseline multiphasic contrast-enhanced computed tomography (CT) images to be available and free of image artifacts such as those caused by surgical clips. Further details are provided in previous studies (17)(18)(19).
Patients were randomly divided at a ratio of 7:3 into a training cohort (n = 74) and a testing cohort (n = 31). The training cohort was used to build the predictive models, and the testing cohort was used to validate their performance.
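A minimal Python sketch of the 7:3 split, assuming scikit-learn and hypothetical integer patient identifiers (the authors' splitting code and random seed are not reported). Passing an integer `test_size=31` reproduces the reported cohort sizes exactly; `test_size=0.3` would instead round up to 32 testing patients.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical patient identifiers standing in for the 105 TCIA cases.
patient_ids = np.arange(105)

# An integer test_size fixes the cohorts at 74/31; random_state=42 is an
# arbitrary assumption for reproducibility, not a value from the study.
train_ids, test_ids = train_test_split(patient_ids, test_size=31, random_state=42)

print(len(train_ids), len(test_ids))  # 74 31
```

All feature selection steps that follow are applied to `train_ids` only, so the testing cohort stays unseen until evaluation.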

Image acquisition and segmentation
The CT images of the 105 patients were obtained from TCIA; further details are available in the TCIA database and previous studies (17)(18)(19). Expert radiologists annotated the CT images using specialized software, precisely delineating tumors and anatomical structures. Following a standardized protocol, they outlined regions of interest on each slice, and their segmentations were reviewed in consensus meetings to ensure accuracy and consistency.

Radiomics features
Radiomics features extraction
Feature extraction was performed in Python 3.7 using the PyRadiomics package (20). The feature definitions follow the Image Biomarker Standardization Initiative (21). The extracted radiomics features fall into three groups: (1) first-order statistical features; (2) shape features, including two-dimensional and three-dimensional descriptors; and (3) texture features derived from the gray level co-occurrence matrix, gray level run length matrix, gray level size zone matrix, gray level dependence matrix, and neighboring gray tone difference matrix.
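To make the first-order group concrete, the Python sketch below re-implements three first-order features (Mean, 10Percentile, and Skewness, the latter two appearing among the selected features) in NumPy over the voxels inside a binary region-of-interest mask. It is an illustration under assumptions, not PyRadiomics itself: it uses population moments and linear-interpolated percentiles, ignores PyRadiomics preprocessing such as resampling and filtering, and returns 0 skewness for a flat region.

```python
import numpy as np

def first_order_features(image: np.ndarray, mask: np.ndarray) -> dict[str, float]:
    """Compute three first-order features over the voxels inside a binary ROI mask.

    Illustrative re-implementation (population moments, linear-interpolated
    percentiles); not the PyRadiomics code used in the study.
    """
    voxels = image[mask.astype(bool)].astype(float)
    if voxels.size == 0:
        raise ValueError("ROI mask contains no voxels")
    mean = voxels.mean()
    centered = voxels - mean
    m2 = np.mean(centered ** 2)  # second central moment (population variance)
    m3 = np.mean(centered ** 3)  # third central moment
    # A flat ROI has zero variance; report skewness as 0 instead of dividing by 0.
    skewness = 0.0 if m2 == 0 else m3 / m2 ** 1.5
    return {
        "Mean": float(mean),
        "10Percentile": float(np.percentile(voxels, 10)),
        "Skewness": float(skewness),
    }

# Usage on a toy 3x3x3 volume whose ROI covers every voxel.
volume = np.arange(27, dtype=float).reshape(3, 3, 3)
roi = np.ones_like(volume, dtype=np.uint8)
print(first_order_features(volume, roi))
```

Because the toy intensities 0 to 26 are symmetric about their mean, the skewness is (numerically) zero; a right-skewed ROI such as [0, 0, 0, 10] gives a positive value.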

Radiomics features selection
First, within the training dataset, Spearman rank correlation coefficients were calculated between features, and for any pair with a correlation coefficient exceeding 0.9 only one feature was retained, to eliminate highly redundant features. To preserve as much descriptive power as possible, a greedy recursive elimination strategy was applied, removing the most redundant feature in the current set at each iteration. Further selection was then performed using random forest, a decision-tree-based ensemble learning method that assesses each feature's contribution to predictive performance. By measuring how much splits on each feature improved prediction across the trees, the random forest ranked features by importance, identifying those most influential in predicting the target variable (survival time).
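The two steps above can be sketched in Python with pandas and scikit-learn. Several choices here are assumptions, because the text does not define them: "most redundant" is taken to mean the feature with the most partners above the threshold (ties broken by the larger mean absolute correlation), correlations are compared in absolute value, and the random forest is a `RandomForestRegressor` fit on raw survival time. That last choice ignores censoring and is a simplification.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def greedy_spearman_filter(features: pd.DataFrame, threshold: float = 0.9) -> list[str]:
    """Drop the most redundant feature until no pair has |Spearman rho| > threshold.

    Assumed definition of "most redundant": most above-threshold partners,
    ties broken by the larger mean |rho| to the remaining features.
    """
    corr = features.corr(method="spearman").abs().fillna(0.0)
    kept = list(features.columns)
    while len(kept) > 1:
        sub = corr.loc[kept, kept].to_numpy(copy=True)
        np.fill_diagonal(sub, 0.0)
        n_partners = (sub > threshold).sum(axis=1)
        if n_partners.max() == 0:
            break
        # np.lexsort treats the LAST key as the primary sort key.
        order = np.lexsort((-sub.mean(axis=1), -n_partners))
        kept.pop(int(order[0]))
    return kept

def top_k_by_random_forest(features: pd.DataFrame, survival_time, k: int, seed: int = 0) -> list[str]:
    """Rank features by impurity-based importance of a forest regressing survival time."""
    forest = RandomForestRegressor(n_estimators=300, random_state=seed)
    forest.fit(features, survival_time)
    importance = pd.Series(forest.feature_importances_, index=features.columns)
    return list(importance.sort_values(ascending=False).index[:k])

# Toy usage: "a" and "b" are perfectly rank-correlated, "c" drives the target.
idx = np.arange(20)
toy = pd.DataFrame({"a": idx, "b": idx ** 3, "c": (idx * 7) % 20})
kept = greedy_spearman_filter(toy)
top = top_k_by_random_forest(toy[kept], 3.0 * toy["c"], k=1)
print(kept, top)
```

Recomputing the partner counts after every removal is what makes the elimination greedy: dropping one member of a correlated cluster can leave its former partners non-redundant.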
Finally, univariate Cox proportional hazards regression was used to evaluate the association of each remaining feature with survival time. Each feature was examined separately, its hazard ratio and corresponding p-value were calculated, and features with p < 0.05 were retained for model construction.
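For readers who want to see what the univariate Cox step computes, here is a self-contained NumPy sketch that fits a one-covariate Cox model by Newton-Raphson on the partial likelihood (Breslow handling of tied event times) and reports the hazard ratio with a two-sided Wald p-value. This is an illustrative assumption, not the authors' implementation. In practice, a tested package such as R's `survival` or Python's `lifelines` would normally be used, and perfectly separated data would make this simple version diverge.

```python
import math
import numpy as np

def univariate_cox(x, time, event, max_iter=50, tol=1e-10):
    """Fit h(t | x) = h0(t) * exp(beta * x) by Newton-Raphson on the partial likelihood."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # centring stabilises exp() without changing beta
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for t in np.unique(time[event]):
            at_risk = time >= t
            died = event & (time == t)
            w = np.exp(beta * x[at_risk])
            s0, s1, s2 = w.sum(), (w * x[at_risk]).sum(), (w * x[at_risk] ** 2).sum()
            d = died.sum()
            score += x[died].sum() - d * s1 / s0      # gradient of log partial likelihood
            info += d * (s2 / s0 - (s1 / s0) ** 2)    # observed information (Breslow ties)
        if info <= 0:
            raise ValueError("zero information: no events or no covariate variation")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))  # two-sided normal p-value
    return {"HR": math.exp(beta), "beta": beta, "se": se, "p": p_value}

def select_features(columns, time, event, alpha=0.05):
    """Keep features (name -> values) whose univariate Cox p-value is below alpha."""
    return [name for name, values in columns.items()
            if univariate_cox(values, time, event)["p"] < alpha]

# Toy usage: the x = 1 group tends to die earlier, so HR > 1.
time = np.arange(1, 9)
event = np.ones(8, dtype=bool)
x = np.array([1, 1, 0, 1, 0, 1, 0, 0])
print(univariate_cox(x, time, event))
```

The same function covers the clinical screening described later with `alpha=0.1` in place of 0.05.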

Clinical features selection
In the initial data preparation phase, features with more than 20% missing values were excluded to maintain the integrity and reliability of the dataset. Next, to simplify the model and enhance its interpretability, continuous variables were dichotomized: a threshold was set for each feature, values above it were coded as 1 and values below it as 0, and patients were thereby divided into two groups for each feature.
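The two preparation steps can be sketched with pandas as follows. The study does not report its cut-offs, so the `thresholds` mapping and the median fallback are hypothetical. Values equal to the cut-off are coded 0 here, and missing values are left missing.

```python
import pandas as pd

def prepare_clinical(df: pd.DataFrame, max_missing: float = 0.20, thresholds=None) -> pd.DataFrame:
    """Drop columns with more than max_missing missingness, then dichotomise numeric columns.

    thresholds maps column name -> cut-off; columns without an entry are cut at
    their median (a hypothetical default, not the study's rule).
    """
    thresholds = thresholds or {}
    out = df.loc[:, df.isna().mean() <= max_missing].copy()
    for col in out.select_dtypes(include="number").columns:
        if out[col].nunique(dropna=True) <= 2:
            continue  # already binary (or constant): leave as-is
        cut = thresholds.get(col, out[col].median())
        coded = (out[col] > cut).astype(float)       # 1 above the cut-off, else 0
        out[col] = coded.where(out[col].notna())     # keep missing values missing
    return out
```

For example, an age column cut at a hypothetical 60 years becomes 1 for patients older than 60 and 0 otherwise, while a column that is 60% missing is dropped before any coding.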
For feature selection, univariate Cox proportional hazards regression was performed on the training set to assess the association of each clinical feature with the OS of patients with HCC. Features with a p-value below 0.1 were selected for further modeling.

Construction and validation of models for survival prediction
The machine learning algorithms were configured and tuned for the radiomics and clinical data. The Cox proportional hazards regression model (CoxPH) estimates a regression coefficient for each risk factor relative to a baseline hazard function fitted to the survival data. For the random survival forest (RSF), many decision trees were grown to improve prediction accuracy, and parameters such as the number of trees, maximum depth, and minimum samples per leaf were tuned to prevent overfitting while capturing complex interactions within the data. Gradient boosting fits decision trees sequentially, each reducing the errors of the previous ones; the learning rate and the number of trees were the key parameters for balancing bias and variance. Finally, eXtreme gradient boosting (XGBoost), known for its efficiency and scalability, was applied, with the learning rate, maximum tree depth, and number of estimators optimized to improve prediction of the outcomes of patients with HCC. The models were validated with 5-fold cross-validation, and their prediction accuracy was assessed by the average area under the curve (AUC) in the testing cohort.

Statistical analysis
Statistical analysis was conducted using R software version 4.2.3. The normality of continuous data was tested with the Shapiro-Wilk test; normally distributed data are presented as mean ± standard deviation (x̅ ± s) and were compared between groups using independent-samples t-tests, whereas non-normally distributed data are presented as median (Q1, Q3) and were compared using the Mann-Whitney U-test. Categorical data were compared using the chi-square test. A p-value of <0.05 was considered statistically significant. OS was the primary outcome. Receiver operating characteristic (ROC) curves were generated using the "pROC" package, and ROC curves and the C-index were used to assess the predictive capability of the models for the prognosis of patients with HCC treated with TACE. The model with the highest AUC was chosen as the best prediction model.
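The C-index mentioned above measures how often a model ranks patients' risks in the same order as their observed survival. A minimal NumPy sketch of Harrell's version follows. The analysis itself was run in R, so this Python code is only illustrative. Pairs with tied survival times are skipped here, a simplification on which implementations differ. For a binary outcome with no censoring, the same pairwise counting reduces to the ROC AUC.

```python
import numpy as np

def harrell_c_index(time, event, risk):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when patient i has an observed event and
    time[i] < time[j]; it is concordant when risk[i] > risk[j]. Tied risk
    scores count as half-concordant.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in np.flatnonzero(event):
        later = time > time[i]  # patients known to survive longer than patient i
        comparable += int(later.sum())
        concordant += float((risk[i] > risk[later]).sum())
        concordant += 0.5 * float((risk[i] == risk[later]).sum())
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

print(harrell_c_index([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1]))  # 1.0
```

A value of 1.0 means perfect risk ordering, 0.5 corresponds to random ordering, and 0.0 means the ordering is completely reversed.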

Clinical baseline characteristics of patients
Table 1 displays the clinical characteristics of patients in the training cohort (n = 74) and the testing cohort (n = 31). Across all 105 patients, 12, 24, 66, and 3 were in BCLC stages A, B, C, and D, respectively. In the training cohort, most patients (71.62%) were older than 60 years at diagnosis, and 66.22% were men. Most patients had no vascular invasion (82.43%) or diabetes (66.22%). Moreover, 74.32% of the patients had cirrhosis, and 94.59% had metastasis. No significant differences in clinical characteristics were observed between the training and testing cohorts (p > 0.05).
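The cohort comparison in Table 1 applies the rules given under Statistical analysis. A Python sketch with SciPy is shown below, although the authors used R. The decision rule is an assumption: use a t-test only if both cohorts pass Shapiro-Wilk at α = 0.05, and otherwise use the Mann-Whitney U test. SciPy's `chi2_contingency` also applies Yates' continuity correction to 2×2 tables by default.

```python
import numpy as np
import pandas as pd
from scipy import stats

def compare_continuous(a, b, alpha=0.05):
    """Shapiro-Wilk in each cohort; t-test if both look normal, else Mann-Whitney U."""
    both_normal = stats.shapiro(a)[1] > alpha and stats.shapiro(b)[1] > alpha
    if both_normal:
        return "t-test", float(stats.ttest_ind(a, b).pvalue)
    return "Mann-Whitney U", float(stats.mannwhitneyu(a, b, alternative="two-sided").pvalue)

def compare_categorical(values: pd.Series, cohort: pd.Series) -> float:
    """Chi-square test of independence between a categorical variable and cohort membership."""
    table = pd.crosstab(values, cohort)
    return float(stats.chi2_contingency(table)[1])
```

A p-value above 0.05 from either function corresponds to the "no significant difference" statement for that variable.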

Radiomics features screening results
A total of 1,834 radiomics features were initially extracted. After redundancy filtering based on Spearman rank correlation coefficients, 233 features were retained. Random forest selection then retained 13 features according to their importance scores. Finally, univariate Cox regression retained three features (p < 0.05): two first-order features and one texture feature. Among these, square_glcm_ClusterShade made the largest contribution. The selection process is depicted in Figures 1 and 2. All selected features were used to construct the radiomics and combined models.

Clinical characteristics included in the analysis
The clinical characteristics of patients in the training cohort are presented in Table 2. In univariate Cox proportional hazards regression within the training cohort (p < 0.1), only age at diagnosis, TNM stage, and metastasis were associated with OS in patients with HCC. Older age at diagnosis may indicate a poorer outcome. TNM stage, an indicator of cancer progression, summarizes tumor size, lymph node involvement, and the extent of metastasis. The presence of metastasis, indicating spread of the cancer to other parts of the body, was another critical factor influencing survival. These three variables were therefore used to construct the clinical and combined models.

Model performance
Model performance is shown in Tables 3 and 4, and ROC curves for each model in the training and testing cohorts are shown in Figure 3. Across all machine learning algorithms, the combined models generally outperformed the radiomics and clinical models in predicting HCC prognosis in both the training and testing cohorts. The XGBoost combined model performed best, achieving an AUC of 0.979 in the training cohort and 0.750 in the testing cohort, demonstrating strong prognostic prediction capability. The AUC of the combined model was significantly higher than
FIGURE 1 Heatmap of the correlations among the 233 retained radiomics features, based on the Spearman rank correlation coefficient.

Discussion
Primary liver cancer is the sixth most common cancer globally. Most patients are diagnosed at the intermediate stage of the BCLC staging system (stage B), for which TACE is considered the preferred treatment; TACE therefore plays an important role in managing patients with HCC (1, 7). However, treatment response to TACE varies considerably among patients with HCC, and liver function is usually impaired in patients with intermediate-stage HCC compared with healthy individuals. Because TACE is likely to impose an additional burden on the liver, accurate pretreatment prediction is crucial for treating and managing patients with HCC (22). In this study, we built models with four machine learning algorithms based on patients' CT images and clinical information. The combined models generally outperformed the single-source models in predicting the prognosis of patients with HCC treated with TACE, with the XGBoost combined model demonstrating the best performance.
Recently, the rapid advancement of radiomics has enhanced the accuracy of clinical diagnosis and prognostic assessment. Radiomics extracts tissue and lesion characteristics, converting latent pathological and physiological information in images into minable, high-dimensional quantitative features for analysis, training, and validation, and thus provides a powerful tool for addressing clinical problems (23, 24). Radiomics is also used extensively in HCC. Liu et al. (25) investigated the OS of patients with HCC after hepatectomy. Feng et al. (26) built a radiomics model to predict the macrotrabecular-massive subtype in patients with HCC. Xia et al. (27) extracted radiomics features from images to predict microvascular invasion status in patients with HCC. Tong et al. (28) and Khodabakhshi et al. (29) have also confirmed that radiomics features can predict the prognosis of many cancers. These findings suggest that radiomics could help predict the prognosis of patients with HCC after TACE, so we incorporated radiomics features into our machine learning models. We selected three radiomics features: one texture feature (square_glcm_ClusterShade) and two first-order features (wavelet_HHL_firstorder_Skewness and wavelet_LHL_firstorder_10Percentile).
However, interpreting the relationship between radiomics features and complex tumor biological processes remains challenging, which prompted us to include clinical information in the analysis. Our study identified age at diagnosis, metastasis, and TNM stage as variables associated with the prognosis of patients with HCC, with metastasis being the most important variable according to importance scores. The combined models using clinical information and radiomics features outperformed the radiomics and clinical models, echoing findings from previous studies. Ning et al. (30) combined radiomics signatures and clinical information to predict early recurrence of HCC; their combined model showed the highest predictive power in the training and validation datasets, with AUCs of 0.846 and 0.737, respectively. Fang et al. (31) and Geng et al. (32) likewise found, consistent with our results, that the combined model was a better predictor than the clinical or radiomics models. Machine learning is widely applied in medicine because of its high predictive accuracy (33). Various machine learning algorithms have been used to predict survival, prognosis, and treatment efficacy in patients with intermediate or advanced HCC treated with TACE. However, previous studies usually limited themselves to a single machine learning algorithm. However, because of the limited sample size in this study, machine learning might be a more suitable choice. In the future, we will obtain more samples for further in-depth research.
This study had several limitations. (1) The ideal TACE candidates are patients in BCLC stage B; however, most patients in this study were in BCLC stage C (n = 66) or D (n = 3). (2) The data came from a single center, and external validation at other research centers is needed to establish the generalizability of the predictive models. (3) The sample size was small, which may have led to model overfitting; including more cases would improve the models' generalizability. (4) The study was retrospective and therefore subject to selection bias; prospective validation is lacking. (5) No easy-to-use application implementing the machine learning models has yet been developed.

Conclusion
This study investigated how four machine learning algorithms use radiomics features and clinical information to predict the prognosis of patients with HCC treated with TACE. After feature selection and a comparison of algorithms, the combined models outperformed those based solely on radiomics or clinical features. Among the four algorithms, XGBoost was the most effective, demonstrating enhanced predictive power in forecasting patient outcomes. These results underscore the potential of integrating radiomics and clinical data through machine learning techniques such as XGBoost to improve prognostic prediction in patients with HCC.

FIGURE 3 ROC curves of four machine learning algorithms in the training cohort: radiomics models (A), clinical models (B), and combined models (C); and in the testing cohort: radiomics models (D), clinical models (E), and combined models (F).

TABLE 1
Baseline characteristics of the 105 enrolled patients from TCIA.

TABLE 2
Clinical variables for predicting survival in the univariate Cox analysis.

FIGURE 2 Forest plot of radiomics feature selection with univariate Cox regression analysis.

the AUC decreased to 0.524 and 0.571, respectively. The clinical models showed consistent accuracy across all algorithms, with an accuracy of 0.838 in the training cohort and 0.742 in the testing cohort. Conversely, the accuracy of the radiomics and combined models fluctuated considerably across algorithms.

TABLE 3
Performance of different machine learning algorithms in the training cohort.

TABLE 4
Performance of different machine learning algorithms in the testing cohort.
The XGBoost algorithm builds trees iteratively to minimize a regularized loss function. The regularization term penalizes the number of leaves in each tree and, through L2 (and optionally L1) penalties, the magnitude of the leaf weights, which improves the model's generalization ability. Like other tree ensembles, XGBoost captures complex, non-linear relationships between features and outcomes. In addition, it processes high-dimensional data efficiently, handles missing values natively, and uses regularization to help limit overfitting. These properties make XGBoost well suited to high-dimensional data such as radiomics. Our study demonstrated the potential of XGBoost for accurately predicting the prognosis of patients with HCC treated with TACE, especially when based on both radiomics features and clinical characteristics. Beyond conventional machine learning, other advanced algorithms such as deep learning have achieved notable successes in various fields (39, 40).
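For reference, the regularized objective minimized at boosting round t can be written as follows, using the standard XGBoost formulation rather than notation from this study. Here f_t is the new tree, T its number of leaves, w_j its leaf weights, γ the per-leaf penalty, and λ and α the L2 and L1 penalties on leaf weights. With α = 0 and a second-order expansion of the loss, let G_j and H_j be the sums of the loss gradients g_i and Hessians h_i over the samples in leaf j. These sums give the optimal leaf weight and the gain used to score a candidate split into left (L) and right (R) children. A split is worthwhile only if its gain is positive, which is how γ prunes leaves.

```latex
\mathcal{L}^{(t)} = \sum_{i=1}^{n} \ell\!\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)\right) + \Omega(f_t),
\qquad
\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2} + \alpha \sum_{j=1}^{T} \lvert w_j \rvert

w_j^{*} = -\frac{G_j}{H_j + \lambda},
\qquad
\text{Gain} = \frac{1}{2}\left[\frac{G_L^{2}}{H_L + \lambda} + \frac{G_R^{2}}{H_R + \lambda} - \frac{(G_L + G_R)^{2}}{H_L + H_R + \lambda}\right] - \gamma
```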