Patients and dataset
The institutional ethics review boards of Zhujiang Hospital of Southern Medical University (2020-KY-094-01) and Sun Yat-Sen University Cancer Center (SL-B2021-214-02) approved this retrospective study and waived the requirement for informed consent. Patients with suspected HCC who underwent Gd-EOB-DTPA-enhanced MRI before curative resection at the two institutions between January 2012 and September 2018 were consecutively included. The inclusion criteria were as follows: (a) pathologically confirmed HCC; (b) Barcelona Clinic Liver Cancer (BCLC) stage 0, A, or B; (c) no previous anti-cancer treatment; and (d) Gd-EOB-DTPA-enhanced MRI of the liver within one month before surgery. The exclusion criteria were as follows: (a) recurrent HCC, combined hepatocellular-cholangiocarcinoma, or metastatic tumor in the liver; (b) radiographic macrovascular invasion or extrahepatic metastasis; (c) incomplete clinical, radiological, pathological, or follow-up data; and (d) death due to postoperative complications or liver cancer rupture within two weeks (Supplementary Fig. 1). All patients were randomly divided into training and validation sets at a ratio of 7:3.
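The 7:3 random allocation can be sketched in Python as below. This is a minimal illustration, assuming a simple unstratified shuffle of patient identifiers; the helper name, the seed, and the rounding rule are hypothetical and are not taken from the study.

```python
import random

def split_train_validation(patient_ids, train_fraction=0.7, seed=42):
    """Randomly partition patient IDs into training and validation sets.

    Hypothetical helper: the seed and rounding rule are illustrative assumptions.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # local generator, so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

train_ids, valid_ids = split_train_validation(range(1, 101))
print(len(train_ids), len(valid_ids))  # 70 30
```

A split stratified by outcome (preserving the proportion of early recurrences in both sets) is a common alternative; the text above does not state whether stratification was used.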
Baseline clinicopathological data were collected from electronic medical records. Clinical data included demographics and time to early recurrence. Laboratory features included neutrophil count, hepatitis B virus (HBV) DNA, α-fetoprotein (AFP) level, alanine aminotransferase (ALT), aspartate aminotransferase (AST), and γ-glutamyl transpeptidase (GGT). Pathological data included microvascular invasion (MVI), defined as tumor emboli within a vascular space lined by endothelial cells on microscopy [12], BCLC stage, and tumor number.
Follow-up surveillance and clinical endpoint
All patients were followed up for at least two years after curative resection. Surveillance for tumor recurrence consisted of serum AFP measurement, ultrasound, and contrast-enhanced CT or MRI of the chest and abdomen at one month after surgery, every three months thereafter during the first year, and every six months afterward. Follow-up was censored on October 1, 2020.
The study endpoint was early recurrence, defined as one or more of the following events occurring within two years after curative resection: (a) new hepatic lesions with typical imaging findings of HCC; (b) lesions with atypical imaging findings that were confirmed as HCC by biopsy or by pathology after repeat surgery, or that showed tumor staining on postoperative transarterial chemoembolization; and (c) extrahepatic metastases.
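Because every patient was followed for at least two years, the endpoint can be coded as a binary label. The sketch below assumes that the recurrence date is the date of the first qualifying event (a), (b), or (c), and that an event falling exactly on the two-year anniversary still counts as early; both function names are hypothetical.

```python
from datetime import date
from typing import Optional

def two_years_after(d: date) -> date:
    """Calendar date two years after d; Feb 29 maps to Feb 28 of the target year."""
    try:
        return d.replace(year=d.year + 2)
    except ValueError:  # d is Feb 29 and the target year is not a leap year
        return d.replace(year=d.year + 2, day=28)

def early_recurrence_label(surgery: date, recurrence: Optional[date]) -> int:
    """1 if the first qualifying recurrence event occurred within two years of resection, else 0."""
    if recurrence is None:  # no recurrence observed during follow-up
        return 0
    return int(surgery < recurrence <= two_years_after(surgery))

print(early_recurrence_label(date(2016, 3, 1), date(2017, 8, 15)))  # 1
print(early_recurrence_label(date(2016, 3, 1), date(2018, 6, 1)))   # 0
print(early_recurrence_label(date(2016, 3, 1), None))               # 0
```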
MR imaging acquisition
The MRI machines and parameters are provided in Supplementary Methods and Supplementary Table 1.
Qualitative analysis of MR images
Two radiologists with 5 and 10 years of experience in diagnostic abdominal imaging, blinded to the pathological findings, independently evaluated the imaging features; disagreements were resolved by consensus. The MR features included: (a) tumor size, defined as the maximum diameter on transverse hepatobiliary phase (HBP) images; (b) arterial phase (AP) enhancement type: type 1, a homogeneous enhancement pattern without increased arterial blood flow; type 2, homogeneous enhancement with increased arterial blood flow; type 3, heterogeneous enhancement containing non-enhancing areas; type 4, a heterogeneous enhancement pattern with irregular ring-like structures [13-15]; and type 5, a heterogeneous, hypointense pattern; (c) capsule appearance, a uniform and smooth peripheral rim of hyperenhancement in the portal venous phase (PVP) or delayed phase, categorized as absent, incomplete, or complete [16]; (d) hypodense halo, a rim of hypointensity partially or completely surrounding the tumor; (e) intratumor necrosis, low signal on T1-weighted imaging (T1WI), high signal on T2-weighted imaging (T2WI), and low signal in all enhanced phases; (f) satellite nodules, defined as small (< 2 cm) tumor nodules close (< 2 cm) to the main tumor [17]; and (g) peritumoral hypointensity, defined as flame-like or wedge-shaped hypointense areas of the hepatic parenchyma around the tumor on HBP images [18]. Representative images of the MR features are shown in Supplementary Fig. 2.
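Criterion (f) is the only feature above defined by explicit numeric thresholds, so it can be written as a rule. A minimal sketch, assuming each additional nodule is summarized by its maximum diameter and its distance from the main tumor in centimetres (how the distance is measured is not specified above); the function names are hypothetical.

```python
from typing import Iterable, Tuple

def is_satellite_nodule(diameter_cm: float, distance_cm: float) -> bool:
    """Criterion (f): nodule smaller than 2 cm AND lying within 2 cm of the main tumor."""
    return diameter_cm < 2.0 and distance_cm < 2.0

def has_satellite_nodules(nodules: Iterable[Tuple[float, float]]) -> bool:
    """nodules: (maximum diameter, distance to main tumor) pairs, in cm, for each extra lesion."""
    return any(is_satellite_nodule(d, dist) for d, dist in nodules)

print(has_satellite_nodules([(1.5, 0.8), (2.4, 1.0)]))  # True
print(has_satellite_nodules([(2.5, 0.5), (1.0, 3.0)]))  # False
```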
Image segmentation and DL feature extraction
The region of interest was delineated along the tumor boundary on the slice showing the largest tumor dimension. The VGGNet-19 architecture was then applied to extract 1472 DL features from each of the AP, PVP, and HBP images. The DL network contains five convolutional blocks, four max-pooling layers, three fully connected layers, and a softmax layer. The DL workflow is shown in Fig. 1, and further details are provided in the Supplementary Materials.
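The region-of-interest step can be sketched as below, assuming each phase is stored as a NumPy array of shape (slices, rows, cols) with a co-registered binary tumor mask, and that the network input is the tumor bounding box on the slice with the largest tumor cross-section. The optional margin, the function name, and the array layout are assumptions; resizing to the network input size and the VGGNet-19 feature extractor itself are not shown.

```python
import numpy as np

def largest_slice_roi(volume: np.ndarray, mask: np.ndarray, margin: int = 0) -> np.ndarray:
    """Crop the tumor bounding box on the axial slice with the largest mask area.

    volume, mask: arrays of shape (slices, rows, cols); mask is nonzero inside the tumor.
    """
    if volume.shape != mask.shape:
        raise ValueError("volume and mask must have the same shape")
    areas = (mask != 0).reshape(mask.shape[0], -1).sum(axis=1)  # tumor pixels per slice
    k = int(np.argmax(areas))  # slice with the largest tumor cross-section
    if areas[k] == 0:
        raise ValueError("mask is empty")
    rows, cols = np.nonzero(mask[k])
    # Bounding box, optionally padded by `margin` pixels and clipped to the image.
    r0 = max(rows.min() - margin, 0)
    r1 = min(rows.max() + margin + 1, mask.shape[1])
    c0 = max(cols.min() - margin, 0)
    c1 = min(cols.max() + margin + 1, mask.shape[2])
    return volume[k, r0:r1, c0:c1]

vol = np.arange(3 * 6 * 6, dtype=float).reshape(3, 6, 6)
msk = np.zeros_like(vol, dtype=bool)
msk[1, 2:4, 2:5] = True  # 6-pixel tumor on slice 1
msk[2, 0, 0] = True      # 1-pixel tumor on slice 2
print(largest_slice_roi(vol, msk).shape)            # (2, 3)
print(largest_slice_roi(vol, msk, margin=1).shape)  # (4, 5)
```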
Feature selection and deep learning signature development
The DL features were processed in three stages (value preconditioning, de-redundancy, and dimensionality reduction) to select features strongly associated with early recurrence; machine learning classifiers were then used to predict early recurrence and to build a DL signature. All features were first normalized to the range [0, 1] by min-max normalization. Spearman correlation analysis was then used to retain features associated with early recurrence (P < 0.05). Next, for each feature pair with a Pearson correlation coefficient (r) > 0.9, the feature with the weaker association with early recurrence was removed as redundant. The remaining features were further screened by variance analysis, recursive feature elimination (RFE), and the Relief algorithm. Five classifiers, namely random forest (RF), support vector machine (SVM), least absolute shrinkage and selection operator (LASSO) logistic regression, AdaBoost, and Gaussian process (GP), were compared for predicting early recurrence from the DL features of each phase.
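The first three steps (min-max normalization, the Spearman filter, and Pearson-based de-redundancy) can be sketched with NumPy and SciPy as follows. This is a minimal illustration, assuming a binary outcome vector, absolute correlation thresholds, and a greedy pass that visits features from the strongest to the weakest outcome association; the function names are hypothetical, and the later variance analysis, RFE, Relief, and classifier steps are not shown.

```python
import numpy as np
from scipy.stats import spearmanr

def minmax_normalize(X):
    """Scale each column to [0, 1]; constant columns are mapped to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (X - lo) / span

def select_features(X, y, p_cut=0.05, r_cut=0.9):
    """Indices of columns kept after the Spearman filter and Pearson de-redundancy."""
    Xn = minmax_normalize(X)
    strength = np.zeros(Xn.shape[1])
    keep = []
    # Step 1: keep features whose Spearman correlation with the outcome has P < p_cut.
    for j in range(Xn.shape[1]):
        rho, p = spearmanr(Xn[:, j], y)
        if np.isfinite(p) and p < p_cut:
            keep.append(j)
            strength[j] = abs(rho)
    # Step 2: visit features from strongest to weakest outcome association and
    # drop any feature with |Pearson r| > r_cut against an already kept feature.
    keep.sort(key=lambda j: -strength[j])
    selected = []
    for j in keep:
        if all(abs(np.corrcoef(Xn[:, j], Xn[:, k])[0, 1]) <= r_cut for k in selected):
            selected.append(j)
    return sorted(selected)

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
x0 = y + rng.normal(0, 0.3, size=200)    # informative feature
x1 = x0 + rng.normal(0, 0.01, size=200)  # near-duplicate of x0
x2 = rng.normal(size=200)                # pure noise
X = np.column_stack([x0, x1, x2])
print(select_features(X, y))  # exactly one of columns 0 and 1 survives
```

In practice the minimum and maximum would usually be estimated on the training set and reused for the validation set, so that no validation information leaks into preprocessing.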
Clinical and deep learning analysis
Univariate logistic regression analysis was performed in the training set, and variables with P < 0.10 were entered into multivariate logistic regression using the forward likelihood ratio method to identify independent risk factors for early recurrence. In the multivariate analysis, a two-sided P < 0.05 was considered statistically significant. The nomogram was constructed based on the results of the multivariate logistic regression models.
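The two-stage selection can be sketched with NumPy and SciPy as below. This is a minimal illustration, assuming likelihood ratio (LR) tests in both stages (the univariate test type is not stated above) and an entry criterion of P < 0.05. Some statistical packages also re-test entered variables for removal at each step; that check is omitted here. The hand-written Newton-Raphson fit and all function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson fit of a logistic model with intercept; returns (coef, log-likelihood)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        hess = X1.T @ (X1 * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess + 1e-9 * np.eye(len(beta)), X1.T @ (y - p))
    p = np.clip(1.0 / (1.0 + np.exp(-X1 @ beta)), 1e-12, 1 - 1e-12)
    return beta, float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

def lr_pvalue(ll_small, ll_big, df=1):
    """Likelihood ratio test P-value for two nested models."""
    return float(chi2.sf(2.0 * (ll_big - ll_small), df))

def univariate_screen(X, y, p_cut=0.10):
    """Indices of variables whose univariate LR test gives P < p_cut."""
    _, ll_null = fit_logistic(X[:, []], y)  # intercept-only model
    return [j for j in range(X.shape[1])
            if lr_pvalue(ll_null, fit_logistic(X[:, [j]], y)[1]) < p_cut]

def forward_lr(X, y, candidates, p_enter=0.05):
    """Forward selection: repeatedly add the candidate with the smallest LR P-value."""
    selected, remaining = [], list(candidates)
    _, ll_cur = fit_logistic(X[:, selected], y)
    while remaining:
        trials = []
        for j in remaining:
            _, ll = fit_logistic(X[:, selected + [j]], y)
            trials.append((lr_pvalue(ll_cur, ll), j, ll))
        p_best, j_best, ll_best = min(trials)
        if p_best >= p_enter:  # no remaining candidate meets the entry criterion
            break
        selected.append(j_best)
        remaining.remove(j_best)
        ll_cur = ll_best
    return selected

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
true_logit = -0.5 + 1.5 * X[:, 0]  # only column 0 carries signal
y = (rng.random(400) < 1 / (1 + np.exp(-true_logit))).astype(float)
kept = univariate_screen(X, y)
print(kept, forward_lr(X, y, kept))
```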
Collinearity between the conventional clinical factors and the DL signatures was also assessed using tolerance and the variance inflation factor (VIF); a tolerance < 0.1 or a VIF > 5 was considered indicative of collinearity.
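Tolerance and VIF are two views of the same quantity: for each variable, tolerance is 1 − R² from regressing that variable on all the others, and VIF is its reciprocal. A NumPy sketch, assuming ordinary least squares auxiliary regressions with an intercept; the function names are hypothetical.

```python
import numpy as np

def tolerance_and_vif(X):
    """Return (tolerance, VIF) arrays, one entry per column of X (n_samples x n_variables)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    tol = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Auxiliary regression of column j on an intercept plus all other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        sst = np.sum((target - target.mean()) ** 2)
        tol[j] = (resid @ resid) / sst  # equals 1 - R^2_j
    return tol, 1.0 / tol

def flag_collinear(X, tol_cut=0.1, vif_cut=5.0):
    """Flag variables with tolerance < tol_cut or VIF > vif_cut."""
    tol, vif = tolerance_and_vif(X)
    return (tol < tol_cut) | (vif > vif_cut)

rng = np.random.default_rng(2)
a = rng.normal(size=500)
b = a + rng.normal(0, 0.1, size=500)  # nearly collinear with a
c = rng.normal(size=500)              # independent
print(flag_collinear(np.column_stack([a, b, c])))  # [ True  True False]
```

Because a tolerance < 0.1 corresponds to a VIF > 10, the VIF > 5 criterion is the stricter of the two, so in this sketch it decides every flag.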
Statistical analysis
Categorical variables were compared between two groups using the Chi-square test or Fisher's exact test, and continuous variables were compared using the Mann-Whitney U test.
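These group comparisons map onto standard SciPy functions. A minimal sketch, assuming the common convention of switching to Fisher's exact test for a 2×2 table when any expected count is below 5 (the switching rule is not stated above); note that `chi2_contingency` applies Yates' continuity correction to 2×2 tables by default. The wrapper names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact, mannwhitneyu

def compare_categorical(table):
    """Return (test name, P) for a contingency table of counts (groups x categories)."""
    table = np.asarray(table)
    _, p, _, expected = chi2_contingency(table)
    # Assumed rule: Fisher's exact test for 2x2 tables with any expected count < 5.
    if table.shape == (2, 2) and (expected < 5).any():
        _, p = fisher_exact(table)
        return "fisher", float(p)
    return "chi2", float(p)

def compare_continuous(group_a, group_b):
    """Two-sided Mann-Whitney U test P-value."""
    _, p = mannwhitneyu(group_a, group_b, alternative="two-sided")
    return float(p)

print(compare_categorical([[1, 4], [6, 2]])[0])      # fisher
print(compare_categorical([[40, 60], [55, 45]])[0])  # chi2
print(compare_continuous([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]) < 0.05)  # True
```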
Receiver operating characteristic (ROC) curve analysis was used to calculate the area under the curve (AUC), accuracy, sensitivity, and specificity. AUCs were compared between different DL signatures and between different models using the DeLong test. Calibration was assessed using calibration plots with 1000 bootstrap resamples. The clinical utility of the models was evaluated using decision curve analysis (DCA). The software and packages used for statistical analyses are provided in the Supplementary Materials. All statistical tests were two-sided with a significance level of 0.05.
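The DeLong comparison of two correlated AUCs and the net benefit underlying DCA are both short computations. The sketch below assumes two models scored on the same patients with binary outcomes, uses the structural-components (placement value) form of the DeLong variance, and computes net benefit as NB = TP/n − (FP/n)·p_t/(1 − p_t) at threshold probability p_t. Function names are hypothetical, and the bootstrap calibration step is not shown.

```python
import numpy as np
from scipy.stats import norm

def _placements(scores, y):
    """Per-positive (V10) and per-negative (V01) placement values; their means equal the AUC."""
    pos, neg = scores[y == 1], scores[y == 0]
    cmp = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    return cmp.mean(axis=1), cmp.mean(axis=0)

def delong_test(y, s1, s2):
    """Return (AUC1, AUC2, two-sided P) for two correlated ROC curves."""
    y = np.asarray(y)
    v10_1, v01_1 = _placements(np.asarray(s1, dtype=float), y)
    v10_2, v01_2 = _placements(np.asarray(s2, dtype=float), y)
    auc1, auc2 = v10_1.mean(), v10_2.mean()
    s10 = np.cov(np.vstack([v10_1, v10_2]))  # 2x2 covariance across positives
    s01 = np.cov(np.vstack([v01_1, v01_2]))  # 2x2 covariance across negatives
    var = ((s10[0, 0] + s10[1, 1] - 2 * s10[0, 1]) / len(v10_1)
           + (s01[0, 0] + s01[1, 1] - 2 * s01[0, 1]) / len(v01_1))
    z = (auc1 - auc2) / np.sqrt(var)
    return auc1, auc2, float(2 * norm.sf(abs(z)))

def net_benefit(y, prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    y, prob = np.asarray(y), np.asarray(prob, dtype=float)
    treat = prob >= threshold
    n = len(y)
    tp = np.sum(treat & (y == 1))
    fp = np.sum(treat & (y == 0))
    return tp / n - (fp / n) * threshold / (1 - threshold)

rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=300)
strong = y + rng.normal(0, 0.5, size=300)  # informative score
weak = rng.normal(size=300)                # uninformative score
auc_s, auc_w, p = delong_test(y, strong, weak)
print(round(auc_s, 2), round(auc_w, 2), p < 0.05)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and plotting it against the treat-all and treat-none strategies.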