Development and validation of a risk score for predicting mortality after resection of primary hepatocellular carcinoma

Background: Primary hepatocellular carcinoma (PHCC) has a poor prognosis and a high short-term mortality rate, even after resection. Thus, early diagnosis in PHCC cases can help improve quality of life via personalized management strategies. Results: The total scores from the risk score system (RSS) were classified as low risk (<5 points), medium risk (5–10 points), or high risk (>10 points). The areas under the receiver operating characteristic curves were 0.80 in the training cohort and 0.69 in the validation cohort, which indicated satisfactory prognostic performance. The Hosmer-Lemeshow goodness-of-fit test (P>0.05) indicated good calibration in both cohorts, and the concordance index (C-index: 0.663, 95% CI: 0.618–0.708) supported the discriminative ability of the RSS in the validation cohort. Conclusions: This simple RSS, which is based on clinical and laboratory data from patients undergoing resection of PHCC, might allow clinicians and medical staff to better manage PHCC. Materials and Methods: A total of 672 PHCC cases were retrospectively obtained from the First Affiliated Hospital of Wenzhou Medical University between January 2007 and February 2015. Cox proportional hazards models were used to identify independent predictors of mortality. Kaplan-Meier curves and the log-rank test were used to examine the relationships between the prognostic factors and overall mortality.


INTRODUCTION
In 2018, the International Agency for Research on Cancer (IARC) presented an updated overview of the global cancer burden, which indicated that liver cancer has become the fourth leading cause of cancer-related death [1], an increase from its position as the sixth leading cause in 2012 [2]. Hepatocellular carcinoma (HCC) accounts for approximately 80% of primary liver cancer cases. High-risk HCC areas (including China) have rapidly increasing HCC incidence and mortality rates, and the most prevalent risk factors are the hepatitis B virus in China and the hepatitis C virus in Japan [3]. Relative to transplantation or other treatments, resection based on the tumor's extent is the preferred first-line treatment because of its better survival outcomes, especially for early-stage HCC cases [4]. Therefore, recent HCC research has concentrated on the effective management of surviving patients. Nevertheless, this approach requires an early evaluation of mortality risk at the time of resection, which guides the subsequent patient management decisions.
An increasing number of reports have focused on pathological or laboratory parameters that can predict the prognosis of various diseases and tumors [3,5-7].
AGING
However, single factors are often ineffective for diagnosis and do not provide satisfactory results, which suggests that using multiple biomarkers can improve prognostic accuracy. For example, various prognostic tools have been developed to predict recurrence-free survival and overall survival (OS) based on multiple significant risk factors [8][9][10]. Moreover, various risk factors have been found to predict primary HCC (PHCC)-related mortality, and several studies have shown that some internationally recognized and widely used systems can help predict the prognosis in PHCC cases [11][12][13][14]. Unfortunately, there is no universal and standardized scoring system for use in PHCC cases, as the existing staging and risk scoring systems (RSSs) have various limitations. For example, the TNM system (T: the extent of the primary tumor; N: regional lymph nodes; and M: distant metastases) and the Child-Pugh score depend only on tumor stage and impairment of liver function, respectively, and thus cannot capture the complexity of liver cancer in cirrhosis [15]. The Barcelona Clinic Liver Cancer (BCLC) system has been criticized for being too algorithmic to be beneficial for clinical use [16]. In addition, the Okuda system has proven inadequate for patients with less advanced disease (early HCC), and its predictive ability, although acceptable, is far from excellent [17]. Therefore, we attempted to overcome these limitations by constructing a simple and effective RSS to predict all-cause mortality after resection of PHCC.

RESULTS

Characteristics of the study population
This retrospective study identified 672 eligible patients between 2007 and 2015. Most patients were men (83.33%) and the median age was 57.0 years (IQR: 49.0-64.0 years). The patients were randomized 1:1 to the training cohort (336 patients) and the validation cohort (336 patients), with no significant differences detected in the cohorts' baseline characteristics except for tumor size and platelet count (Supplementary Table 1). Table 1 shows the characteristics of the 336 patients in the training cohort (280 men and 56 women, median age: 57.0 years, IQR: 49.0-63.0 years). The univariate Cox regression models revealed that mortality was associated with lymph node metastasis (LNM), tumor stage, satellite nodules, single/multiple tumors, portal vein tumor thrombus (PVTT), vascular infiltration, invasion of adjacent tissues or organs (IATO), presence of ascites, tumor size, and values for prothrombin time (PT), neutrophil count, albumin, total bilirubin (TBIL), total cholesterol (TC), aspartate transaminase (AST), and γ-glutamyl transpeptidase (γ-GT). The multivariable Cox regression analyses revealed that mortality was independently predicted by higher tumor stage, multiple tumors, PVTT, IATO, greater tumor size, and elevated values for PT and AST.

X-tile analysis and survival analysis
As tumor size, PT, and AST were continuous variables that were associated with mortality, we aimed to identify their optimal cut-off values using X-tile analysis, which is an effective tool for evaluating the ability of biomarkers and other factors to predict patient survival. To the best of our knowledge, this is the first study to examine the optimal prognostic cut-off values for these three variables using X-tile analysis. The results are shown in Figure 1, which indicated that the optimal cut-off values were 3.0 cm for tumor size, 14.5 s for PT, and 65.0 U/L for AST. Figure 2 shows the Kaplan-Meier curves for predicting mortality based on IATO, PVTT, tumor stage, and single/multiple tumors. The results indicate that those factors were strong predictors of mortality. As shown in Figure 2, the cumulative mortality rates were 95.0% and 43.0% in patients with and without IATO, respectively, and 100.0% and 44.6% in patients with and without PVTT, respectively. In addition, the corresponding mortality rates were 41.7% in the grade 1/2 tumor group and 59.8% in the grade 3/4 tumor group, as well as 42.6% for single tumors and 72.5% for multiple tumors.
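The idea of a cut-off search can be illustrated with a minimal sketch. It assumes that the optimal cut-off is the split that maximizes the two-group log-rank chi-square statistic, which is the general principle behind this kind of analysis. It is not the X-tile software itself and it omits any correction for testing multiple candidate cut-offs. The function names, the "greater than cut-off" grouping convention, and the toy data are hypothetical.

```python
def logrank_chi2(times, events, high):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times: follow-up times; events: 1 = death, 0 = censored;
    high: True if the patient belongs to the group above the cut-off.
    """
    observed_minus_expected = 0.0
    variance = 0.0
    # Visit every distinct time at which at least one death occurred.
    for t in sorted({x for x, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)                      # at risk, both groups
        n_high = sum(1 for x, g in zip(times, high) if x >= t and g)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d_high = sum(1 for x, e, g in zip(times, events, high)
                     if x == t and e and g)
        observed_minus_expected += d_high - d * n_high / n
        if n > 1:  # hypergeometric variance is undefined for a single patient
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0


def best_cutoff(values, times, events):
    """Try every observed value except the largest as a candidate cut-off."""
    best_value, best_stat = None, -1.0
    for c in sorted(set(values))[:-1]:
        stat = logrank_chi2(times, events, [v > c for v in values])
        if stat > best_stat:
            best_value, best_stat = c, stat
    return best_value, best_stat


# Hypothetical toy data: tumor size (cm), follow-up (months), death indicator.
sizes = [1.0, 2.0, 2.5, 3.5, 4.0, 5.0]
months = [10, 9, 8, 3, 2, 1]
died = [1, 1, 1, 1, 1, 1]
cutoff, chi2 = best_cutoff(sizes, months, died)
```

Because every candidate split is tested, the selected cut-off is optimistic by construction, which is why a separate validation cohort is useful.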

Development of the RSS
The seven independent predictors from the previous section were used to create an RSS, based on their beta regression coefficients (Table 2). These scores were calculated using linear transformations of the corresponding beta coefficients (divided by 0.37, the minimum beta value for tumor stage), multiplication by a constant value of 2, and rounding to the nearest integer. The reference groups for the categorical variables were assigned scores of 0, based on a beta coefficient of zero [18]. The total risk score was then calculated using the following formula:

In the training cohort, the total scores were subsequently classified as <5 (low risk of mortality), 5-10 (medium risk of mortality), and >10 (high risk of mortality) (Figure 3, A1 and A2). The cumulative probabilities of OS in the risk groupings are shown in Figure 3 (A3).
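The scoring and grouping steps described in this section can be sketched as follows. The divisor (0.37), the multiplier (2), and the risk-group thresholds come from the text. The beta coefficients, the factor names, and the direction of the cut-offs (for example, "greater than 3.0 cm") are hypothetical placeholders, because the actual values are reported in Table 2 and are not reproduced here.

```python
import math

MIN_BETA = 0.37    # minimum beta value (tumor stage), as stated in the text
MULTIPLIER = 2     # constant multiplier, as stated in the text

# HYPOTHETICAL beta coefficients for illustration only (real values: Table 2).
EXAMPLE_BETAS = {
    "tumor_stage_3_4": 0.37,
    "multiple_tumors": 0.55,
    "pvtt": 0.92,
    "iato": 1.10,
    "tumor_size_gt_3cm": 0.46,
    "pt_gt_14_5s": 0.60,
    "ast_gt_65": 0.40,
}


def points_from_beta(beta):
    """Scale a beta coefficient and round it to the nearest integer (half up)."""
    return math.floor(MULTIPLIER * beta / MIN_BETA + 0.5)


def total_score(risk_factors_present, betas=EXAMPLE_BETAS):
    """Sum the points of every risk factor the patient has.

    Reference categories contribute 0 points, so they are simply not listed.
    """
    return sum(points_from_beta(betas[name]) for name in risk_factors_present)


def risk_group(score):
    """Map a total score to the three risk groups used in the text."""
    if score < 5:
        return "low"
    if score <= 10:
        return "medium"
    return "high"


# Hypothetical patient with PVTT and IATO only.
example_score = total_score(["pvtt", "iato"])
example_group = risk_group(example_score)
```

Integer points keep the score easy to calculate by hand, at the cost of a small loss of precision relative to the underlying Cox model.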

Validation of the RSS
Total prognostic scores were then calculated in the validation cohort based on the RSS. The Kaplan-Meier curves for the risk groups in the validation cohort differed significantly (P<0.001) and were similar to those in the training cohort, which showed that the RSS was robust in predicting OS (Figure 3B). The effectiveness and accuracy of the RSS were evaluated by comparing it to various other widely used systems in the training and validation cohorts (Figure 4). Relative to the CLIP, BCLC, TNM, and Okuda systems, the RSS provided the strongest ability to predict mortality in PHCC cases, based on AUC values of 0.80 in the training cohort and 0.69 in the validation cohort. In addition, we examined the proportions of the patients assigned to the risk groupings in the validation cohort (low risk: 23.2%, medium risk: 50.0%, high risk: 26.8%), which were similar to the proportions from the training cohort (low risk: 28.3%, medium risk: 47.9%, high risk: 23.8%). Relative to the low-risk group, the high-risk group had elevated risks of mortality in the training cohort (HR: 9.681, 95% CI: 5.664-16.549) and in the validation cohort (HR: 3.211, 95% CI: 2.045-5.042). Moreover, the differences in the risk of mortality between the low-risk and high-risk groups were 65.9% in the training cohort and 36.5% in the validation cohort. These results indicated that the RSS had excellent discriminating ability, and the Hosmer-Lemeshow test revealed good calibration in the training and validation cohorts (both P>0.05). Finally, the C-index value of 0.663 (95% CI: 0.618-0.708) in the validation cohort further supported the discriminative ability of the RSS (Table 3).
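For readers who want to see how an AUC comparison between scoring systems works in principle, the following minimal sketch computes the AUC as the probability that a randomly chosen patient who died has a higher score than a randomly chosen survivor, with ties counted as one half. It assumes death is treated as a binary outcome. The time horizon used for the ROC analysis is not stated in the text, and the function name and toy data are hypothetical.

```python
def auc(scores, died):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    dead = [s for s, d in zip(scores, died) if d]
    alive = [s for s, d in zip(scores, died) if not d]
    if not dead or not alive:
        raise ValueError("AUC needs at least one death and one survivor")
    wins = 0.0
    for s_dead in dead:
        for s_alive in alive:
            if s_dead > s_alive:
                wins += 1.0      # correctly ranked pair
            elif s_dead == s_alive:
                wins += 0.5      # tie counts as half
    return wins / (len(dead) * len(alive))


# Hypothetical toy data: total risk scores and death indicators.
toy_scores = [12, 9, 7, 3, 4, 2]
toy_died = [1, 1, 0, 1, 0, 0]
toy_auc = auc(toy_scores, toy_died)
```

Applying the same function to the scores of two different systems on the same patients gives directly comparable values, where 0.5 corresponds to chance and 1.0 to perfect ranking.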

DISCUSSION
The most common primary liver cancer is PHCC, which is also the main cause of death in people with cirrhosis [16]. However, relative to previous decades, recent therapeutic improvements have enhanced the survival of PHCC patients, who receive effective treatment using surgical resection, ablation, trans-arterial chemoembolization, and hepatic arterial infusion chemotherapy plus sorafenib [19,20]. Hepatectomy is widely recognized as a treatment for select HCC patients, in whom it provides similar long-term outcomes for early-stage and advanced disease [21,22]. Although various prognostic scores have been developed for PHCC [23,24], these scores have not been accepted by consensus as a standardized system for predicting outcomes in this setting [25,26], and some internationally recognized and widely used systems have also not been satisfactory in terms of predicting OS. Thus, predicting the postoperative survival of PHCC patients remains a challenge for clinicians, and it would be useful to have a reliable scoring system based on one or several biomarkers and clinical factors. This study developed and validated an RSS for predicting OS after resection of PHCC, and it appears that the simple RSS may be a useful tool for clinicians and medical staff to identify patients with increased mortality risk, who may require personalized management strategies. The two main findings can be summarized as follows. First, we report a negative correlation between OS after resection of PHCC and common clinical or laboratory parameters, namely higher tumor stage, multiple tumors, PVTT, IATO, greater tumor size, and elevated values for PT and AST. Second, a risk score was calculated from these risk factors using their corresponding beta coefficients, and the RSS results were then classified into three risk groups (low, medium, and high risk) for predicting OS.
In this context, previous studies have indicated that tumor stage, tumor size, and multiple tumors were strong predictors of poor OS in PHCC cases [27][28][29]. Consistent with those findings, the present study also emphasized the importance of tumor burden and identified 3.0 cm as the optimal cut-off value for tumor size based on the X-tile analysis, as well as optimal cut-off values for PT and AST, which are known markers of liver failure based on previous studies [30,31]. Thus, we believe that these useful thresholds, which were determined using a robust statistical approach, may enhance the clinical utility of the RSS. Mizumoto et al. and Zhang et al. [32,33] have also reported that the absence of PVTT was associated with good survival in HCC cases, and our findings support this relationship by indicating that the presence of PVTT predicted a poor prognosis in PHCC cases. Previous research has also indicated that a tumor's invasion of adjacent and distant tissues is a risk factor for mortality [34], which explains why IATO was a significant risk factor for mortality in our PHCC cases. Interestingly, our RSS did not include various clinical factors (e.g., age and sex) that are reportedly associated with mortality in PHCC cases [35]. This may be related to the laboratory parameters being more sensitive for predicting mortality, which would result in the clinical factors being omitted from the multivariable model.
The RSS was more accurate than traditional staging systems in predicting the prognosis of PHCC. Its advantage is that it is easy to calculate based on common clinical and laboratory parameters. Moreover, the comparison of AUC values showed that the RSS predicted OS after curative liver resection better than the CLIP, BCLC, TNM, and Okuda systems in the training cohort. Because we intended the RSS to help guide personalized treatment of high-risk PHCC cases, we categorized the risk scores into three groups (high, medium, and low risk). This approach may allow for customized management of these risk groups, which may improve the cost-effectiveness of their treatment. We also validated the RSS using the validation cohort, which confirmed that the RSS had better prognostic performance than the CLIP, BCLC, TNM, and Okuda systems. These results suggest that the RSS may be clinically useful, robust, effective, and accurate for predicting mortality in PHCC cases.
The present study has two important limitations. First, the retrospective single-center design is associated with a risk of selection bias, as well as decreased statistical efficiency and testing power. Second, the patients were all from the same region of China (Wenzhou), and the results may not be generalizable to other areas or ethnicities. Therefore, additional external validation is needed to confirm whether our RSS is useful in other regions and patient populations.
In conclusion, the present study revealed that an RSS could effectively predict OS in PHCC cases, which may help clinicians and medical staff to improve patients' quality of life after resection of PHCC.

MATERIALS AND METHODS

Research design and data sources
This retrospective study evaluated 672 patients who underwent resection of PHCC at our hospital between January 2007 and February 2015. Face-to-face interviews had been performed before the operation to collect data regarding sex, history of alcohol abuse, age, height, weight, and body mass index (BMI, kg/m²). Preoperative blood tests had also been routinely performed to collect data regarding alpha fetoprotein (AFP) levels, prothrombin time (PT), fibrinogen (FIB) levels, neutrophil counts, monocyte counts, lymphocyte counts, platelet (PLT) counts, albumin (ALB) levels, total bilirubin (TBIL) levels, total cholesterol (TC) levels, aspartate transaminase (AST) levels, γ-glutamyl transpeptidase (γ-GT) levels, and alanine transaminase (ALT) levels. Surgical records were searched to collect data regarding the pathological diagnosis, ascites, cirrhosis, tumor size, tumor capsule, single or multiple tumors, satellite nodules, degree of tumor differentiation, peri-cancerous invasion, bile duct infiltration, lymph node metastasis (LNM), microvascular invasion (MVI), nerve infiltration, portal vein tumor thrombus (PVTT), intrahepatic metastases, and invasion of adjacent tissues or organs (IATO). The date of the hepatectomy was defined as the start of follow-up and the final outcome was recorded based on the date of death or the last known status on August 1, 2018. Strict data quality control measures were performed using double verification before the data analysis. The research was approved by our hospital ethics committee. Patient consent was obtained by telephone.

Patients and testing methods
The inclusion criteria were: 1) complete blood test data from before the surgery; 2) a postoperative pathological diagnosis of PHCC; 3) complete tumor resection during the surgery; 4) no preoperative cancer treatment; 5) no history of other malignant tumors or tumor-related complications; 6) good heart, brain, and kidney functioning before the surgery; and 7) no severe postoperative complications, such as massive bleeding or liver failure. The numbers of patients screened, excluded, and included, together with the reasons for exclusion, are shown in Supplementary Figure 1. Based on these criteria, we excluded 380 of 1052 PHCC patients from the present research. During the follow-up period, 321 patients died (case group) and 351 patients remained alive (control group) in this nested case-control study. The 672 PHCC cases were then retrospectively randomized in a 1:1 ratio to the training cohort (N=336) and the validation cohort (N=336).
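A 1:1 random allocation of this kind can be sketched in a few lines. The text does not describe the randomization procedure, so the shuffle-and-halve approach, the function name, and the fixed seed below are assumptions made purely for illustration.

```python
import random


def split_one_to_one(patient_ids, seed=42):
    """Randomly allocate patients 1:1 to a training and a validation cohort.

    The seed is a hypothetical choice that only makes the split reproducible.
    """
    ids = list(patient_ids)          # copy so the caller's list is untouched
    rng = random.Random(seed)        # private generator, no global state
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


# 672 hypothetical patient identifiers.
training, validation = split_one_to_one(range(672))
```

Fixing the seed lets the same allocation be regenerated later, which matters when cut-off values and scores are derived in one cohort and checked in the other.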
All patients underwent routine blood collection before surgery and the samples were sent to a laboratory for analysis. Tumor marker levels (i.e., AFP levels) were determined using the Unicel DXI-800 Analyzer (Beckman Coulter Inc., Japan). Markers of coagulation function (e.g., PT, FIB levels, and other factors) were measured using a STA-R automated coagulation analyzer (Diagnostica Stago, France). Blood biochemical indicators (e.g., TBIL, ALT, and AST) were measured using a Beckman AU5800 automatic biochemical analyzer (Beckman Coulter Inc., Japan). Routine testing for blood components (e.g., PLT, lymphocyte, and neutrophil counts) was performed using a SysMex XE-2100 automated blood cell analyzer (SysMex Corporation, Japan). All testing methods were performed as previously described [8]. The tumor samples were reanalyzed and the pathological diagnosis was confirmed by the pathology department at our hospital.

Statistical analysis
Categorical variables were reported as number (percentage), while continuous variables were reported as median (interquartile range [IQR]) because of their apparently skewed distributions. Using the training set, we fitted univariate and multivariable Cox proportional hazards regression models to screen potential prognostic factors. We fitted a series of different multivariable models and compared them using the Akaike information criterion (AIC); factors with P-values >0.05 in the univariate Cox proportional hazards regression models were excluded from the multivariable Cox proportional hazards regression model. Optimal cut-off values for the independent continuous variables were identified using X-tile analysis in the training cohort. Kaplan-Meier curves and the log-rank test were used to identify differences in survival probability.
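As an illustration of the Kaplan-Meier method mentioned above, the following minimal sketch computes the product-limit survival estimate from follow-up times and death indicators. It is a generic textbook implementation, not the software used in this study, and the function name and toy data are hypothetical. Cumulative mortality at a given time is one minus the survival estimate.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each death time.

    times: follow-up times; events: 1 = death, 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        n_at_risk -= leaving
    return curve


# Hypothetical toy data: four patients, one censored at time 2.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
cumulative_mortality = [(t, 1 - s) for t, s in toy_curve]
```

Censored patients lower the number at risk without changing the survival estimate at their censoring time, which is how patients still alive at the last follow-up are handled.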
Independent risk factors identified using the multivariable Cox proportional hazards model were assigned risk scores based on their β regression coefficients. The total risk score was then calculated for each patient by adding together the individual risk scores for each applicable risk factor. The calibration and discriminative ability of the RSS were assessed using the Hosmer-Lemeshow goodness-of-fit test and the concordance index (C-index), respectively, in the training and validation cohorts. Receiver operating characteristic (ROC) curves were used to compare the predictive abilities of the RSS and various traditional prognostic models.
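The concordance index can be illustrated with a minimal sketch of Harrell's C. It counts, among all comparable patient pairs, the proportion in which the patient who died earlier had the higher risk score, with tied scores counted as one half. The exact C-index variant used in this study is not specified, so this form, the function name, and the toy data are assumptions.

```python
def harrell_c_index(times, events, scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i died and did so strictly
    before patient j's last follow-up time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                      # censored patients cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0     # higher score, earlier death
                elif scores[i] == scores[j]:
                    concordant += 0.5     # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up (months), death indicator, total risk score.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_scores = [9, 7, 5, 1]
toy_c = harrell_c_index(toy_times, toy_events, toy_scores)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which gives a reference scale for the reported C-index.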
All tests were two-sided and P-values of ≤0.05 were considered statistically significant.

Data availability
All study-related data are included in the published report.