Identification and Validation of Tumour Microenvironment-Based Immune Related Signature for Hepatocellular Carcioma: Immunotherapeutic Implications


 Background: Even though treatment outcomes for hepatocellular carcinoma patients have significantly improved, prognostic clinical evaluation remains a substantial challenge due to the heterogeneity and complexity of cancer. Accumulating evidence has revealed that the tumor immune microenvironment is critical for progression and prognosis of hepatocellular carcinoma. A powerful predictive model could assist physicians to better monitor patient treatment outcomes and improve overall survival rates. Therefore, we introduced tumor immune-related genes into a model that could be used for patient risk classification. Results: First, the Single-sample gene set enrichment analysis (ssGSEA) and Weighted gene co-expression networks construction (WGCNA) methods were applied to identify highly associated immunity genes. Following this, a multi-immune-related gene-based signature determined by The least absolute shrinkage and selection operator (LASSO) Cox regression analysis was used to determine risk stratification. In addition, this predictive model was evaluated according to its performance as a prognostic model in the training and testing datasets. Furthermore, tumor mutation burden and biological enrichment analysis were applied to reveal the potential mechanisms through which the gene signature functions. Conclusion: In conclusion, our four-gene signature model may be clinically applied in hepatocellular carcinoma patients at high risk of mortality for personalized therapy.


Introduction
Hepatocellular carcinoma (HCC), which is the most common type of primary liver cancer and is characterized by a high degree of heterogeneity (Sia, et al., 2017), is the second leading cause of cancerrelated deaths worldwide (Ho, et al., 2019).Beyond standard systemic therapies such as targeted therapy and chemotherapy, accumulating research demonstrates robust and durable responses from immune checkpoint inhibition (ICI) (Keenan, et al., 2019). Immunotherapy is becoming the new standard of care for advanced HCC worldwide (Waidmann, 2018), therefore, providing new therapeutic opportunities by modulating the tumor microenvironment immune response (Sim and Knox, 2018). Treatment options for patients with HCC are rapidly changing and increasing owing to the development of biological and clinical knowledge (Greten, et al., 2019). Even during the treatment process, patients face a high risk of recurrence due to lack of personalized treatment (Wan, et al., 2010). The e cacy and safety of immunotherapy should be evaluated during clinical treatment of malignancies in HCC (Li, et al., 2015). Therefore, it is necessary to develop an independent prognostic signature at the molecular level to determine hepatocellular carcinoma prognosis based on the host immune status (Chen, et al., 2011).
With the rapid development of big bioinformatics data, an approach that integrates gene signatures with classical clinical parameters may provide a great advantage in cancer prognosis (Xu, et al., 2017).
Molecular classi cations based on somatic mutation pro les and RNA expression pro les related to patient prognosis will enhance precision medicine during immune responses (Nakagawa, et al., 2019).
There are signi cant and consistent immune system alterations in hepatocellular carcinoma, and some prediction-related models have been established (Bhattacharya, et al., 2016;Bruix and Llovet, 2002).
Owing to the availability of various public cohorts providing mRNA and protein expression data, we were able to investigate the prognostic roles of immune-related genes in hepatocellular carcinoma using The Cancer Genome Atlas (TCGA) datasets in this study (Tomczak, et al., 2015).
In the current study, we integrated TCGA Hepatocellular Carcinoma (LIHC) cohorts with transcriptional activity (424 cases total) to select immune-related genes for prognostic prediction (Zhu, et al., 2014).
Additionally, we analyzed the affected genes to select for optimal immune-related genes for prognosis based on risk score strati cation. Furthermore, an integrated analysis was performed to examine the accuracy of the model in predicting overall survival. Finally, we explored the role of four immune-related genes in different clinical characteristics and biological functions.

Results
Immuno-pro ling identi es three subtypes in the TCGA dataset To cluster hepatocellular carcinoma samples from the TCGA dataset, we rst calculated the enrichment levels of the samples using ssGSEA scores from 29 immune gene signatures. Next, samples were clustered hierarchically based on ssGSEA scores, stratifying the samples into three clusters (Fig. 1A). The three clusters were de ned as Immunity High (Immunity_H), Immunity Medium (Immunity_M), and Immunity Low (Immunity_L). In addition, our correlation analysis showed that tumor purity was signi cantly lower in the Immunity_H cluster and signi cantly higher in the Immunity_L cluster (Kruskal-Wallis test, Fig. 1B). To measure the level of immune cell in ltration, the purity of the tumor, and the stromal content scores in the three clusters, we used the R package "ESTIMATE". As shown in Fig. 1C, the three subtypes were clustered and divided by lymphocyte_in ltration, tumor_purity, stromal_score, immune_score, and percent of lymphocyte in ltration.
Construction of weighted gene co-expression network WGCNA analysis is widely applied to examine the Pearson correlation coe cient between gene expression levels (Wu, et al., 2019). Here, we used WCGNA analysis to construct a gene co-expression network in the immunity_H subgroup. To identify the immunity-related correlation gene modules, the mRNA modules in each subtype were analyzed. As presented in Fig. 2A, immunity_H was signi cantly correlated with the green module, MEpurple, MEsaddlebrown, MEorange, METan, and MEdarkorange. The genes in the green module were used for subsequent analysis. As presented in Fig. 2B, immunity was signi cantly correlated with the module genes. In addition, we detected the overall gene expression level in the green module in the three immunity subtypes. We found that the genes in the green module were signi cantly increased in the high immunity group (Fig. 2C).

Gene signature model construction and performance
To exclude genes with a high correlation in the signature model, hub genes obtained from the above selection were further analyzed using the LASSO-penalized Cox analysis (Fig. 3A). Next, we analyzed these nine genes using multivariate Cox regression analysis in the TCGA cohort and identi ed four genes (IL18RAP, CSF3R, KIAA1429, PIK3R6) for the construction of our signature model (Fig. 3B & table.1). Next, we constructed a prognostic model using the risk-score formula: risk score = (coe cients × gene expression level), for each patient. Patients were classi ed into high-and low-risk groups with median risk scores in the training dataset (Fig. 3C). The high-risk group demonstrated a higher mortality rate than the low-risk group in the training dataset (Fig. 3C) with a P = 3.919e-03 in the log-rank test (table.2). In addition, we found that the area under the curve (AUC) for overall survival (OS) was 0.749 in the timedependent receiver operating characteristic (ROC) curve (Fig. 3C). Furthermore, the ranked risk scores of patients in the testing dataset showed that the high-risk group demonstrated higher mortality than the low-risk group (AUC = 0.740) (Fig. 3D).

Gene signature model performance evaluation
To evaluate the gene signature performance, we rst analyzed the risk status of these two groups. As shown in Fig. 4A, the cut-off value could distinguish between survival statuses and the expression pro les of these four genes in the two groups from the training dataset were visualized in a heatmap.
Furthermore, a nomogram was built by including the expression level of the signature genes and the overall survival rate of the patients from the training dataset (Fig. 4B). Calibration plots were used to assess the performance of the nomogram in predicting the 1-, 3-, and 5-year OS rates in the training dataset ( Figure. 4C). The c-index of the prediction model was 0.803. Furthermore, we validated the performance of the gene signature in the testing dataset. We found that the risk score formula could also be used to classify the survival status ( Fig. 4D & Table 3). Furthermore, nomogram analysis revealed that the gene signature and these four genes would also serve as risk factors to predict the overall survival rate of patients from the testing dataset (Fig. 4E). The c-index of the prediction model was 0.708 ( Fig.   4F).

Immunity status in different groups
To further evaluate the immune status difference between the high-and low-risk groups, we compared the immunity-related scores in these two groups in both the training and validation datasets. We found that all four scores were signi cantly different between these two groups in the training dataset (Fig. 5A).
Consistently, the scores were found to be signi cantly different in the validation dataset (Fig. 5B).

Immunity is positively associated with TMB
Since the TMB level is closely associated with the immunity status of patients, we further analyzed the TMB level of different immunity level groups. Correspondingly, we found that the TMB level was higher in the high-risk group than in the low-risk group in the training dataset (Fig. 6A). However, there was no signi cant difference between the high-and low-risk groups in the testing dataset (Fig. 6B).

Functional analysis of the four genes
Functional analyses were performed for the genes in the immunity_H module to clarify their biological signi cance. GO analysis was applied based on biological process (BP), cellular component (CC), and molecular function (MF). As displayed in Fig. 7, these results demonstrated that the immunity_H module was predominantly enriched in immune-related terms such as "cytokine-cytokine receptor interaction" and "natural killer-mediated cytotoxicity".

Individual gene expression analysis
To further examine the expression pro le of these four genes in liver cancer, we used the TCGA dataset.
We rst examined the different expression statuses in normal and tumor tissues. As shown in Fig. 8A, we found that CSF3R and IL18RAP increased in the tumor tissue, whereas KIAA2429 and PIK3R6 decreased in the tumor tissue. Furthermore, we found that the expression levels of these four genes were associated with the OS of the patients (Fig. 8B). In addition, we also analyzed the spatial expression of these four genes in the human protein altas database. We found that all four genes could be detected in normal tissue and cancer samples (Fig. 8C).

Discussion
Gene signatures are becoming a popular approach in cancer prognosis research, especially immunebased prognostic signatures, which are emerging as a novel hallmark of prognosis (Gaetano, et al., 2015). In this study, a novel and e cient four-gene immune prognostic signature was established using TCGA datasets. The signature showed a high correlation with OS in patients, highlighting the robustness of the gene signature. Furthermore, KEGG and GO enrichment analysis found that the enriched pathways were related to immune-related pathways, which is consistent with previous signatures of the HCC immune microenvironment. Our signature may therefore provide potential biomarkers for malignancy and survival predictions in HCC patients and provide guidance for targeted therapy and immunotherapy.
Somatic missense mutations have been mostly studied with regard to their role in the generation of neoantigens, which can be used to identify potential cancer immunity drivers (Porta-Pardo and Godzik, 2016). Genomic studies were applied in investigating the landscape of molecular alterations in HCC; however, only ~ 25% of tumours harbor potentially targetable drivers. Our gene signature could also re ect the different TMB level of subgroups of the patients, which might be used to guiding the treatment options, including biomarker-driven treatments such as targeted therapy and immunotherapies (Llovet, et al., 2018).
Here, we constructed a signature model including genes such as IL18RAP, CSF3R, KIAA1429, and PIK3R6. These four genes are mostly associated with cancer development. KIAA1429, an m(6)A methyltransferase (Chen, et al., 2019), predicts OS for breast cancer (Liu, et al., 2019). In addition, KIAA1429 promotes migration and invasion of HCC by inhibiting ID2 (Cheng, et al., 2019). The KIAA1429-GATA3 regulatory model based on the m6A modi cation provided insights into epi-transcriptomic dysregulation in hepatocarcinogenesis and metastasis (Lan, et al., 2019). KIAA1429 could promote breast cancer progression and was correlated with pathogenesis, thus possibly representing a promising therapeutic strategy in combination with CDK1 treatment (Qian, et al., 2019). KIAA1429 could also act as an independent predictive factor in HCC and correlated with survival and prognosis (Liu, et al., 2020). The granulocyte colony-stimulating factor (G-CSF or CSF3) and its receptor CSF3R would regulate granulopoiesis and hematopoietic stem cell mobilization (Zhang, et al., 2018). G-CSFR is also associated with the development of myelodysplasia/AML in patients with severe congenital neutropenia (Kunter, et al., 2011). G-CSFR also controls myeloid progenitor proliferation and differentiation to neutrophils (Klimiankou, et al., 2016). G-CSFR plays an important role in the production of neutrophil granulocytes (Dwivedi, et al., 2019). The IL-18 receptor accessory protein (IL-18RAP) was reported to be important in the development of esophageal cancer (Zhu, et al., 2016). A novel 2-gene signature, including IL18RAP, was generated for HCC prognosis (Tian, et al., 2020). However, the functional roles of these four genes in liver cancer remain unclear. Our ndings implicate these genes as prognostic markers for liver cancer and could reveal their potential functions in liver cancer progression.
In summary, we identi ed four novel immune-related genes using TCGA dataset analysis that can be used to determine liver cancer prognosis. Our signature model may reveal relevant dysregulated immune genes in liver cancer patients and predict responses to immunotherapy, and potentially to combined therapy. However, further validation of our gene signature in other independent cohorts and in functional experiments would support this conclusion.

Conclusions
We proposed and independently validated three reproducible immune molecular subtypes of liver cancer, which may provide insights for personalized immunotherapy in patients.

Materials And Methods
Data collection mRNA expression data and clinical parameters were downloaded from the target database. In the dataset, we included 424 samples with clinical information and RNA sequencing (RNA-seq) data. Fragments per kilobase million (FPKM) data were collected. Ethical approval was not necessary for our study, because all data were retrieved from a public database. Perl (ActivePerl 5.28) was used for data extraction and code was available upon request. The dataset was further split into the training dataset:validation dataset with a 3:1 ratio.

Single-sample gene set enrichment analysis (ssGSEA)
Immune features in each sample were evaluated according to 29 immune Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway categories using ssGSEA analysis using the R package, GSVA [15,16]. Brie y, the mRNA expression levels of genes in each sample were measured and ranked to generate enrichment scores.

Sub-Clustering
Based on ssGSEA scores, we performed hierarchical clustering of the samples using the "clusterPro le" R package. Immune cell in ltration levels, tumor purity, and stromal content were calculated using ESTIMATE for each sample. To calculate the correlation of the three clusters with immune cell in ltration levels, we used the "pHeatmap" package to cluster samples.
Weighted gene co-expression networks construction (WGCNA) and module selection WGCNA was used to identify gene modules that were signi cantly associated with the immunity_H group. Gene modules that were speci cally ampli ed in the three immunity subtypes were identi ed.

Prognostic values of gene signature analysis
Patients with clinical features including survival data, age, tumor stage, and sex were used for subsequent analysis. The LASSO Cox regression analysis in the "Glmnet" R package was used to detect immune-related genes and construct a prognostic gene signature. P < 0.001 was considered statistically signi cant. Multivariate Cox regression analysis of the selected genes was conducted using the forward stepwise procedure.
Building the prognostic immune-related gene signature The R packages "survival" and "survminer" were applied to calculate the survival rate, hazard ratios (HR), and 95% con dence intervals (CI), and plot the Kaplan-Meier survival curve. The following risk score formula was used: risk score = mRNA expression level of IL18RAP * -2.736 + mRNA expression level of CSF3R * 0.123 + mRNA expression level of KIAA1429 * 0.093 + mRNA expression level of PIK3R6 * 0.777. The R packages "survival" and "survminer" were also used to nd the optimal cut-off to stratify patients.
In addition, the "survivalROC" R package was applied to investigate the prognostic value of the gene signature. P < 0.05 was considered statistically signi cant.

Predictive nomogram generation
A nomogram was generated using independent prognostic measures with the "RMS" R package. A calibration plot was applied to examine the calibration value of the nomogram.

Functional enrichment analysis
Functional enrichment analysis was performed using the "clusterPro le" R package. Genes enriched in Gene Ontology (GO) and KEGG pathway categories were identi ed. The GO terms and KEGG pathways with P < 0.05 and enrichment scores > 2 were considered statistically signi cant and were plotted using the R package ggplot2.

Page 8/22
TMB was estimated as 10 kb in length with the Perl software. Code was available upon request.

List Of Abbreviations
(C) Heatmap showing Tumor_purity, Stromal_score, Immune_score Lymphocyte_in ltration and percent of lymphocyte in ltration in the three subtypes. Pearson's coe cient and P-values are indicated above each plot. (C) Box-plot of the expression pro le of MEgreen genes in the different Immunity level subgroups. **P < 0.05, one-way ANOVA test.    Gene ontology enrichment analysis for biological process: BP, cellular component: CC, molecular in the testing dataset. (C) IHC demonstrates the spatial expression of the four genes in the tumor and tissue groups from the human protein atlas database.