Development of a prognostic model based on immunogenomic landscape analysis of hepatocellular carcinoma

Hepatocellular carcinoma (HCC) is the most common subtype of liver cancer, and the systematic exploration of its prognostic indicators is urgently needed. In this study, we obtained 12 IRGs for the construction of a risk score prediction model in HCC by bioinformatics analysis. Differentially expressed genes were screened using the R software edgeR package. Functional enrichment analysis was performed through gene ontology analyses as well as the Kyoto Encyclopedia of Genes and Genomes pathway analysis. Single factor and multi-factor Cox analysis were employed for survival analysis. We used the Timer software to examine the correlation between risk score and tumor-inltrating immune cells. we after prognostic Gene functional enrichment analysis indicated that these genes were involved in the immune inammatory response. We nally screened out 12 hub IRGs and constructed a risk score predictive model for these 12 genes. The sensitivity and specicity of this model were veried. We believe that this study can enhance our insight into the clinical signicance of IRGs in the prognosis and treatment of HCC patients.


Abstract
Background Hepatocellular carcinoma (HCC) is the most common subtype of liver cancer, and the systematic exploration of its prognostic indicators is urgently needed. In this study, we obtained 12 IRGs for the construction of a risk score prediction model in HCC by bioinformatics analysis.

Methods
Differentially expressed genes were screened using the R software edgeR package. Functional enrichment analysis was performed through gene ontology analyses as well as the Kyoto Encyclopedia of Genes and Genomes pathway analysis. Single factor and multi-factor Cox analysis were employed for survival analysis. We used the Timer software to examine the correlation between risk score and tumorin ltrating immune cells.

Results
We identi ed 3,215 up-regulated and 1,044 down-regulated genes in HCC tissues based on a cohort from The Cancer Genome Atlas (TCGA). Differentially expressed immune-related genes (IRGs) and survivalassociated IRGs were further identi ed. We also integrated multivariate Cox regression analyses to obtain 12 IRGs for the construction of a risk score prediction model, whose performance was veri ed using the Kaplan-Meier survival and receiver operating characteristic curve analyses. Our ndings suggest that the risk score was associated with clinical characteristics and the in ltration of immune cells in HCC patients.

Conclusions
We obtained a risk score prediction model of 12 IRGs in HCC by bioinformatics analysis and con rmed its performance.
Background Hepatocellular carcinoma (HCC) accounts for approximately 90% of primary liver cancers and is one of the most lethal cancers worldwide [1]. Despite signi cant improvement in the treatment of HCC, patient prognosis remains poor [2]. Existing treatments are insu cient for patients with locally advanced or distant metastasis [3]. Moreover, high-quality evidence on the mortality bene t of HCC surveillance is still lacking. Therefore, further research is needed to determine the value of surveillance and to identify new prognostic markers and models for HCC patients.
Recently, the emerging hallmarks of cancer, including abnormal energy metabolism, genome instability, and immune escape, have attracted increasing attention [4]. Cancer immunotherapy, which focuses on activating the immune system to ght tumors, has made tremendous progress in recent decades [5].
Intratumoral immune cells that pre-existing or attracted by chemokines to recognize tumor antigens and destroy tumor cells can control cancer progression [6,7]. Immunoregulatory proteins, such as cytotoxic Tlymphocyte associated protein 4 (CTLA4) and programmed cell death protein 1 (PD-1)/programmed cell death-ligand 1 (PD-L1), have received substantial attention in medical conferences [8]. However, only a few other new immune markers have been identi ed and used as prognostic indicators and therapeutic targets. Therefore, the search for key immune molecules involved in HCC outgrowth is urgently needed, which may also effectively supply prognostic and therapeutic markers for patients with HCC.
In this study, the gene expression data of 371 HCC samples and 50 normal tissues were acquired and analyzed. We integrated the expression and prognostic information of immune-related genes (IRGs), developed an individualized prognostic model for HCC patients, and veri ed the risk score model. We aimed to obtain insight into the potential clinical utility of immune-associated differentially expressed genes in prognostic strati cation and treatment targets.

Microarray data
The mRNA expression data and corresponding clinical information of HCC patients were obtained from The Cancer Genome Atlas (TCGA; https://tcga-data.nci.nih.gov/tcga/) cohort, which contained data from 371 primary HCC and 50 normal liver tissues. A list of IRGs was obtained from the Immunology Database and Analysis Portal (ImmPort; http://immport.org.cutestat.com/), whereas the list of transcription factors (TFs) was obtained from the Cistrome website, which collects ChIP-seq and chromatin accessibility data.
The copy number variation (CNV) and mutation data of the genes set from HCC patients were downloaded from the Broad GDAC Firehose database (http://gdac.broadinstitute.org/). Differentially expressed genes (DEGs) were screened using the R software edgeR package (http://bioconductor.org/packages/edgeR/). We used the criteria of FDR < 0.01 and |logFC| > 1 to de ne DEGs.

Functional enrichment analysis, establishment of a proteinprotein interaction network
We submitted our DEGs, IRGs, and hub IRGs for gene ontology (GO) analyses, including biological process, cellular component, and molecular function categories, as well as the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. A p-value < 0.05 was considered statistically signi cant. The most enriched term within a cluster was represented.
To explore the interactions between these genes, we constructed the protein-protein interaction (PPI) network using the STRING online database (https://string-db.org/). The PPI network could display the interactions that directly or indirectly connect with genes. Metascape and Cytoscape software (http://www.cytoscape.org/) were used to display the PPI results. We identi ed all signi cantly enriched terms, which were then hierarchically clustered into a tree based on Kappa-statistical similarities among their gene memberships.

Survival analysis
The association between the expression levels of IRGs and OS was analyzed using single factor Cox analysis. A log2 (normalized value + 1) data format was used for survival analysis. Hub IRGs were further selected using multi-factor Cox analysis, which was conducted using the R software survival package. IRGs that were signi cantly correlated with OS were also submitted for functional enrichment analysis. A p-value < 0.05 was regarded as statistically signi cant.

Construction of a regulation network between TF and hub IRGs
TFs are important molecules that directly control gene expression. Therefore, it is necessary to explore how TFs regulate these clinically relevant IRGs. Cistrome Cancer is a data source that integrates cancer genomics data from TCGA with > 23,000 ChIP-seq and chromatin accessibility pro les to provide the regulatory links between TFs and transcriptomes. The Cistrome Cancer database (http://cistrome.org/CistromeCancer/) is a valuable resource for experimental and computational cancer biology research and contains a total of 318 TFs. We downloaded the target genes regulated by the differentially expressed TF and calculated the correlation of expression between TF and hub IRGs. We ltered the interactions with a p-value of correlation > 0.4 and drafted the regulation network diagram using Cytoscape.

CNV and mutation analysis of hub IRGs
We downloaded CNV and mutation data of liver cancer samples from the Broad GDAC Firehose website (http://gdac.broadinstitute.org/) and screened for changes in the genome of hub genes. The waterfall chart was drawn using the Maftools website.

Correlation between risk score and immune cell in ltration
We used the Timer software (https://cistrome.shinyapps.io/timer/), which includes different types of cancer samples accessible in the TCGA cohort, to examine the correlation between risk score and tumorin ltrating immune cells (TIICs; B cells, CD4 + T cells, CD8 + T cells, neutrophils, macrophages, and dendritic cells).

Statistical analyses
We used SPSS 16.0 software (Chicago, IL, USA) to perform the data analysis. Gene functional enrichment analyses were conducted using the R software clusterPro ler package to identify biological terms among gene clusters. The receiver operating characteristic (ROC) curve was calculated using the survival ROC R software package to validate the performance. The Student's t-test was used to assess signi cant differences between data from the two groups. Correlation analysis was performed using Pearson's tests. Overall survival was evaluated using the Kaplan-Meier method. A p-value < 0.05 was considered statistically signi cant.

Identi cation of differentially expressed IRGs
We rst obtained the gene expression information for HCC and normal liver tissues from The Cancer Genome Atlas (TCGA) cohort and performed data pre-processing using edgeR software to screen DEGs based on the cutoff criteria of FDR < 0.01 and |logFC| > 1. Figure 1A and B show 3,215 up-regulated and 1,044 down-regulated genes, respectively. The KEGG pathway enrichment analysis indicated that neuroactive ligand-receptor interaction was most frequently implicated. The terms "regulation of transsynaptic signaling and chemical synaptic transmission," "extracellular matrix," and "substrate-speci c channel activity" were the main biological terms among biological processes, cellular components, and molecular functions, respectively, according to the GO analysis ( Fig. 1C-F). From this set of genes, we obtained 225 up-regulated and 144 down-regulated IRGs ( Fig. 2A-B; Supplementary Table 1). Cytokinecytokine receptor interaction was most signi cantly enriched among these genes, as per the KEGG analysis. On the other hand, "cell chemotaxis," "external side of plasma membrane," and "receptor ligand activity" were the most important biological terms among biological processes, cellular components, and molecular functions, respectively, according to the GO analysis ( Fig. 2C-F). In addition, we also screened out 50 up-regulated and 23 down-regulated TFs, as shown in Fig. 3

Identi cation of survival associated IRGs
As early diagnosis and prognosis assessment are very important for the treatment of HCC patients, we focused our attention on identifying new prognostic molecular markers for HCC patients. We rst obtained 271 IRGs associated with the overall survival of HCC patients using Cox single factor analysis (Supplementary Table 3). Similar to the enrichment analysis results of the differentially expressed IRGs, these survival associated IRGs were mainly enriched in cytokine interaction and in ammatory signaling pathways (Fig. 4A). We also constructed a PPI network of the 271 survival related IRGs (Fig. 4B). This network showed that SHC adaptor protein 1 (SHC1), growth factor receptor bound protein 2 (GRB2), albumin (ALB), and epidermal growth factor receptor (EGFR) were the core proteins among this dataset.

Identi cation of hub IRGs
In order to obtain IRGs related to the progression of HCC, we extracted hub IRGs that were differentially expressed and correlated with the prognosis of HCC patients using Cox single factor analysis. A total of 89 hub IRGs were screened out, and the correlation between the hazard ratio and expression of hub genes is presented in Fig. 5A and Supplementary Table 4. This suggested that most identi ed hub IRGs were upregulated in HCC and were negatively related to the prognosis of HCC patients. The KEGG pathway enrichment analysis indicated that cytokine-cytokine receptor interaction was most signi cantly enriched, similar to the differentially expressed IRGs (Fig. 5B). The PPI network analysis showed that KIT ligand (KITLG), ALB, insulin like growth factor 1 (IGF1), estrogen receptor 1 (ESR1), and baculoviral IAP repeat containing 5 (BIRC5) were the main interaction proteins among these gene sets (Fig. 5C). The terms "positive regulation of defense response," "receptor complex," and "receptor ligand activity" were the main biological terms among biological processes, cellular components, and molecular functions, respectively ( Fig. 5D-F). We drew a regulatory network diagram between the differentially expressed TFs mentioned above and the hub IRGs using Cytoscape (Fig. 5G). A correlation score > 0.4 was set as the cut-off value. The waterfall plot indicated the CNV and mutation data of these hub IRGs (Fig. 5H).

Risk score mode building and veri cation
Using multivariate Cox regression analysis, we nally obtained 12 hub IRGs with biomarker potential for evaluating the prognosis of HCC patients. A forest plot showed the relative hazard ratio of these 12 hub IRGs (Fig. 6A). We constructed a risk score mode on behalf of the prognostic signature according to the expression levels of 12 hub IRGs. The association between the expression of these 12 hub IRGs and the clinical features of HCC patients is presented in Supplementary Table 5. The expression of most of the 12 genes, except brain derived neurotrophic factor (BDNF) and glucagon like peptide 1 receptor (GLP1R), was statistically related to clinical characteristics. Based on the median risk score, we separated the HCC patients into two groups and performed Kaplan-Meier survival analysis to test the performance of the risk score mode (Fig. 6B). This indicated that the risk score mode could satisfactorily forecast the clinical outcomes of HCC patients. This was also veri ed by the ROC curve analysis (Fig. 6C). The area under the curve was 0.778, suggesting that the model had excellent potential to predict prognostic information for HCC patients. Furthermore, the HCC patients were separated into two groups according to the risk score median, as shown in Fig. 6D, and the expression levels of 12 hub IRGs in these two groups are shown in , and BDNF ) were up-regulated in the high-risk score group. These results were consistent with their hazard ratio values presented in Fig. 6A. Figure 6E re ected the interrelation between the risk score and survival time, as well as the survival status. Patients with a higher risk index had shorter survival times and higher mortality rates.

Clinical application of risk score mode
The relationships between risk score and clinical characteristics, including age, sex, pathologic-M, pathologic-N, pathologic-T, and pathologic-stage were evaluated. The risk score was signi cantly different between the different pathologic-T and pathologic-stage groups (Fig. 7A-F). Patients with higher pathologic-stage and pathologic-T have higher risk scores. To test if the immunogenome accurately re ected the status of the tumor immune microenvironment, we evaluated the association between risk score and immune cell in ltration (Fig. 7G-L). We found that there was a positive correlation between the risk score and CD4-T cell, macrophage, and neutrophil in ltration.

Discussion
Although the importance of IRGs in cancer prognosis and treatment has attracted signi cant attention, a systematic large sample research aimed at exploring the clinical prognostic value has not yet been performed. Here, we rst identi ed hub IRGs after a comprehensive assay of gene expression and patient prognostic information. Gene functional enrichment analysis indicated that these genes were involved in the immune in ammatory response. We nally screened out 12 hub IRGs and constructed a risk score predictive model for these 12 genes. The sensitivity and speci city of this model were veri ed. We believe that this study can enhance our insight into the clinical signi cance of IRGs in the prognosis and treatment of HCC patients.
Patients with early HCC are always asymptomatic, which results in a delay in diagnosis. This is one of the main causes of poor prognosis in patients with HCC [9]. Therefore, monitoring of high-risk groups, which includes screening for early diagnostic markers and assessment of the prognosis of identi ed HCC patients, is important for better control over disease progression. Development of cancer immunotherapies, especially immune checkpoint inhibitors, are a breakthrough in cancer treatment that have increased hope for cancer patients [10][11][12][13]. Adoptive cell transfer is a kind of cell immunotherapy that includes the infusion of tumor-in ltrating T cells, T cell receptor-engineered T cells, and genetically modi ed chimeric antigen-receptor speci c T cells, among others [14][15][16]. These immunotherapy approaches have improved the treatment effect in a wide range of cancers [17][18][19][20][21]. Nevertheless, the e cacy of these therapies is limited by a variety of mechanisms, such as the emergence of compensatory molecular changes to escape the treatment target or the transformation of antigen molecules in the tumor cell surface. Exploring the prognosis-related immune indicators of HCC can help to uncover the molecular mechanism of immunotherapy resistance. Therefore, this study aimed to screen hub IRGs and build a risk score model that was used to indicate the clinical prognosis of HCC patients. The hub genes that have been selected were subjected to enrichment analysis, and the results showed that they are mainly involved in cytokine-cytokine receptor interactions, suggesting that the results are reliable. Aside from being prognosis indicators, the 12 hub IRGs may also provide treatment direction for HCC.
To explore the underlying regulation mechanisms of hub IRGs, we investigated the regulatory network between differentially expressed TFs and hub IRGs. Nonstructural maintenance of chromosomes condensin complex subunit G (NCAPG), MYB proto-oncogene like 2 (MYBL2), sex comb on midleg-like 2 (SCML2), and E2F transcription factor 1 (E2F1) are the prominently up-regulated regulators in the network, indicating a future research direction. NCAPG has been reported as an essential oncogene in HCC and can promote the growth and migration of HCC cells [22,23]. MYBL2 is associated with HCC cell proliferation and the expression of cell cycle regulators [24]. MYBL2 expression is positively related to HCC genomic instability, proliferation, and microvessel density, and is negatively related to apoptosis [25]. E2F1 was reported to be involved in the regulation of immune function [26,27], which veri es our data.
Nevertheless, further studies are needed to clarify the regulatory relationships in the network.
In summary, by integrating database information and bioinformatic analyses, we obtained a risk score model with potential clinical signi cance for HCC. Multi-database integration and analysis as well as further studies are needed to better explore the functions and underlying mechanisms of these hub IRGs as well as their potential role in the clinical diagnosis, treatment, and prognosis of HCC.

Conclusions
This represents the rst study identifying 12 IRGs for the construction of a risk score prediction model in HCC. Our results indicated that these genes might be involved in the regulation of HCC progress, thereby suggesting them as potential targets for HCC treatment.

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
The authors declare that this work was available for participate and publication.

Availability of data and materials
The datasets used and/or analyzed in this study are available from the corresponding author upon reasonable request.