Clustering analysis of DElncRNAs, DEmiRNAs, and DEmRNAs
According to the above screening criteria, 224 HCC samples and 27 normal samples were included. The screening parameters of lncRNAs and mRNAs were set to P < 0.05, FDR < 0.05, and FC > 1.5. The screening parameters of miRNAs were set to P < 0.05 and FC > 1.2. Finally, 1896 DEmRNAs, 330 DElncRNAs, and 76 DEmiRNAs (Supplemental Tables S1-S3) were obtained. According to the expression values of the DElncRNAs, DEmiRNAs and DEmRNAs in the samples, cluster analysis was carried out, and the results are displayed in the form of a heat map (Fig. 1).
GO And Pathway Analysis Of DEmRNAs In HCC
Compared with normal liver tissue, there were 1896 DEmRNAs in HCC tissues, including 953 upregulated DEmRNAs and 943 downregulated DEmRNAs. These upregulated and downregulated DEmRNAs were analyzed by using the GO database. These genes were enriched by terms in the GO database to determine their functions. The upregulated DEmRNAs were mainly enriched in rRNA processing, SRP-dependent cotranslational protein targeting to the membrane, translational initiation, nuclear-transcribed mRNA catabolic process, nonsense-mediated decay, viral transcription and translation (Fig. 1D), while the downregulated DEmRNAs were mainly enriched in the oxidation-reduction process, regulation of complement activation, complement activation, classical pathway, receptor-mediated endocytosis and proteolysis (Fig. 1E). Then, we performed KEGG pathway enrichment analysis and found that the upregulated DEmRNAs were mainly correlated with the ribosome, metabolic pathways, protein processing in the endoplasmic reticulum, spliceosome and cell cycle (Fig. 1F), while the downregulated DEmRNAs were mainly correlated with metabolic pathways, complement and coagulation cascades, carbon metabolism, valine, leucine and isoleucine degradation, and fatty acid degradation (Fig. 1G).
Construction And Analysis Of The Weighted Coexpression Network
According to the expression information of the DElncRNAs, DEmiRNAs, and DEmRNAs in the samples, cluster analysis was carried out, and the results showed that there were no outlier samples (Fig. 2A and B). To determine the relative balance between scale independence and mean connectivity, the network topology with a soft threshold power of 1 to 20 was analyzed. Finally, we determined that the optimal β value of mRNAs/lncRNAs was 7 (Fig. 2C), and the optimal β value of miRNAs was 4 (Fig. 2D) in the coexpression network analysis. According to the consistent topological overlap and the corresponding module colors represented by the color row, a gene dendrogram was obtained by clustering the dissimilarity. Each colored row represents a color-coded module that contains a set of highly connected genes. Finally, nine modules were generated in the lncRNA/mRNA coexpression network (Fig. 2E), and five modules were generated in the miRNA coexpression network (Fig. 2F). Then, we calculated and mapped the relationship between each module and the corresponding clinical features. From the miRNA correlation module (Fig. 2H), we found that the turquoise module had the largest negative correlation (module-trait weighted correlation = -0.67) with the tumors related to the miRNA coexpression network and had the largest positive correlation (module‐trait weighted correlation = 0.17) with survival time. Therefore, for the miRNA module, we selected the turquoise module to construct the ceRNA network. In addition, from the lncRNA/mRNA correlation module (Fig. 2G), we found that the turquoise module had the largest positive correlation (module‐trait weighted correlation = 0.54) with the tumor related to the lncRNA/mRNA coexpression network and the largest negative correlation with survival time (module‐trait weighted correlation = -0.27). Therefore, in the lncRNA/mRNA module, the turquoise module was selected to construct the ceRNA network. Finally, the turquoise lncRNA/mRNA module was combined with the turquoise miRNA module to construct the ceRNA network.
Prediction of miRNA target genes and construction of the ceRNA network
First, we performed target gene prediction for the miRNAs according to the above method. In the turquoise lncRNA/mRNA module and turquoise miRNA module, 196 mRNAs and 41 lncRNAs were predicted as the target genes of 33 miRNAs. Then, based on the predicted miRNA-target binding relationship and expression relationship in HCC samples, negatively correlated miRNA-mRNA and miRNA-lncRNA pairs were screened to construct the ceRNA (lncRNA-miRNA-mRNA) regulatory network. In the ceRNA network composed of the turquoise lncRNA/mRNA module and turquoise miRNA module (Fig. 3A), there were 566 lncRNA-miRNA-mRNA regulatory pairs, including 30 upregulated lncRNAs, 16 downregulated miRNAs and 75 upregulated mRNAs (Supplemental Table S4).
Functional enrichment analysis of the DEmRNAs in the turquoise module
The number of DEmRNAs in the turquoise module was 481. After module selection, we further performed functional enrichment analysis on the selected module. GO enrichment analysis showed that the DEmRNAs in the ceRNA network (Fig. 3B) were mainly enriched in neutrophil degranulation, mRNA splicing via the spliceosome, cell division, the viral process and the mitotic cell cycle. KEGG pathway enrichment analysis showed that the DEmRNAs in the ceRNA network (Fig. 3C) were mainly correlated with metabolic pathways, the cell cycle, the spliceosome, DNA replication and pathogenic Escherichia coli infection.
Construction of the PPI network and identification of 14 prognosis-related hub genes in the ceRNA network
To further explore the relationship between the regulatory proteins of the ceRNA network, the PPI network was analyzed by the STRING database, and the protein regulatory network was constructed by using Cytoscape software (Fig. 4A). Finally, we selected the top 15 hub genes and constructed the regulatory network (Fig. 4B). The ranking results of the top 15 hub genes are shown in Supplemental Table S5. Then, we visualized the relationship between the 15 hub genes and lncRNAs and miRNAs in the ceRNA network through a Sankey diagram. The results showed that the 15 hub genes were regulated by 16 prognosis-related lncRNAs (AC016747.3, AC024560.3, AC092171.4, BACE1-AS, CIDECP, CTD-2510F5.4, DSTNP2, LINC00294, PSMD5-AS1, RP11-147L13.13, RP11-385F5.5, SNHG3, TPM3P9, SNHG1, STAG3L4, and NRAV) through competitive binding with 10 miRNAs (hsa-let-7c-5p, hsa-miR-139-5p, hsa-miR-148a-3p, hsa-miR-152-3p, hsa-miR-22-3p, hsa-miR-27b-3p, hsa-miR-29a-3p, hsa-miR-29c-3p, hsa-miR-378a-3p, and hsa-miR-455-3p) (Fig. 4C). To determine the relationship between these 15 hub genes and the prognosis of patients, Kaplan–Meier survival analysis and the log-rank test were used to evaluate the overall survival of HCC patients. The results showed that 14 (H2AFZ, HNRNPA1, RAN, SNRPD1, H2AFX, NASP, PPIA, CSNK1D, NAP1L1, DARS2, FARSB, SMARCC1, TPM3, and ZCCHC17) of the 15 DEmRNAs were considered to be important prognostic factors (Fig. 4D), and the high expression of these genes was associated with poor prognosis in patients with HCC.
Screening Of Prognosis-related lncRNAs
Univariate Cox proportional hazards regression analysis was used to screen the prognosis-related lncRNAs in the ceRNA network. The results showed that 19 (AC016747.3, AC024560.3, AC092171.4, BACE1 − AS, CIDECP, CTD − 2510F5.4, DSTNP2, LINC00294, NRAV, PDIA3P1, PPIAP22, PSMD5 − AS1, RP11 − 147L13.13, RP11 − 385F5.5, RP11 − 546D6.3, SNHG1, SNHG3, STAG3L4, and TPM3P9) of the 30 DElncRNAs were considered to be important prognostic factors (Fig. 5A). Then, based on the 19 prognosis-related lncRNAs, we used lasso-penalized Cox regression and multivariate Cox regression analyses to select lncRNAs potentially related to prognosis and weighted their contribution according to the relative coefficient (Fig. 5B and C). Finally, two optimal risk lncRNAs (CTD − 2510F5.4 and DSTNP2) were selected and incorporated into the prognostic risk model.
Construction And Verification Of The Prognostic Risk Model
To explore the significance of risk genes in predicting the prognosis of HCC patients, the risk score of each patient was calculated by the estimated regression coefficient and expression level of the risk lncRNAs. The calculation formula is as follows: Training cohort risk score = (0.1895 × expression of CTD-2510F5.4) + (0.1516 × expression of DSTNP2). According to the median risk score, patients in the training cohort were divided into a high-risk group and a low-risk group. We created a Kaplan-Meier curve based on the log rank test, which showed that the prognosis of the high-risk group was worse than that of the low-risk group (P < 0.05) (Fig. 6A). Then, we ranked the risk scores of the patients in the training cohort and generated a distribution map according to the survival status of each patient. A heat map was used to describe the expression of risk genes in the two prognosis groups (Fig. 6B). Finally, we used a time-dependent ROC curve to test the accuracy of the model in predicting 1-, 3-, and 5-year overall survival. The area under curve (AUC) values of the prediction model were 0.854 at 1 year, 0.756 at 3 years and 0.756 at 5 years (Fig. 6C). To verify the accuracy of the prognostic risk model, we used it to analyze the test cohort and the whole TCGA cohort. Kaplan-Meier survival curve analysis in the test cohort (Fig. 6D) and the whole TCGA cohort (Fig. 6G) showed that the prognosis of the high-risk group was worse than that of the low-risk group (P < 0.05). The distribution of risk scores, survival status, and risk gene expression in the test cohort and the whole TCGA cohort are shown in Fig. 6E and 6H and were similar to the results of the training cohort. The ROC analysis results showed that the AUCs at 1 year, 3 years, and 5 years in the test cohort were 0.736, 0.668 and 0.73, and the AUCs at 1 year, 3 years, and 5 years in the whole TCGA cohort were 0.798, 0.723 and 0.751, respectively (Fig. 6F and 6I). These results indicate that our prognostic risk model can accurately predict the prognosis of patients with HCC.
Analysis Of Independent Prognostic Factors
Univariate Cox regression analysis showed that vascular tumor invasion, TNM stage and risk score were related to the prognosis of patients with HCC (Fig. 7A). To further confirm independent prognostic factors, we performed multivariate Cox regression analysis (Fig. 7B). The results also showed that vascular tumor invasion, TNM stage and risk score were significantly related to prognosis and might be considered independent prognostic factors.
Establishment and evaluation of a nomogram model for predicting the survival rate of HCC patients
The three independent prognostic factors (vascular tumor invasion, TNM stage and risk score) obtained above were further used to build a nomogram for predicting the 1-, 3- and 5-year survival rates of HCC patients (Fig. 7C). From this nomogram, the 1-, 3- and 5-year survival rates can be predicted according to vascular tumor invasion, TNM stage and risk score. The calibration curve of the nomogram model showed that the predicted 1-, 3- and 5-year overall survival rates were in good agreement with the actual survival rates, indicating that the nomogram model had good prediction ability (Fig. 7D-F).
Correlation between risk score and immune cell infiltration and gene set enrichment analysis (GSEA).
To explore whether our model can reflect the state of the tumor immune microenvironment, we analyzed the correlation between the risk score and immune cell infiltration in the whole TCGA cohort. The results showed that as the risk score increased, the content of immune cells (B cells, CD8 + T cells, dendritic cells, macrophages, and neutrophils) in HCC tissues also increased (P < 0.05) (Fig. 8A-F). Next, we explored the potential mechanism of the impact of the risk score on the prognosis of HCC patients through GSEA. The results show that the high-risk group is mainly enriched in complement and coagulation cascades, drug metabolism cytochrome P450, primary bile acid biosynthesis, fatty acid metabolism and the PPAR signaling pathway; and the low-risk group is mainly enriched in DNA replication, base excision repair, the cell cycle, RNA degradation and the P53 signaling pathway (Fig. 8G).