Alternative splicing events are prognostic in hepatocellular carcinoma

Alternative splicing events (ASEs) play a role in cancer development and progression. We investigated whether ASEs are prognostic for overall survival (OS) in hepatocellular carcinoma (HCC). RNA sequencing data was obtained for 343 patients included in The Cancer Genome Atlas. Matched splicing event data for these patients was then obtained from the TCGASpliceSeq database, which includes data for seven types of ASEs. Univariate and multivariate Cox regression analysis demonstrated that 3,814 OS-associated splicing events (OS-SEs) were correlated with OS. Prognostic indices were developed based on the most significant OS-SEs. The prognostic index based on all seven types of ASEs (PI-ALL) demonstrated superior efficacy in predicting OS of HCC patients at 2,000 days compared to those based on single ASE types. Patients were stratified into two risk groups (high and low) based on the median prognostic index. Kaplan-Meier survival analysis demonstrated that PI-ALL had the greatest capacity to distinguish between patients with favorable vs. poor outcomes. Finally, univariate Cox regression analysis demonstrated that the expression of 23 splicing factors was correlated with OS-SEs in the HCC cohort. Our data indicate that a prognostic index based on ASEs is prognostic for OS in HCC.


INTRODUCTION
Alternative splicing (AS) is an important posttranscriptional regulatory mechanism that increases protein diversity [1]. AS of pre-mRNA transcribed from a single gene can generate isoforms with distinct structures and functions [2]. Approximately 95% of the genes in the human genome undergo AS [3]. Aberrant AS can play a role in cancer development and resistance to therapy [2,[4][5][6]. For example, splicing factor mutations or alterations in expression can result in the activation of oncogenes and signaling pathways that promote tumorigenesis [7][8][9][10]. Alterative splicing events (ASEs) could therefore function as diagnostic or prognostic biomarkers in various cancers. Additionally, cancer-specific splice isoforms or splicing factors could be therapeutic targets.
Hepatocellular carcinoma (HCC) mortality rates are increasing worldwide [11]. Although many studies have identified genes that play key roles in HCC development and progression, few studies have focused on the potential roles of ASEs in the pathogenesis of HCC [12]. The availability of RNA sequencing (RNA-seq) data including The Cancer Genome Atlas (TCGA), and the development of databases such as TCGASpliceSeq (https://bioinformatics.mdanderson.org/TCGASpliceSeq /index.jsp), has enabled the analysis of ASEs in various cancers [13].

Analysis of ASEs in HCC
We analyzed ASEs in pooled mRNA samples from 343 HCC cases included in the TCGA dataset.
Individual ASEs were assigned a unique annotation that was a combination of the gene name, splicing type, and the ID number in the SpliceSeq database (AS ID). For example, in the annotation term "C1orf159-AA-20", the gene name is C1orf159, the splicing pattern is AA, and the AS ID is 20. A total of 34,163 ASEs in 8,985 genes were identified in the cohort of HCC cases: 2,666 AA events in 1,937 genes, 2,331 AD events in 1,663 genes, 6,352 AP events in 2,566 genes, 8,087 AT events in 3,532 genes, 12,327 ES events in 5,343 genes, 137 ME events in 135 genes, and 2,263 RI events in 1,561 genes (Table 1). Thus, individual genes were associated with multiple types of splicing patterns. Additionally, ES was the dominant splicing pattern observed.   Type  Total splicing events  OS-SEs  Splicing events  Genes  Splicing events  Genes  AA  2,666  1,937  277  257  AD  2,331  1,663  282  248  AP  6,352  2,566  687  381  AT  8,087  3,532  887  486  ES  12,327  5,343  1,423  1,092  ME  137  135  14  14  RI  2,263  1,561  244  219  Total  34,163  8,985 3,814 2,351

Identification of OS-SEs
We next performed univariate Cox regression analysis to determine whether ASEs were correlated with the OS of HCC patients. A total of 3,814 OS-SEs were identified, which included ES and AT events in TP53, and AA and ES events in VEGF (All P < 0.05, Supplementary Table 1). UpSet plots were generated to visualize the interactions between the seven types of ASEs that were associated with OS ( Figure 2A). We found that single genes could have multiple OS-SEs. For example, AA, AD, ES, and RI events in TMEM205, and AA, AP, ES, and RI events in CIRBP were all correlated with OS in HCC patients (Supplementary Figure 1). We selected the 500 most significant OS-SEs and input the genes into Cytoscape to generate gene interaction networks ( Figure 2B). Cancer-associated proteins such as P53 and VEGF were found to be major hubs in the resulting networks.

Prognostic predictors of OS in HCC
We next performed multivariate Cox regression analysis based on the 10 most significant OS-SEs for each of the seven splicing types and for all types. Eight prognostic indices (PIs) were generated based on event type: PI-AA, PI-AD, PI-AP, PI-AT, PI-ES, PI-ME, PI-RI, and PI-ALL. The median values for the eight PIs were then used to categorize HCC patients into low and high risk groups. We then analyzed the efficacy of the PIs to predict OS at 2,000 days for the two subgroups using the Kaplan-Meier method. The greatest difference in OS was observed when the HCC patients were stratified based on the median value of PI-ALL (2,542 vs. 768 days in the low and high risk groups, respectively; P = 6e−16) ( Figure 3H and Table 2). Receiver operator characteristic (ROC) curves were generated and the area under the ROC curve (AUC) calculated to evaluate the predictive efficiencies of the different models. We found   Figure 4A-4B). We identified distinct clusters of HCC patients using consensus clustering. We found that k = 3 achieved adequate selection ( Figure 5A-5F). Therefore, the patients were clustered into three subgroups. We then compared ASEs and OS among patients in the subgroups (Cluster 1, Cluster 2, and Cluster 3) and found that Cluster 3 had a higher frequency of ASEs compared to Clusters 1 and 2 (ME, P < 0.01; all other patterns P < 0.001; Figure 5G), which was associated with reduced OS and an unfavorable prognosis according to Kaplan-Meier analysis (P = 3e−4, Figure 5H).

Correlation between OS-SEs and splicing factor expression
Because alternate splicing is regulated by splicing factors, we investigated whether the OS-SEs were regulated by a subset of splicing factors. Splicing factor expression data were extracted from the SpliceAid2 database (http://www.introni.it/splicing.html). Univariate Cox regression analysis demonstrated that the expression of 23 splicing factors was correlated with OS in the HCC cohort ( Table 3). The Kaplan-Meier survival curves are shown in Supplementary Figure 2. We analyzed the associations between OS-associated splicing factors and the percent spliced in (PSI) values for OS-SEs using the Spearman correlation method. Correlation plots were then generated using Cytoscape ( Figure 6A and Supplementary Table 2). These results indicated that the expression of 23 survival-associated splicing factors (triangular nodes) was correlated with 447 OS-SEs. Of the 447 OS-SEs, 146 that were associated with favorable OS (red ovals) and 301 were associated with poor OS (green ovals). Interestingly, the majority of the ASEs associated with poor OS were positively correlated with splicing factor expression (red AGING lines), whereas the majority of the ASEs associated with favorable OS were negatively correlated with splicing factor expression (blue lines) ( Figure 6A). The 10 most significant associations between genes and splicing factors by P value are shown in Figure 6B-6H. The top splicing factors were HNRNPA0, TIAL1, QKI, SRSF6, HNRNPA1, SRP54, NOVA1, HNRNPH2, and CELF1.

DISCUSSION
Aberrant AS may play a key role in cancer development [16,17]. TCGA RNA sequencing data has enabled investigation of AS patterns in various cancers including HCC [18,19]. For example, Zhu et al. identified an AS signature that was prognostic in HCC using data derived from the TCGA dataset [20]. However, this study included several patients with limited survival and follow-up data. Therefore, we removed them in accordance with more recent studies [14,21]. Several studies have demonstrated that ASEs are frequently present in HCC tumors [22]. For example, Wu et.al identified 45 ASEs that were observed in tumor tissue from HCC patients but not in adjacent normal tissue samples. These ASEs were associated with survival and cell differentiation [23]. Additionally, Wang et al. demonstrated that a CCDC50S splice variant was modulated by the HBx/SRSF3/14-3-3β complex and promoted tumor progression in HCC through the Ras/Foxo4 signal transduction pathway [24].

AGING
In this study, we investigated whether other ASEs could function as prognostic biomarkers in HCC. We identified at least 1,000 distinct ASEs that were observed in HCC (Supplementary Table 1). Functional enrichment analysis revealed enrichment of genes in several pathways that could impact HCC development and progression.
The genes corresponding to the ASEs we identified included TP53 and VEGF, which play critical roles in cancer biology. Interestingly, ASEs in the same gene can result in protein isoforms with opposing functional effects. For example, AS of the BCL2L1 gene results in the generation of two distinct isoforms: BCL-XL and BCL-XS [25]. BCL-XS has pro-apoptotic effects, while BCL-XL has anti-apoptotic effects. The BCL-XL isoform is the predominant variant observed in HCC and protects tumor cells from p53-mediated apoptosis [26]. We identified two ASEs in BCL2, ID_45706 and ID_45707, which were positively and negatively correlated with OS, respectively (Supplementary Table 1). Because these ASEs result in aberrant proteins and were correlated with prognosis, they may play important roles in HCC development.
Alterations in splicing factor expression have been observed in tumor compared to normal tissue [27]. Splicing factors regulate AS and can function as oncogenes or pseudo-oncogenes thereby promoting tumorigenesis [28,29]. We identified 23 splicing factors that exhibited aberrant expression in HCC tumors (Table 3). Altered expression of several of these factors has been reported previously, such as QKI [30], SRSF6 [7], HNRNPA1 [31], NOVA1 [32], and HNRNPH2 [33]. However, the functions of the majority of the splicing factors we identified in HCC development and progression have not yet been elucidated.
We hypothesized that alternations in splicing factor expression could be correlated with ASEs and OS in HCC. Indeed, the PSI values and network analysis indicated that multiple ASEs were correlated with splicing factor expression in HCC. The majority of the OS-associated splicing factors were highly expressed in HCC and were correlated with poor OS (Figure 6A, Supplementary Table 2). These findings provide insight into the mechanisms by which ASEs function in HCC development and progression. Although our study has several limitations (e.g. sample size, lack of therapeutic strategies, and lack of in vitro/in vivo functional validation studies), our data indicate that ASEs are frequent in HCC and are correlated with patient prognosis. These ASEs may be part of a prognostic signature in HCC.

Data extraction
RNA-seq data for 377 HCC cases was downloaded from the TCGA Data Portal (https://tcgadata.nci.nih.gov/tcga/; accessed January 2019). We excluded 28 cases due to limited (< 30 days) of clinical follow-up data. The remaining 349 patients were then matched with their corresponding entries in the TCGASpliceSeq database, and 343 cases were finally enrolled into this study. A schematic of the overall study design is shown in Figure 8.

Identification of OS-SEs
Univariate Cox regression analysis was performed to identify and analyze OS-SEs. Interactions between the genes corresponding to the OS-SEs were plotted using Cytoscape and the Reactome FI plugin. Metascape (http://metascape.org) was to perform Gene Ontology (GO) term enrichment and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis of the genes corresponding to the 500 most significant OS-SEs [34]. The top 20 enrichments were displayed. A P < 0.01 and ≥ 3-fold enrichment were considered significant).

Analysis of the prognostic values of the PIs
Multivariate Cox regression analysis was performed on the top 10 OS-SEs that had the highest prognostic values for each type of splicing pattern and the top 10 OS-SEs that had the highest prognostic values for all splicing patterns [15]. OS-SEs with P values < 0.05 were selected to construct the PI. The PI was calculated using the following formula βOS-SE1 × PSIOS-SE1 + βOS-SE2 × PSIOS-SE2 + · ···· + βOS-SEn × PSIOS-SEn, where β corresponded to the regression coefficient. We then evaluated the efficacy of the PIs in predicting cancer status after 2,000 days using ROC analysis with the survivalROC package for R as described [35]. Kaplan-Meier survival curves were generated to analyze the prognostic efficacy of the PIs based on the OS-SEs as described [21]. Finally, Cox regression analysis was performed to calculate the HR values for the PIs and other clinical parameters including T, N, M stage as well as patient age and gender.

Construction of the correlation network of OS-SEs in HCC
Splicing factor data was extracted from the SpliceAid2 database (http://www.introni.it/splicing.html). Univariate Cox regression analysis was performed to evaluate the correlation between splicing factor expression and OS.

AGING
The correlations between splicing factor expression and the PSI values for OS-SEs were analyzed using Spearman's rank order correlation. Correlation plots were generated using Cytoscape (3.6.0) and the Reactome FI plugin.

Statistical analysis
R version 3.4.1 was used for all statistical analysis. All P values were two-sided, and a P < 0.05 was considered statistically significant. UpSet was used to visualize the associations between genes and the different types of SEs. Consensus clustering was performed using the ConsensusClusterPlus package for R [36].

AUTHOR CONTRIBUTIONS
Qi-Feng Chen, Peihong Wu, and Zi-Lin Huang conceived of and designed the study. Qi-Feng Chen, Wang Li, and Peihong Wu performed the literature search, generated the figures and tables, and wrote the manuscript. Qi-Feng Chen, Wang Li, and Lu-Jun Shen collected and analyzed the data, and critically reviewed the manuscript. Qi-Feng Chen, Peihong Wu, and Zi-Lin Huang supervised the study and reviewed the manuscript.