Alternative Splicing Events as Indicators for the Prognosis of Uveal Melanoma

Growing evidence has revealed that abnormal alternative splicing (AS) events are closely related to carcinogenic processes. However, the comprehensive study on the prognostic value of splicing events involved in uveal melanoma (UM) is still lacking. Therefore, splicing data of 80 UM patients were obtained from the Cancer Genome Atlas (TCGA) SpliceSeq and RNA sequence data of UM and patient clinical features were downloaded from the Cancer Genome Atlas (TCGA) database to identify survival related splicing events in UM. As a result, a total of 37996 AS events of 17911 genes in UM were detected, among which 5299 AS events of 3529 genes were significantly associated with UM patients’ survival. Functional enrichment analysis revealed that this survival related splicing genes are corelated with mRNA catabolic process and ribosome pathway. Based on survival related splicing events, seven types of prognostic markers and the final overall prognostic signature could independently predict the overall survival of UM patients. Finally, an 11 spliced gene was identified in the final signature. On the basis of these 11 genes, we constructed a Support Vector Machine (SVM) classifier and evaluated it with leave-one-out cross-validation. The results showed that the 11 genes could determine short- and long-term survival with a predicted accuracy of 97.5%. Besides, the splicing factors and alternative splicing events correlation network was constructed to serve as therapeutic targets for UM treatment. Thus, our study depicts a comprehensive landscape of alternative splicing events in the prognosis of UM. The correlation network and associated pathways would provide additional potential targets for therapy and prognosis.


Introduction
Uveal melanoma (UM) is the most common type of malignant tumor in adult eyes, and 50% of UM patients will eventually die for their disease [1][2][3]. Although there are certain advances in the diagnosis and treatment of UM, the prognosis for patients with UM remains poor [4]. Therefore, it is important to explore the molecular mechanism underlying the survival events of UM and identify new prognostic factors and therapeutic targets [5].
There have some studies focusing on the prognostic gene signatures of UM, and protein-coding genes of tumor cells are very important in tumor progression [6]. However, due to the limited number of mRNAs, it is difficult to illustrate the complicated function of protein molecular, especially for development and differentiation of tumors [7]. It is well known that alternative splicing (AS) is an important mechanism of post-transcriptional regulation that enables a single gene to produce multiple proteins, and nearly all human protein-coding genes will undergo alternative splicing [8]. Growing evidence has revealed that abnormal splicing events are closely related to carcinogenic processes

Raw Data Process
The mRNA data and corresponding clinical information of UM were downloaded from TCGA (https://tcga-data.nci.nih.gov/). This dataset was derived from the tissue samples of 80 adult patients. Alternative splicing data of UM were obtained from TCGA SpliceSeq (https: //bioinformatics.mdanderson.org/TCGASpliceSeq) [17]. In TCGA SpliceSeq, a value called Percent Spliced In (PSI), ranging from 0 to 1, was calculated for each detected AS event in a gene. These alternative splicing events can be classified into seven patterns, including the Alternate Donor site (AD), Alternate Acceptor site (AA), Alternate Terminator (AT), Alternate Promoter (AP), Mutually Exclusive Exons (ME), Exon Skip (ES), and Retained Intron (RI). The illustration of the seven types was visualized in Figure 1A. For each sample and every possible splice event, a PSI value was calculated, which suggests a shift in splicing events. AS events with a PSI value of ≥75% and ∆PSI ≥ 30% in the UM cohort samples were selected for further research.

Identification of Survival-Related Alternative Splicing Events
The overall survival time of UM was also downloaded from TCGA. In order to screen survival-related AS events, we classified patients into two groups by the median PSI value of each splicing event and performed univariable cox regression analysis. Splicing events expressing significance p values < 0.05 were selected as survival-related AS.

Construction of the Spliced Gene Correlation Network
To evaluate the interactive relationships among survival-related alternative spliced genes, correlation analyses of survival-related spliced genes was applied by using STRING database which is widely used to predict protein-protein interactions (http://stringdb.org). Moreover, the functions of significant survival-related spliced genes were performed by the bioinformatic tool "clusterProfiler" (Guangchuang YU, Southern Medical University, Guangzhou, China).Gene ontology (GO) functional enrichment analyses and Kyoto encyclopedia of genes and genomes (KEGG) pathways were used to assess the functional categories, which consists of an integrated biological knowledge base and analytic tools aimed at systematically extracting biological meaning from large gene/protein lists. Adjust p value < 0.05 were considered to be significant.
Genes 2020, 11, x FOR PEER REVIEW 3 of 17 Figure 1. Landscape of alternative splicing (AS) events in uveal melanoma (UM). AA, alternate acceptor; AD, alternate donor; AP, alternate promoter; AT, alternate terminator; ES, exon skip; ME, mutually exclusive exons; RI, retained intron. (A) Illustration for seven types of alternative splicing in this study. (B) The number of AS events and corresponding genes included in the present study; the x-axis stands for the types of alternative splicing, and y-axis means the number of genes and AS events. (C) UpSet plot of different types of alternative splicing types in UM. The dark bar on the left of drawing represents the amount of each type of AS event. The dark dots in the matrix at the right of drawing represent the intersections of AS events. One gene might possess several alternative splicing patterns (dark dot line), and even a single gene can undergo four types of alternative splicing (green dot line). ES was the most common type of the alternative splicing events (red bar).

Identification of Survival-Related Alternative Splicing Events
The overall survival time of UM was also downloaded from TCGA. In order to screen survivalrelated AS events, we classified patients into two groups by the median PSI value of each splicing event and performed univariable cox regression analysis. Splicing events expressing significance p values < 0.05 were selected as survival-related AS.

Construction of the Spliced Gene Correlation Network
To evaluate the interactive relationships among survival-related alternative spliced genes, correlation analyses of survival-related spliced genes was applied by using STRING database which is widely used to predict protein-protein interactions (http://stringdb.org). Moreover, the functions of significant survival-related spliced genes were performed by the bioinformatic tool "clusterProfiler" (Guangchuang YU, Southern Medical University, Guangzhou, China) .Gene ontology (GO) functional enrichment analyses and Kyoto encyclopedia of genes and genomes (KEGG) pathways were used to assess the functional categories, which consists of an integrated biological knowledge base and analytic tools aimed at systematically extracting biological meaning from large gene/protein lists. Adjust p value < 0.05 were considered to be significant.

LASSO Multivariate Cox Analysis
The PSI values of survival-related spliced genes were obtained as described above. Next, the LASSO was used to construct prognostic markers with selected survival-related spliced genes. According to the expression level of each sample, LASSO determines the qualified spliced genes for the risk system and generates the risk score of each sample. The patients were divided into high-risk group and low-risk group by using the median cutoff risk score. Subgroups analysis of somatic mutation and clinical characteristics were evaluated. The survival curves of Kaplan-Meier were drawn, and the differences among groups were compared by log-rank tests. In addition, in order to assess the specificity and sensitivity of gene signature, receiver operating characteristic (ROC) curves for predicting overall survival (OS) were drawn, and area under the curve (AUC) values were generated.
In addition, an SVM classifier was constructed based on the final selected survival-related spliced genes, and its performance was evaluated by leave-one-out cross-validation.

Construction of Splicing Factor Correlation Network
In order to assess the relationships between splicing factors and prognostic alternative splicing events. We first selected 119 splicing factor genes from previously published article and then the expression of splicing factor genes was screened from RNA-seq data of UM [18]. Kaplan-Meier survival analysis was also applied to evaluate the survival associated splicing factor genes. Spearman analyses was performed to explore the correlation between the survival associated splicing factor genes and the survival-related AS events. p < 0.05 was considered significant, and the correlation plots were visualized by Cytoscape.

Statistical Analysis
All statistical analyses were conducted using R software (v.3.5.2). The intersection of seven AS patterns was applied by using the "UpSetR" package. Survival analysis were performed by using "survival" and "survivalROC" package. LASSO multivariate cox analysis was calculated with the "glmnet" package. SVM method was conducted by "e1017" package. The correlation coefficient was calculated by Spearman test. The Kaplan-Meier survival analysis was applied to compare the overall survival of the patients in the different groups or in the low-and high-risk groups. p < 0.05 was regarded as statistically significant in all statistical tests.

Alternative Splicing Events in UM
As a result, a total of 37996 AS events in 17,911 genes in UM were detected, which included 5599 ESs in 14,019 genes, 1619 RIs in 2388 genes, 2130 AAs in 3022 genes, 1891 ADs in 2705 genes, 3048 APs in 7565 genes, 3465 ATs in 7938 genes, and 159 MEs in 159 genes ( Figure 1B). These results also revealed that one gene may have multiple types of alternative splicing events. Even one gene might undergo up to four types of alternative splicing. Among types of AS events, ES was the most frequent type of the alternative splicing events and the infrequent types were ME events ( Figure 1C).

Identification of Survival-Related Alternative Splicing Events
In total, 387 AAs in 349 genes, 354 ADs in 318 genes, 1271 APs in 679 genes, 1274 ATs in 676 genes, 1678 ESs in 1221 genes, 18 MEs in 18 genes, and 317 RIs in 268 genes were identified as prognosis-associated AS events (p < 0.05) ( Figure 2A). The 20 most significant survival-related alternative splicing events in UM are shown in Figure 2B-H. Among them, there were only 18 prognostic ME events.

Construction of Spliced Gene Correlation Network
Finally, there are 383 significant survival-related spliced genes in UM (p < 0.001) mapped to STRING database, and Cytoscape was used to visualize the gene correlation network ( Figure 3A; the cytoscape file is provided as Supplementary Material). Furthermore, in order to explore the molecular functions of genes with survival-related alternative splicing events. The "clusterProfiler" method was applied to annotate and the results showed that 21 pathways were significantly enriched in biological process terms including "protein targeting," "protein targeting to membrane," "mRNA catabolic process," "nuclear-transcribed mRNA catabolic process," etc. ( Figure 3B). In addition, we found that only "Ribosome" pathway was significantly correlated with these spliced genes in KEGG terms ( Figure 3C).

Construction of Spliced Gene Correlation Network
Finally, there are 383 significant survival-related spliced genes in UM (p < 0.001) mapped to STRING database, and Cytoscape was used to visualize the gene correlation network ( Figure 3A; the cytoscape file is provided as supplementary material). Furthermore, in order to explore the molecular functions of genes with survival-related alternative splicing events. The "clusterProfiler" method was applied to annotate and the results showed that 21 pathways were significantly enriched in biological process terms including "protein targeting," "protein targeting to membrane," "mRNA catabolic process," "nuclear-transcribed mRNA catabolic process," etc. ( Figure 3B). In addition, we found that only "Ribosome" pathway was significantly correlated with these spliced genes in KEGG terms ( Figure 3C).

LASSO Analysis Based on Selected Splicing Events
In order to build prognostic markers for UM patients, the least absolute shrinkage and selection operator (LASSO) cox analysis was used to construct prognostic markers with selected survival-related AS events. Then, we constructed seven types of prognostic markers based on AA, AD, AP, AT, ES, ME, and RI, respectively ( Figure 4). Interestingly, these seven prognostic markers can independently predict the overall survival of UM patients ( Figure 5A-F). The area under cure (AUC) values of seven types were 0.809, 0.949, 0.904, 0.822, 0.981, 0.672, and 0.867, respectively, which showed a good performance in prognosis prediction ( Figure 5H). Moreover, we selected all the significant survival-related splicing events in the seven types and develop an 11 spliced gene prognostic marker ( Figure 6C). The risk system was built and reckoned a risk score for each patient. Applying the medium cut-off value of the risk scores. Eighty UM patients were divided into high-risk and low-risk groups ( Figure 6C). The expression value of 11 spliced genes in UM and normal samples which come from GTEx (Genotype-Tissue Expression) database were showed in Figure 6F as well. The results indicated that most of genes were markedly different in the tumor and normal samples. Kaplan-Meier curve indicated that there is a significant difference between high-risk and low-risk group with log-rank test of p < 0.0001 ( Figure 6D). The ROC curve indicated that the final 11 spliced gene prognostic marker could significantly and robustly predict UM survival ( Figure 6E). What's more, we tried different number of selected survival-related spliced genes and calculated their prediction performance. On the basis of these performances, we plotted an accuracy curve (shown in Figure S1). Finally, the 11 spliced genes could determine short-and long-term survival with a predicted accuracy of 97.5% (Table 1). The subgroups analysis of clinical characteristics between low-and high-risk groups showed that vital status, age, histological type and living time have a significant difference ( Table 2). The forestplot showed three genes GNAQ, SF3B1 and EIF1AX which are highly mutated in low-risk group compared to high-risk group ( Figure 7A-C). The mutant of GNAQ and SF3B1 also indicated a significantly longer OS time than the wild type ( Figure 7D).

LASSO Analysis Based on Selected Splicing Events
In order to build prognostic markers for UM patients, the least absolute shrinkage and selection operator (LASSO) cox analysis was used to construct prognostic markers with selected survival- What's more, we tried different number of selected survival-related spliced genes and calculated their prediction performance. On the basis of these performances, we plotted an accuracy curve (shown in Figure S1). Finally, the 11 spliced genes could determine short-and long-term survival with a predicted accuracy of 97.5% (Table 1). The subgroups analysis of clinical characteristics between lowand high-risk groups showed that vital status, age, histological type and living time have a significant difference ( Table 2). The forestplot showed three genes GNAQ, SF3B1 and EIF1AX which are highly mutated in low-risk group compared to high-risk group ( Figure 7A-C). The mutant of GNAQ and SF3B1 also indicated a significantly longer OS time than the wild type ( Figure 7D).

Construction of Splicing Factor Correlation Network
First, the expression of 119 splicing factor genes was screened from RNA-seq data of UM. Kaplan-Meier survival analysis revealed that 18 splicing factor genes whose expression levels were significantly associated with overall survival (Figure 8). Next, the Spearman test was used to evaluate the correlations between survival-associated splicing factors and alternative splicing events, and correlation networks were constructed using Cytoscape (cytoscape file to be provided as Supplementary Material). In UM patients, a total of 3871 alternative splicing events were significantly associated with splicing factors, with 701 positively (red lines) and 3170 negatively (green lines) related alternative splicing events (Figure 9). Table 2. The subgroups analysis of clinical characteristics between low-and high-risk groups. The results indicated that UM patients in high-risk have older age, shorter living time and more epithelioid cell type than low-risk (chi-square test p < 0.05).

Discussion
Alternative splicing is involved in the regulation of almost all multiexons. Abnormal alternative splicing is widely regarded as a new indicator of carcinogenic processes [19,20]. Although some specific mRNAs related to alternative splicing of UM have been identified: For instance, SF3B1 mutations can cause differential alternative splicing of protein coding genes and the aberrant of CD44 alternative splicing are commonly existed in UM [21][22][23][24]. The exploration of alternative splicing events and their related pathways in UM is far from comprehensive. With the growing development of high-throughput sequencing and bioinformatic methods, (TCGA) SpliceSeq (http: //bioinformatics.mdanderson.org/TCGASpliceSeq) has emerged and could be used to gain numerous profiles to comprehensive overview of alternative splicing events in UM. In the current work, we conducted our research with the help of numerous computational tools and public data like TCGA. A series of survival-associated alternative splicing events was identified, and their interaction network was also established, which could provide a novel intervention target for UM treatment.
Different splicing patterns cause one gene to produce multiple subtypes, which makes alternative splicing more complex in the regulatory mechanism of cancer development. In our research, the most frequent alternative splicing pattern is ES which is consistent with many other cancer researches [8,9,11]. A total of 523 alternative splicing events in 383 genes were regarded as the top (p < 0.001) significant survival-related splicing events. In the spliced gene correlation network, the family of ribosomal protein genes RPS27A, RPS14, RPS23, RPS20, RPS24, RPS15A, RPS21, and RPS25 were the hub genes. Remarkably, it is widely accepted that defects in the ribosome are associated with cancer [25]. Recent studies have revealed that alternative splicing in ribosomal protein genes were identified in leukemias and solid tumor types. In this paper, we also identified some potential new biological pathways associated these survival-related spliced genes like mRNA catabolic process, nuclear-transcribed mRNA catabolic process and ribosome. These findings also point out the way for future clinical application. As we known, ribosome biogenesis is a tightly regulated process that is critical for fundamental cellular functions including cell growth and division [26]. These ribosomal protein genes which encode structural proteins associated with ribosome biosynthesis are the most suitable for treatment in UM.
In addition, in order to refine the prediction model and help specify the predicted value of alternative splicing patterns in UM, we performed an ROC curve analysis and the LASSO method to build prediction models. Interestingly, these seven types of alternative splicing markers revealed a good overall survival prediction of UM. The AUCs of prediction models for each splice type were different from each other, with the best AUC values being 0.981 in ES predictive model and the poor AUC values being 0.672 in ME predictive model, respectively. Furthermore, the final prediction model of all splice type suggested that an 11 spliced gene prognostic marker can better to predict the prognosis of UM with AUC value of 0.846. These 11 spliced genes including MGRN1, ATG4B, SF1, DDX54, NCOR2, CXXC1, FXYD5, ZMIZ2, NUDT22, G6PC3 and PRKAG1. Stratified analysis revealed that this 11 spliced gene signature can further stratify UM patients into subgroups with different mutation patterns and was proper for predicting the vital status, age and histological type of UM. Obviously, the performance needed to be validated in an independent large dataset, but the in-depth biological analysis of these biomarkers and compared the mRNA expression to normal samples showed a great promise and suggested that these biomarkers played important roles in UM [27]. To sum up, these alternative splicing prediction models can provide new prognostic information to promote the prognosis evaluation of UM patients.
Alternative splicing events are primarily regulated by splicing factor genes through affecting the binding of exons selection and splicing sites [28]. Therefore, exploration of the splicing factors and alternative splicing events correlation network is imperative in UM. Here we identified 18 splicing factors were significantly co-related to UM patients' survival (Figure 7), such as RBM10 and ZC3H18. RBM10 (RNA-binding protein 10) is belongs to RNA binding proteins family and regards as a tumor promoter universally participate in tumorigenesis by influencing tumor proliferation and apoptosis [29]. For example, recent researches revealed that high expression of RBM10 protein in lung cancer was associated with a shorter overall survival time and a poor prognosis [30]. Interestingly, our result consistent with this conclusion ( Figure 9B). Furthermore, the expression of RBM10 is positive co-related with the expression of NEDD4L. Yusuke Kito et al. reported that NEDD4L expression may be increased to facilitate tumor growth in many melanomas [31]. As for ZC3H18 (Zinc finger CCCH domain-containing protein 18), our study suggested that the gene of ZC3H18 could be regarded as a protective factor and negatively associated with CD47 ( Figure 9C). Massive studies have shown that overexpression of CD47 predicts poor prognosis and promotes cancer cell invasion in many types of cancer [32][33][34]. What is more, in this correlation network, we found most of alternative splicing events were associated with poor overall survival and positively co-related to splicing factors. It is reasonable to believe that there is an obvious trend for the expression of splicing factors are positively associated with the poor prognostic alternative splicing events in UM. Thus, this phenomenon suggested that up-regulation of splicing factors and down-regulation of splicing factors may serving as therapeutic targets for UM treatment.
Although we identify some significant AS events and spliced genes for prognostic of UM, our research still has some limitations. The research is based on bioinformatics method and its conclusion has not been confirmed by experiments. In addition, the sample size in our study was limited. Therefore, it is necessary to do more work to further explore this molecular mechanism.
To sum up, our study depicts a comprehensive landscape of alternative splicing events in UM and identified that survival-related alternative splicing signatures can be used to predict overall survival of UM patients. Prognostic splicing factors and an alternative splicing events network were constructed in UM to present a whole picture of the expression interactions, which revealed a novel underlying mechanism in the tumorigenesis of UM. Further investigations are needed to reveal the clinical and biological significance of the non-cancer cells and genes of alternative splicing events in U, so as to better guide the more effective diagnosis and prognosis of UM.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/11/2/227/s1, Figure S1: The accuracy curve with the number of genes and the performance of classifiers. The x-axis was the number of genes used for SVM classifier construction and the y-axis was the accuracy of the SVM classifier evaluated with Leave one out cross validation (LOOCV). The peak of curve was accuracy of 0.97 when 11 spliced genes were used.