Features of alternative splicing in stomach adenocarcinoma and their clinical implication: a research based on massive sequencing data

Alternative splicing (AS) offers a main mechanism to form protein polymorphism. A growing body of evidence indicates the correlation between splicing disorders and carcinoma. Nevertheless, an overall analysis of AS signatures in stomach adenocarcinoma (STAD) is absent and urgently needed. 2042 splicing events were confirmed as prognostic molecular events. Furthermore, the final prognostic signature constructed by 10 AS events gave good result with an area under the curve (AUC) of receiver operating characteristic (ROC) curve up to 0.902 for 5 years, showing high potency in predicting patient outcome. We built the splicing regulatory network to show the internal regulation mechanism of splicing events in STAD. QKI may play a significant part in the prognosis induced by splicing events. In our study, a high-efficiency prognostic prediction model was built for STAD patients, and the results showed that AS events could become potential prognostic biomarkers for STAD. Meanwhile, QKI may become an important target for drug design in the future.

produce a large number of mRNA isoforms through alternative splicing of pre-mRNA [5]. Alternative splicing actually regulates gene expression at the intron/exon level [6][7][8]. In addition, alternative splicing causes the premature occurrence of termination codon in mRNA, which degrades immediately upon discovery to prevent its translation [9]. Therefore, alternative splicing is a key biological process in cells, and different mRNA splicing isoforms make the final protein products perform different functions.
More and more studies have found that splicing disorder can be used as a marker of tumor development [10] and as a key mechanism involved in the broad biological process of cancer [11,12]. It is noteworthy that some important splicing factors can change the alternative splicing mode of target genes, thus forming a favorable environment for promoting the occurrence and development of cancer [13]. In general, comprehensive and indepth analysis of alternative splicing can dig out potential biomarkers of malignant tumors, so as to assist physicians in clinical diagnosis and prognosis judgment [12,14].
We used a variety of bioinformatics analysis methods to explore prognostic factors in STAD. COX regression analysis helped us screen out significant prognostic markers for further study. According to the regulatory relationship between AS events and splicing factors in STAD, a clear network diagram was drawn to find out the potential mechanism. These results provide a basic direction for further exploration of the molecular mechanism and diagnostic markers of STAD.

Survival associated AS events
As a whole, there are 4006 AA events in 2799 genes, 3450 AD events in 2401 genes, 10,004 AP events in 4025 genes, 8390 AT events in 3666 genes, 19,121 ES events in 6973 genes, 226 ME events in 219 genes, and 2944 RI events in 1956 genes for evaluation of prognostic value (Fig. 1a). The initial clinical data downloaded from the TCGA website is in the supplementary files (Additional file 1). A total of 157 AA events in 153 genes, 174 AD events in 164 genes, 461 AP events in 304 genes, 297 AT events in 203 genes, 805 ES events in 660 genes, 18 ME events in 18 genes, and 130 RI events in 113 genes were identified as prognostic AS events (P < 0.05) (Fig. 1b, Additional file 2). Thus, one gene might have two or more AS events that were markedly related to the survival of STAD patients. The ES which was vividly revealed by the UpSet plot was the most common prognosis-related event, and a gene could have up to seven prognosis-related events (Fig. 1c).

Molecular characteristics of survival related AS events
The distributions of AS events significantly correlated with patient survival are displayed in Fig. 2a. The 20 most significant prognosis-related AS events are also shown ( Fig. 2b-h). To reveal the molecular characteristics of genes with survival-associated AS events, several bioinformatics analyses were conducted. First, a PPI network was constructed to demonstrate the relationships among these genes. UBA52, STAT3 and PLK4 ranked at the core in the network (Fig. 3). According to the functional annotations, "organelle organization", "positive regulation of cellular process" and "protein localization" were the three most significant biological process terms. "intracellular organelle", "membrane-bounded organelle" and "intracellular membrane-bounded organelle" were the three most significant cellular component terms. For molecular function, "enzyme binding" and "GTPase binding" were two most enriched categories (Fig. 4).

Prognostic signatures for STAD patients
By applying the LASSO Cox analysis following univariate Cox, which aims to filter out redundant genes and prevent overfitting of the model, we developed seven types of optimal prognostic signatures based on AA, AD, AP, AT, ES, ME and RI (Fig. 5, Table 1). According to Fig. 6, the eight models we constructed were able to separate the high and low risk groups well, because the differences between the high and low risk groups were very significant (although there were some overlaps of confidence intervals in the several subfigures, the overlaps were less, and the p values were very small, and the differences were significant). Therefore, the eight models could be used to predict the clinical results of STAD patients in clinical practice (Fig. 6). Eight ROC curves validated the performance of prognostic signatures in prognosis prediction, and their AUC values were all greater than 0.7, indicating that these eight models had certain accuracy (Fig. 7). Figure 8 shows the patient's survival status and risk score, as well as the splicing pattern of AS signatures for each AS type or a combination of seven AS types. The upper risk score curve classified patients with low and high risk. The middle survival status figure indicated that the risk value was related to survival; although the decline of survival time was not obvious, the survival status was different. The bottom heat map shows the PSI value change of AS events with the increase of the risk value, in which if the PSI value of an AS event increases with the risk value, it indicated that the AS event was a high-risk AS event (Fig. 8). In univariate Cox analysis, the eight riskScores we constructed were all correlated with prognosis and high-risk factors (Fig. 9). According to multivariate independent prognostic analysis, riskScores obtained by the eight models could be used as independent prognostic factors, and all of them were high-risk factors (Fig. 10).

Survival-associated SF-AS network
Because events are primarily orchestrated by SFs that often bind with pre-mRNAs and regulate RNA splicing via influencing exon selection and splicing site. Therefore, exploration of the SF-AS regulatory network is imperative in STAD. Next, correlation analyses between the SFs' expression and the most significant AS events' PSI value (P < 0.001) were conducted (Fig. 11a). We observed that QKI was most significantly connected in the network, so we compared the influence of QKI expression on STAD's survival rate. The consequence showed that low QKI expression significantly improved the survival rate of patients with STAD, and the five-year survival rate of the patients with low QKI expression was almost twice that of the patients with high QKI expression (Fig. 11b, Fig. 11c).

Discussion
Currently, scientific research on the role of AS events in STAD still has many unanswered questions owing to the shortage of available large-sample public AS profiles and the paucity of systematic analysis referring to their clinical significance and deep molecular function. These bottlenecks have prevented cancer researchers from effectively recognizing the widespread applicability of AS events in STAD. Exploration of AS patterns broadens our vision and our understanding of traditional transcriptome molecular biomarkers. In this project, we adopted several biomedical The number of prognosis-related AS events and corresponding genes obtained by using univariate COX analysis; c UpSet plot of interactions between the seven types of survival associated AS events in STAD. One gene may have up to seven types of alternative splicing to be associated with patient survival computational approaches, which integrate the AS event profiles and clinical information of STAD patients to mine prognosis-related AS and construct splicing prognostic signatures that could stratify STAD patients into subgroups with distinct survival outcomes. Moreover, the SF-AS network could provide further insights into regulatory mechanisms in patients with STAD from the perspective of splicing.
Gastric cancer is a highly heterogeneous malignant tumor. Therefore, a single drug is not significantly useful for various types of gastric cancer. Classical cytotoxic therapy cannot be fully effective because of the presence of patients resistant to specific drugs. At present, the diagnosis and treatment of gastric cancer rely on histopathological diagnosis and definite classification. Therefore, in addition to targeted treatment with trastuzumab, Fig. 2 Top 20 most significant alternative splicing (AS) events in STAD. a Volcano plot of AS events. Each dot represents an AS event that occurs in a gene. The red dots represent AS events that are significantly correlated with patient survival. The blue dots represent AS events without correlation. Both z-score and p-value are statistical values generated by the previous univariate COX analysis, and they have a corresponding relationship (that is, the p-value can be obtained by searching the table according to z-score). Z-score represents Wald statistic, z-score > 0 corresponds to high risk AS events, z-score < 0 corresponds to low risk AS events. So all dots form a parabola finally. The top 20 AS events correlated with clinical outcome based on acceptor sites (b), alternate donor sites (c), alternate promoters (d), alternate terminators (e), exon skips (f), mutually exclusive exons (g), and retained introns (h) we need to develop new targeted drugs to provide better treatment for patients. Potential biomarkers can be mined and used to predict patient outcomes, and treatment strategies can be developed for specific tumor parameters.
The next-generation sequencing technology developed in recent years adopts the whole-genome sequencing method, which has great advantages in exploring alternative splicing. Previously, several studies conducted SpliceSeq analyses to generate alterative splicing profiles for some types of cancer, as well as to construct prognostic signatures for cancer prognosis monitoring, including non-small cell lung cancer [15], colorectal cancer [16], and esophageal cancer [17]. This computational bioinformatics analysis could open up different perspectives on the clinical application and potential pathological mechanism of AS on a macro level. Previously, several studies have proposed transcriptomic signatures related to epithelial-tomesenchymal transition and diagnosis of gastric cancer [18,19]. The present in-depth study further explored alterations of transcriptomes used as prognostic predictors and could broaden our horizons in the clinical significance of transcriptomic signatures.
Given the multitude of AS events impacted by their own pre-mRNAs, the downstream functional impact is partly used to describe the molecular function of AS alteration events. In the PPI network analysis, UBA52, STAT3 and PLK4 were the hub genes. Previous studies have shown that UBA52 and STAT3 are all considered to be related molecules involved in the biological process of STAD. For example, bioinformatics analysis has verified the correlation between UBA52 and GC progress and metastasis [20]. STAT3 is a crucial transcription factor that regulates the transcription of many genes. It plays an extremely important role in promoting the occurrence and development of gastric cancer, and chronic STAT3 activation is a key event to induce the occurrence and development of gastric cancer [21]. STAT3 can directly up-regulate the epithelial expression of TLR2 in gastric tumors, which is related to the low survival rate of GC patients [22]. STAT3 signaling drives transcription activation of EZH2 and mediates poor prognosis in gastric cancer [23]. STAT3 promotes the increased expression of lncRNA HAGLROS, which leads to further progress of gastric cancer [24]. PLK4 is a serine/threonine protein kinase that regulates centriole Fig. 3 Protein-protein interaction network of genes with survival-associated alternative splicing events in STAD. For nodes, low degrees correspond to small sizes and bright colors, and high degrees correspond to large sizes and dark colors; For edges, low combined_scores correspond to small sizes and bright colors, and high combined_scores correspond to large sizes and dark colors duplication. Its maladjustment can lead to abnormal centrosomal numbers, mitotic defects, chromosomal instability, and ultimately tumorigenesis [25]. The relevant study has also shown that PLK4 overexpression in gastric cancer induces centrosome amplification and chromosomal instability, and leads to inhibition of primary cilia formation [26]. These findings also pave the way for future clinical applications, and related target inhibitors are being widely studied and clinically tested as new anticancer drugs. Functional enrichment analysis showed that in STAD, the main molecular function of AS event gene related to prognosis is to bind to GTPase, so it may provide selective advantages for cancer cells by regulating GTPase. Increased RhoA activity leads to poorer survival outcomes for the Lauren diffuse type of gastric adenocarcinoma (DGA), and inhibition of RhoA can correct the drug resistance of DGA [27]. RacGAP1 is closely associated with malignant progression and poor survival [28]. Leptin promotes GC migration through the Rho/ROCK mechanism [29]. RASSF6 partially regulates the effect of mir-181a-5p on GC progression through the MAKP pathway [30]. It is worth considering that, in gastric cancer cells, RhoA promotes cell proliferation and RhoC stimulates cell migration and invasion, while RhoB functions contrary to RhoA and/or RhoC [31]. Therefore, targeted GTPase therapy is also being explored. For example, ALEX1 functions in gastric cancer through the PAR-1/Rho GTPase signaling pathway, becoming a new target for tumor inhibition [32]. RhoA-mediated Fbxw7 regulates the apoptosis of tumor cells and other phenotypes in gastric cancer [33]. Similarly, Gastrokine 1's inhibition of gastric cancer progression may also be dependent on RhoA [34]. Our findings suggest that a group of AS events play a biological role in the alteration of GTPase in STAD.
The highlight of the current study was that we proposed prognostic signatures based on AS events for monitoring the prognosis of STAD patients. Recently, some prognostic signatures in STAD have been proposed. Zhang H et al. found that the efficacy of postoperative adjuvant chemotherapy for gastric cancer was affected by the degree of neutrophil infiltration of the tumor [35]. Jiang Y et al. developed an immune score GC classifier that can effectively predict the recurrence and survival of patients with gastric cancer, which plays a good role in complementation of the prognosis judgment for the TNM staging system [36]. The clinical management of STAD patients still needs to be improved, and the above mentioned molecular biomarkers have broad prospects. In order to facilitate clinical Fig. 4 Cluego analysis for GO terminology (show only pathways with P-value ≤0.001). And each node represents a GO term, each line reflects the correlation between the terms, and the color embodies the function enrichment classification of the nodes, with same function aggregating together in same color practice, we selected a group of AS events using the LASSO Cox regression model, and the prognostic model proposed on which showed satisfactory results. However, our prognostic model has limitations because our work completely based on the bioinformatic analysis and lacked an independent validation cohort. In addition, in order to provide more explanation details, our study also needed a wet lab validation. Due to the limited public alternative splicing data currently, the sample size used to construct a prognostic model is small. If these samples are forcibly divided into a training group and a validation group, the sample size used to construct the prognostic model will be less, leading to poor accuracy of the prognostic model. Besides, on account of the We believe that the TCGA data we used were appropriately standardized. However, the Lasso model adopts the square loss function and applies the same tuning parameter to all variables. Once outliers exist in the data, the estimator obtained is biased, resulting in poor robustness. Therefore, it may be a potential problem that leads to the imperfect accuracy of the prognostic model. Robust analysis methods using outliers to process data for high-dimensional genetic data analysis have been developed and are rapidly gaining popularity. Among them, LAD (least absolute deviation) -LASSO is a method to combine the regression shrinkage and selection of LASSO and robustness of LAD for outliers and heavy-tailed errors [37,38]. In theory, the test results may be more meaningful through robust variable selection.
A large number of AS events are programmed by finite SFs in cells [39]. The altered profile of AS events in multiple tumor types emphasizes the important mechanism of splicing factors in cancer, which is disordered splicing [40]. It is increasingly believed that changes of SFs in STAD can be involved in tumorigenesis and progression through various mechanisms [41][42][43]. The splicing correlation network analysis has also found out the larger regulated nodes, indicating that they occupy a significant position in the SF-AS network. QKI, which is recognized as a tumor suppressor in a wide range of cancers, is highly connected in the network, which can  play a significant part in the prognosis induced by splicing events [44][45][46]. Multiple-factor analysis of a related study shows that QKI expression is an independent prognostic factor for the survival of GC patients [47]. But the role of QKI in STAD has not been fully discussed yet. Our study indicates that the level of QKI expression is significantly correlated with the survival rate of patients with STAD, and it can become an important target for drug design in the future. Nevertheless, our algorithm suggested deregulated AS events as a hallmark of STAD. However, there are some limitations inevitably affecting the reliability of the study. Firstly, we didn't use a separate cohort for more validation. Secondly, more functional experiments are needed to further investigate the impact of dysregulated AS events and SFs on carcinogenesis.

Conclusions
In conclusion, the current study has found out a phenomenological relationship between AS events and prognosis in STAD patients, which is the base of unscrambling the functional contribution of AS events in STAD. These findings are conducive to develop new genomic models for clinical cancer management. In addition, the further identification of predictive splicing factors for prognosis and the construction of SF-AS networks will pave the way for further exploration of splicing related mechanisms.

Data acquisition
TCGA SpliceSeq [48] is a data portal that provides AS profiles across 33 tumors based on TCGA RNA-seq data. SpliceSeq evaluates seven types of splice events, including alternate acceptor (AA), alternate donor (AD), alternate promoter (AP), alternate terminator (AT), exon skip (ES), mutually exclusive exon (ME) and retained intron (RI). TCGA SpliceSeq processed the percent spliced in (PSI) value for cancer research analysis, which indicates the inclusion of a transcript element divided by the total number of reads for that AS event. Alterations in PSI values range from 0 to 100 (%), which suggests a shift in splicing events. The filtering condition for downloading data from the TCGA SpliceSeq website was the percentage of samples with PSI value ≥75. The AS events with standard diversion < 1 were removed. Clinical information of STAD patients was also obtained from the TCGA database. Only pathologically confirmed STAD patients with both follow-up and AS event data were included for our analysis. For clinical information downloaded from TCGA, we deleted patients with survival time < 90 days or null data, and included a total of 338 patients (Table 2) for subsequent analysis. For advanced cancer patients with survival time < 90 days (usually along with severe metabolic disorders such as cachexia and endotoxin), the effect of alternative splicing events in promoting cancer development is no longer accurate, so patients with survival time < 90 days were removed. The same TCGA ID was used to integrate clinical information and AS events data.

Survival analysis
In the survival analysis, the follow-up periods ranged from 90 days to 3720 days after removal of patients with survival less than 90 days. Univariate Cox analysis was conducted to assess the correlations between the PSI value (from 0 to 100) of each AS event and the survival data of STAD patients (P < 0.05). We input the corresponding genes into the Search Tool for the Retrieval of Interacting Genes (STRING) database [49], and the constructed protein-protein interaction (PPI) network was adjusted by Cytoscape software [50]. Meanwhile, we applied the ClueGO plug-in [51] in Cytoscape software for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis and drew the enrichment network. The least absolute shrinkage and selection operator (LASSO) method is a widely used regression analysis method of high-dimensional predictors [52]. LASSO has been extended for use in Cox regression survival analysis and is ideal for high-dimensional data. We selected the LASSO Cox regression model to determine the accurate coefficient for each prognostic feature and to estimate the deviance likelihood via 1-standard error (SE) criteria. The coefficients and partial likelihood deviance were calculated with the "glmnet" package in R.

Prognostic signature construction
The significant AS events in univariate Cox analysis were submitted to LASSO regression analysis to develop Finally, prognostic signatures for survival prediction were calculated by multiplying the PSI values of prognostic indictors and the coefficient assigned by LASSO regression analysis. The riskScore of each patient was calculated according to the constructed prognostic signatures. Based on the median value of all patients' riskScores, all patients were divided into high and low risk groups. Then, survival analysis was carried out for the high and low risk groups to obtain the P-values of survival difference and survival curves. The ROC curve was plotted using the survivalROC package, primarily to determine the accuracy of the prognostic model. By incorporating the following parameters into multivariate Cox regression analysis, splicing-based prognostic signature was evaluated as independent predictors: age, gender, grade, stage, TMN stage.

SF-AS regulatory network
A compendium of 404 splicing factors was obtained from a previous study [53]. The expression profiles of SF genes were curated from the TCGA dataset. We selected Fig. 11 SF-AS network and survival analysis. a Survival-associated SF-AS network in STAD. AS is represented by circles (red for high-risk AS, green for low-risk AS), and SF is represented by triangles. A line between AS and SF indicates a regulatory relationship between them (the red line represents positive regulation and the green line represents negative regulation); b Overall survival of STAD patients; c Correlation analysis between the expression of QKI and the survival rate of STAD patients axes between the expression value of SFs and PSI values of prognosis-related AS events to construct the SF-AS regulatory network according to the following conditions: P value less than 0.001 and the absolute value of Pearson's correlation coefficient more than 0.6. Then, we built the correlation plots via Cytoscape version 3.7.1. All R code and annotations have also been submitted (Additional file 3).
Additional file 1. The initial clinical data downloaded from the TCGA website is in the file Additional file 1.
Additional file 2. The file Additional file 2 contains all surviving-related AS events.
Additional file 3. All R code and annotations in the file Additional file 3 have also been submitted.