Identification of potential biomarkers and their clinical significance in gastric cancer using bioinformatics analysis methods

Background: Alternative splicing (AS) is an important mechanism for regulating gene expression and proteome diversity. Tumor-specific alternative splicing can give rise to a large class of potential splicing-associated neoantigens that may affect the immune response and could be exploited for immunotherapy. Methods: The RNA-seq transcriptome data and clinical information of the stomach adenocarcinoma (STAD) cohort were downloaded from The Cancer Genome Atlas (TCGA) data portal, and splicing event data were obtained from the SpliceSeq database. Candidate genes were validated in the Asian Cancer Research Group (ACRG) cohort and the Oncomine database. RT-qPCR was used to analyze the expression of ECT2 in STAD. Results: A total of 32,166 AS events were identified, of which 2,042 were significantly associated with patient survival. Biological pathway analysis indicated that the corresponding genes play an important role in gastric cancer-related processes such as GTPase activity and the PI3K-Akt signaling pathway. Next, we derived a risk signature based on alternate acceptor (AA) events that is an independent prognostic marker. Moreover, high ECT2 expression was associated with poorer prognosis in STAD. Multivariate survival analysis demonstrated that high ECT2 expression was an independent risk factor for overall survival. Gene set enrichment analysis revealed that high ECT2 expression was enriched for hallmarks of malignant tumors. The ACRG cohort and Oncomine also showed that high ECT2 expression was associated with poorer prognosis in gastric cancer patients. Finally, RT-qPCR showed that ECT2 expression was higher in STAD than in normal tissues. Conclusion: This study explored alternative splicing events in gastric cancer and found that ECT2 might be a biomarker for diagnosis and prognosis.


Introduction
Gastric cancer (GC) has the second-highest mortality among cancers worldwide (Siegel et al. 2019; Miller et al. 2019). With the rapid development of medical immunology and molecular biology techniques, immunotherapy has received extensive attention as a new approach to cancer treatment. Immunotherapy is currently the most promising direction for the treatment of GC patients; however, not all GC patients are suitable for this approach (Fuchs et al. 2018; Panda et al. 2018; Roh et al. 2017). Finding the right antigens for targeted vaccines, and thus the patients who will benefit from immunotherapy, remains a major challenge (Nishino et al. 2017). Because of tumor heterogeneity, current prognostic biomarkers have certain limitations. Therefore, new biomarkers are needed as prognostic indicators to improve prognostic assessment and individualized treatment.
Alternative splicing (AS) is the process by which precursor mRNA (pre-mRNA) is processed into mature mRNA in different ways, so that the same gene produces multiple mature mRNAs and, eventually, different proteins. AS is an important mechanism for regulating gene expression and generating proteome diversity (Nilsen & Graveley 2010). AS occurs frequently in tumors and is closely related to tumor initiation and progression (Kim et al. 2008; Oltean & Bates 2014). AS has been found to affect protein families whose genes are frequently mutated in tumors and to alter protein-protein interactions in tumor-related signaling pathways, indicating that AS is also an important cause of tumorigenesis (Oltean & Bates 2014). Abnormal expression of splicing factors alters the alternative splicing of their target genes (Blencowe 2003) and may produce cancer-specific splicing isoforms that drive cancer (Pradella et al. 2017). Thus, tumor-specific alternative splicing can give rise to a large class of potential splicing-associated neoantigens that may affect the immune response and could be exploited for immunotherapy.
The purpose of this study was to identify AS events in GC and to propose potential splicing-associated neoantigens in GC. First, we comprehensively characterized the landscape of AS events in GC. Second, we constructed prognostic predictors for GC patients. Moreover, we constructed a regulatory network of survival-associated alternative splicing events. Finally, we used RT-qPCR to detect the expression of ECT2 in GC and paired adjacent normal tissue.

Data acquisition
A total of 407 samples (375 GC samples and 32 normal samples) from The Cancer Genome Atlas (TCGA) database were enrolled for comprehensive integrated analysis. We used the Data Transfer Tool (provided by GDC Apps) to download the level 3 mRNA-seq gene expression data and clinical information of these patients.
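The split into GC and normal samples can be reproduced from TCGA sample barcodes, whose fourth field begins with a two-digit sample-type code (01-09 tumor, 10-19 normal, 20-29 control). The sketch below is an illustration under the assumption that samples were classified this way; it is not the authors' code, and the function names are hypothetical.

```python
def tcga_sample_category(barcode: str) -> str:
    """Classify a TCGA barcode (e.g. "TCGA-XX-XXXX-01A-...") by its sample-type code.

    The fourth dash-separated field starts with the two-digit sample type:
    01-09 = tumor, 10-19 = normal, 20-29 = control.
    """
    fields = barcode.split("-")
    if len(fields) < 4 or len(fields[3]) < 2:
        raise ValueError(f"barcode too short to contain a sample type: {barcode}")
    code = int(fields[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    raise ValueError(f"unrecognized sample-type code {code:02d} in {barcode}")


def count_categories(barcodes):
    """Tally tumor/normal/control samples in a list of barcodes."""
    counts = {"tumor": 0, "normal": 0, "control": 0}
    for bc in barcodes:
        counts[tcga_sample_category(bc)] += 1
    return counts
```

Applied to the downloaded STAD sample list, `count_categories` would be expected to report the tumor and normal totals given above, provided the same sample-type convention was used.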
SpliceSeq is a resource for TCGA RNA-seq data that provides a clear overview of alternative splicing and identifies splicing events with potential functional changes resulting from splice variation. The SpliceSeq program begins with a standard gene reference built from the transcripts of protein-coding genes in the Ensembl database (Ryan et al. 2012). The percent spliced-in (PSI) index, ranging from zero to one, was used to quantify the proportion of a gene's transcripts that include a given splice element. Events were filtered by requiring PSI values in at least 75% of samples and ΔPSI ≥ 30% (Ryan et al. 2012). The resulting matrix files and PSI files were used for subsequent analysis.
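To make the PSI definition and the sample-coverage filter concrete, here is a minimal sketch. It assumes a simplified read-count form of PSI (inclusion reads divided by inclusion plus exclusion reads); SpliceSeq's own normalization may differ. Because the text does not define how ΔPSI is computed, that criterion is not implemented. The function names are hypothetical.

```python
import numpy as np


def compute_psi(inclusion_reads, exclusion_reads):
    """Simplified PSI = inclusion / (inclusion + exclusion) reads.

    Returns NaN where an event has no supporting reads in a sample.
    """
    inc = np.asarray(inclusion_reads, dtype=float)
    exc = np.asarray(exclusion_reads, dtype=float)
    total = inc + exc
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(total > 0, inc / total, np.nan)


def keep_well_covered_events(psi_matrix, min_fraction=0.75):
    """Mask of events (rows) that have a PSI value in >= min_fraction of samples (columns)."""
    psi_matrix = np.asarray(psi_matrix, dtype=float)
    observed_fraction = np.mean(~np.isnan(psi_matrix), axis=1)
    return observed_fraction >= min_fraction
```

For example, 30 inclusion reads and 10 exclusion reads give PSI = 0.75, and an event observed in 3 of 4 samples passes the 75% filter while one observed in 2 of 4 does not.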

Gene Set Enrichment Analysis (GSEA)
Rather than focusing only on individual up-regulated or down-regulated genes, GSEA evaluates groups of genes that share the same or similar biological processes, comparing the overall change of each gene set between conditions to explain the effect of different treatments on the samples or to reveal biological significance. GSEA was used to identify key Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enriched in the high- and low-ECT2 expression groups of GC.
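The core of GSEA is a running-sum enrichment score computed along a ranked gene list. The sketch below shows the weighted running-sum statistic for a single gene set, assuming genes are already ranked by their association with the high- versus low-ECT2 phenotype. It omits the permutation test and the normalized enrichment score (NES) used by the GSEA software, and the function name is hypothetical.

```python
import numpy as np


def enrichment_score(ranked_genes, ranked_stats, gene_set, weight=1.0):
    """Weighted running-sum enrichment score for one gene set.

    ranked_genes / ranked_stats must already be sorted from most positively
    to most negatively associated with the phenotype (e.g. high vs low ECT2).
    Returns the maximum-deviation enrichment score and the full running sum.
    """
    in_set = np.array([g in gene_set for g in ranked_genes], dtype=bool)
    n_miss = int((~in_set).sum())
    if not in_set.any() or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list but not cover all of it")
    hit_weights = np.where(in_set, np.abs(np.asarray(ranked_stats, dtype=float)) ** weight, 0.0)
    # Cumulative fraction of weighted hits and of misses while walking down the list.
    p_hit = np.cumsum(hit_weights) / hit_weights.sum()
    p_miss = np.cumsum(~in_set) / n_miss
    running_sum = p_hit - p_miss
    peak = int(np.argmax(np.abs(running_sum)))
    return float(running_sum[peak]), running_sum
```

A gene set concentrated at the top of the ranking gives a positive score (enriched in the high-ECT2 phenotype), and one concentrated at the bottom gives a negative score.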

Oncomine database analysis
The expression level of the ECT2 gene in various types of cancer was examined in the Oncomine database (Rhodes et al. 2007). The thresholds were set as follows: P-value, 0.001; fold change, 1.5; and gene rank, all.
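The thresholds above can be read as a simple filter on each cancer-versus-normal comparison. This sketch assumes the fold change is a ratio (cancer/normal), that a comparison qualifies when P ≤ 0.001, and that either over-expression (≥ 1.5) or under-expression (≤ 1/1.5) counts. Oncomine applies these filters inside its web interface, so the function here is hypothetical and only illustrates the logic.

```python
def passes_oncomine_filter(p_value, fold_change, p_threshold=0.001, fc_threshold=1.5):
    """Return True if a comparison meets the stated P-value and fold-change thresholds.

    Assumes fold_change is a positive ratio (cancer / normal): over-expression means
    fold_change >= fc_threshold, under-expression means fold_change <= 1 / fc_threshold.
    """
    if fold_change <= 0:
        raise ValueError("fold change must be a positive ratio")
    significant = p_value <= p_threshold
    changed = fold_change >= fc_threshold or fold_change <= 1.0 / fc_threshold
    return significant and changed
```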

Quantitative reverse transcription polymerase chain reaction (qRT-PCR) assays
Total RNA was isolated from cells or tissues using TRIzol reagent (Invitrogen, Canada) according to the manufacturer's instructions. RNA (1 μg) was reverse-transcribed into cDNA using the RevertAid First Strand cDNA Synthesis Kit (Takara, China). qRT-PCR was performed using SYBR Green Mixture (Takara, China) on the ABI StepOnePlus System (ABI7500, USA). Target gene expression was normalized to GAPDH.
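The text states only that expression was normalized to GAPDH. A common way to do this is the 2^-ΔΔCt method, sketched below under the assumptions that this method was used, that amplification efficiencies of ECT2 and GAPDH are both close to 100%, and that the paired adjacent normal tissue serves as the calibrator. The function name is hypothetical.

```python
def relative_expression_ddct(ct_target, ct_reference, ct_target_control, ct_reference_control):
    """Fold change of a target gene by the 2^-ddCt method.

    ct_target / ct_reference: Ct of the target gene (e.g. ECT2) and of the reference
    gene (GAPDH) in the sample of interest (e.g. a GC tissue).
    *_control: the same two Ct values in the calibrator (e.g. paired adjacent normal tissue).
    Assumes ~100% amplification efficiency for both genes.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)
```

With illustrative Ct values of 24 (ECT2) and 18 (GAPDH) in a tumor and 27 and 18 in its paired normal tissue, ΔΔCt = 6 - 9 = -3, so ECT2 would be 2^3 = 8-fold higher in the tumor.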

Statistical analyses
We used the R package UpSetR to obtain an overview of the AS event profiles in GC (Conway et al. 2017). We combined the survival data with the AS data to obtain survival-related AS data for subsequent analysis, and then used the R packages UpSetR and survival to analyze the alternative splicing events associated with survival. Univariate Cox regression analysis was performed to identify survival-associated splicing events (SASEs), defined as those with P < 0.05. The prognostic performance of the risk scores was evaluated by receiver operating characteristic (ROC) curve and Kaplan-Meier (K-M) analysis. In addition, the mRNA expression data of the corresponding genes were obtained from TCGA RNA-seq data. Differential mRNA expression analysis between tumor and adjacent normal tissues, as well as K-M analysis, was performed to evaluate the clinical value of these genes.
Subsequently, to identify potential regulatory relationships, a further correlation analysis was performed between the PSI values of splicing events of splicing factors and the survival-associated splicing events of non-splicing-factor genes. Cytoscape 3.5 was used to construct the potential regulatory network. The R packages ggplot2, pheatmap, pROC, and corrgram were used for other statistical computations and figure drawing.
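The network edges come from pairwise correlations between splicing factors and splicing events; the Results report that Spearman's test was used. The sketch below computes Spearman's rho (Pearson correlation of tie-averaged ranks) and keeps pairs above a correlation cutoff. The cutoff of 0.6 and the function names are hypothetical, since the text does not state the thresholds used to draw edges in Cytoscape, and P values are not computed here.

```python
import numpy as np


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    a = np.asarray(values, dtype=float)
    order = np.argsort(a, kind="mergesort")
    ranks = np.empty(len(a), dtype=float)
    ranks[order] = np.arange(1, len(a) + 1)
    for v in np.unique(a):
        tied = a == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def correlation_edges(sf_expression, event_psi, rho_cut=0.6):
    """Edges (splicing factor, event, rho, sign) with |rho| >= rho_cut (hypothetical cutoff)."""
    edges = []
    for sf, expr in sf_expression.items():
        for event, psi in event_psi.items():
            rho = spearman_rho(expr, psi)
            if abs(rho) >= rho_cut:
                edges.append((sf, event, rho, "positive" if rho > 0 else "negative"))
    return edges
```

The resulting edge list, with its positive/negative sign column, corresponds to the red/green lines of a Cytoscape network.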

Results
The landscape of AS events in gastric cancer
[…] and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses. The results showed that OS-associated genes are enriched in GTPase activity and the PI3K-Akt signaling pathway (Figure 1 G and H). Most of the top 20 significant OS-associated AS events were favorable prognostic factors (Z < 0) (Figure 2 A-G). Among AA events, those of ECT2, LMO7, STAT3, CBX7, TRAPPC2L, TSC2, TROAP, ZNF410, and HNRNPR were adverse prognostic factors in GC patients, whereas the others were favorable prognostic factors (Figure 2 A). Table 1 shows the top 15 most significant up-regulated and down-regulated AS events.

Construction of the prognostic predictor in GC patients
Next, using multivariate Cox proportional hazards regression, we constructed eight risk scores from the top 20 significant OS-related AS events of each AS type and of all types combined. As the risk score increased, the number of deaths increased, indicating that the risk score is related to survival.
At the same time, the PSI values of the AS events increased with the patients' risk scores (Figure 3 A-X). For instance, the PSI value of AP RCAN1-60494 increased as the risk score increased (Figure 3 C).
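A Cox-based risk score of this kind is usually the linear predictor of the multivariate model, that is, the sum of each selected event's PSI multiplied by its regression coefficient, and patients are then split at the median (as in the Figure 3 and 4 legends). The sketch assumes this standard form; the coefficients would come from the fitted multivariate model, the convention that a score equal to the median goes to the low-risk group is an arbitrary choice, and the function names are hypothetical.

```python
import numpy as np


def risk_scores(psi, coefs):
    """Linear predictor: risk_j = sum_i coef_i * PSI_ij (rows = patients, columns = events)."""
    return np.asarray(psi, dtype=float) @ np.asarray(coefs, dtype=float)


def split_by_median(scores):
    """Label patients 'high' if above the median risk score, otherwise 'low'."""
    scores = np.asarray(scores, dtype=float)
    return np.where(scores > np.median(scores), "high", "low")
```

For two events with coefficients 1.0 and -0.5, a patient with PSI values 0.8 and 0.1 gets a score of 0.8 - 0.05 = 0.75.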
Next, ROC curves were generated and the area under the curve (AUC) was calculated. The AA-based risk score had the highest AUC (0.841), followed by the AD, ES, All, ME, AP, AT, and RI models, with AUCs of 0.827, 0.818, 0.801, 0.765, 0.759, 0.756, and 0.716, respectively (Figure 4 A-P). K-M curves were used to compare the survival of patients in the low-risk and high-risk groups.
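The AUC summarizes how well a risk score separates patients by survival status. The sketch below uses the rank (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen patient who died has a higher risk score than one who survived, counting ties as one half. It treats survival status as a binary label; a time-dependent ROC, as is often used for survival data, would differ, and the text does not say which variant was computed. The function name is hypothetical.

```python
import numpy as np


def roc_auc(scores, labels):
    """AUC via pairwise comparison of positive (label 1) and negative (label 0) cases."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one positive and one negative case")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))
```

For scores [0.1, 0.4, 0.35, 0.8] with status [0, 0, 1, 1], three of the four positive/negative pairs are ordered correctly, giving AUC = 0.75.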
In each Cox regression model constructed from the eight types of AS events, patients with high risk scores had poorer survival (Figure 4 A-P). In addition, both univariate (HR: 1.71; 95% CI: 1.51-1.95) and multivariate Cox regression analyses (HR: 1.64; 95% CI: 1.44-1.86) indicated that the risk score and age were correlated with OS (Figure 5 A and B).
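The K-M curves compare survival between the high- and low-risk groups. A minimal product-limit estimator is sketched below; running it separately on each risk group would give the two curves. The log-rank test used to compare curves is not included, and the function name is hypothetical.

```python
import numpy as np


def kaplan_meier(times, events):
    """Product-limit survival estimate at each distinct event time.

    times: follow-up times; events: 1 = death observed, 0 = censored.
    Returns a list of (time, survival probability) pairs.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    survival = 1.0
    curve = []
    for t in np.unique(times[events]):
        at_risk = int(np.sum(times >= t))
        deaths = int(np.sum((times == t) & events))
        survival *= 1.0 - deaths / at_risk
        curve.append((float(t), survival))
    return curve
```

Censored patients leave the risk set without lowering the curve; for times [1, 2, 2, 3, 4] with events [1, 1, 0, 1, 0], the estimates are 0.8, 0.6, and 0.3 at times 1, 2, and 3.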
Next, to determine which splicing factors (SFs) are associated with the survival-associated AS events in GC, we performed a survival analysis of SFs. The results showed that 26 SFs were significantly associated with overall survival. In addition, the correlation between the PSI values of significant AS events and the expression of survival-related SFs was assessed using Spearman's test (Figure 5 C).

High expression of ECT2 predicts poor prognosis in GC patients
Next, both univariate (HR: 1.32; CI: 1.11-1.56) and multivariate Cox regression analyses (HR: 1.26; CI: 1.06-1.51) indicated that high ECT2 expression was significantly correlated with poor overall survival (Figure 7 A). To identify signaling pathways that are differentially activated in GC, we conducted GSEA comparing the low- and high-ECT2 expression data sets. We selected the most significantly enriched signaling pathways based on their normalized enrichment scores. GSEA showed that pathways in cancer, prostate cancer, and the Wnt signaling pathway were differentially enriched in the high-ECT2 expression phenotype (Figure 7 B-F), whereas Parkinson's disease, ribosome, oxidative phosphorylation, and Huntington's disease were differentially enriched in the low-ECT2 expression phenotype (Figure 8 B). Next, we further validated ECT2 using data from the Oncomine, TIMER, and GEO databases. ECT2 mRNA levels were significantly upregulated in GC patients compared with normal samples, and high ECT2 expression predicted poor prognosis in GC patients (Figure 8 A-C). To further validate ECT2 in GC, RT-qPCR was used to detect ECT2 mRNA expression in GC and paired adjacent normal tissue (PANT). Compared with the PANT group, the ECT2 mRNA level was significantly higher in the GC group (Figure 8 D).

Discussion
Invasion and metastasis are important characteristics of GC and lead to a poor prognosis.
Surgery, radiotherapy, and chemotherapy are the predominant treatments for GC. Immunotherapy, represented by anti-PD-1/PD-L1 monoclonal antibodies and CAR-T cell therapy, has attracted much attention, and encouraging results continue to emerge. Both approaches essentially harness the patient's own immune system, recruiting and activating T cells, the core guardians of immunity, to recognize and clear cancer cells through antigen-specific responses (Le et al. 2017). However, not every patient responds to these treatments, especially in GC (Grosser et al. 2019). Therefore, there is an urgent need to identify new biomarkers as therapeutic targets.
Previous studies suggest that AS may be associated with 50% of human genetic diseases (Pan et al. 2008), including hypercholesterolemia (Zhu et al. 2007), frontotemporal dementia (Ayala et al. 2005), and tumors (Kim et al. 2008). The overall function of alternative splicing is to increase the diversity of mRNAs expressed from the genome, altering the proteins they encode; the resulting changes in protein structure and function alter the phenotype (Lara-Pezzi et al. 2017). Studies of how alternative splicing produces different phenotypes within the same species have also informed our understanding of biological evolution (Bush et al. 2017; Lin et al. 2016).
AS provides a means for cells to diversify proteomes, and there is growing evidence that AS plays a key role in the development or progression of human disease, including GC (Li & Yuan 2017).
The expression of tumor-specific splicing variants affects many cellular activities closely related to cancer, such as cell proliferation, motility, and drug response (Skotheim & Nees 2007).
This study analyzed the genome-wide alternative splicing profiles of 375 GC patients by reanalyzing RNA-seq data. Using the TCGA SpliceSeq database, SASEs in GC were identified by univariate Cox regression analysis. Among the risk scores constructed from all-type, RI-type, ME-type, ES-type, AT-type, AP-type, AD-type, and AA-type SASEs, risk score (all) was effective in predicting the prognosis of patients with GC. Next, a potential regulatory network of SASEs of splicing factors and other genes was constructed. Biological pathway analysis indicated that these SASEs play an important role in regulating gastric cancer-related processes. We also derived a risk signature based on alternate acceptor events that is an independent prognostic marker. Moreover, high ECT2 expression was associated with poorer prognosis in STAD. Multivariate survival analysis demonstrated that high ECT2 expression was an independent risk factor for overall survival, which was validated in the GEO database. GSEA revealed that high ECT2 expression was enriched for hallmarks of malignant tumors. Finally, RT-qPCR showed that ECT2 expression was higher in STAD than in normal tissues.
The ECT2 gene, located on human chromosome 3q26, is highly conserved (Solski et al. 2004). It can transform fibroblasts into cancer cells and interacts with members of the Rho GTPase family to cause malignant transformation, induce cell division, and regulate epithelial cell polarity (Kim et al. 2014). ECT2 has been associated with a variety of cancers. ECT2 expression is closely related to cell cycle regulation and cell division: downregulation of ECT2 can arrest cells in G1 phase, and ECT2 expression dynamically regulates the whole cell cycle (Fortin et al. 2012). Therefore, ECT2 may play an important role in tumorigenesis. In both GC tissue and serum, ECT2 expression was significantly higher than in normal controls and was closely related to clinicopathological parameters including tumor grade, TNM stage, and lymph node metastasis. Therefore, ECT2 plays an important role in the occurrence and development of gastric cancer and may serve as a basis for GC diagnosis and targeted therapy (Wang et al. 2016).

Conclusions
Our study depicts a comprehensive landscape of alternative splicing events in GC and shows that survival-related alternative splicing signatures can be used to predict the overall survival of GC patients. A network of prognostic splicing factors and alternative splicing events was constructed in GC to present an overall picture of their expression interactions, which suggests a novel underlying mechanism of GC tumorigenesis. Further investigations are needed to reveal the clinical and biological significance of the non-cancer cells and genes involved in alternative splicing events in GC, so as to guide more effective diagnosis and prognostic assessment of GC.
PeerJ reviewing PDF | (2019:12:44089:1:1:NEW 20 Mar 2020)

Figure 1: The landscape of AS events in gastric cancer. A-B. UpSet plots of interactions between the seven types of alternative splicing events in GC. C-D. UpSet plots of interactions between the seven types of survival-associated alternative splicing events in GC. A single gene may have up to five types of alternative splicing events associated with patient survival. E-F. Volcano plots of differential alternative splicing events in the TCGA datasets. Red represents a significant difference; blue represents no significant difference. G. Top 20 pathways from GO analysis of genes with OS-related alternative splicing events. The rich factor represents gene enrichment in a specific pathway. H. Top 20 pathways from KEGG analysis of genes with OS-related alternative splicing events. The rich factor represents gene enrichment in a specific pathway. GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; CC, cellular component; MF, molecular function; BP, biological process; OS, overall survival. P < 0.05 was considered statistically significant.

Figure 2: Forest plots for subgroup analyses of survival-associated AS events in the TCGA-STAD cohort. Forest plots of the top 20 survival-associated (A) AA, (B) AD, (C) AP, (D) AT, (E) ES, (F) ME, and (G) RI events in GC. The color scale of the circles represents P values, as indicated by the scale bar; larger circles indicate smaller P values. Horizontal bars represent Z scores.

Figure 3: Construction and analysis of risk scores based on prognosis-associated splicing events using multivariate Cox regression analysis. GC patients were divided into low- and high-risk groups based on the median risk score. The top of each as […]

Figure 4: ROC and K-M curves of eight risk scores constructed using survival-associated alternative splicing events in GC. (A-P) K-M curves of the all-type, AA-type, AD-type, AP-type, AT-type, ES-type, ME-type, and RI-type risk scores in GC patients divided into low- and high-risk groups based on the median risk score, and ROC curves of the all-type, AA-type, AD-type, AP-type, AT-type, ES-type, ME-type, and RI-type risk scores for predicting the survival status of patients with GC.

Figure 5: Network of survival-associated AS splicing factors. A. Univariate Cox regression analysis of the association between clinicopathological factors (including the risk score) and OS of patients in the TCGA datasets. B. Multivariate Cox regression analysis of the association between clinicopathological factors (including the risk score) and OS of patients in the TCGA datasets. C. Correlation network between the expression of survival-associated splicing factors and the PSI values of AS genes, generated using Cytoscape. Gray dots represent survival-associated splicing factors. Green/red dots represent favorable/adverse AS events. Red/green lines represent positive/negative correlations.

Figure 6: mRNA expression and K-M curves of genes from the fifteen splicing events used in constructing "risk score (AA)" in GC.

Figure 8: ECT2 expression levels in different types of human cancers.

Table 1:
Detailed information on the top 30 most differential AS events.

Table 2:
GC-specific genes involved in the ideal prognostic model