Germ Cell-Specific Gene 1-Like Protein Regulated by Splicing Factor CUGBP Elav-Like Family Member 5 and Primary Bile Acid Biosynthesis are Prognostic in Glioblastoma Multiforme

Background Alternative splicing (AS) modifies 92-94% human genes, abnormal splicing events might relate to tumor development and invasion. Glioblastoma Multiforme (GBM) is a fatal, invasive, and malignant tumor in nervous system. The recurrence and development leads to poor prognosis. However, few studies have focused on AS in GBM. Methods RNA-seq and Alternative splicing events (ASEs) data of GBM samples were downloaded from The Cancer Genome Atlas (TCGA) and TCGASpliceSeq databases, respectively. Firstly, the Cox regression analysis was utilized to identify the overall survival splicing events (OS-SEs). Secondly, a multivariable model was applied to access the prognostic value of risk score. Then, we constructed a co-expressed network between splicing factors (SFs) and overall survival alternative splicing events (OS-SEs). Additionally, to explore the relationship between the potential prognostic signaling pathways and OS-SEs, we constructed a network between these pathways and OS-SEs. Ultimately, to better explain the results, validations from multi-dimension platforms were applied. Results In the first step, 1,062 OS-SEs were selected by Cox regression. Then, 11 OS-SEs were integrated in a multivariate model by Lasso regression. The area under the curve (AUC) of receiver operator characteristic (ROC) curve was 0.861. In addition, the risk score generated from the multivariate model was confirmed to be an independent prognostic factor (P < 0.001). What's more, in the network of SFs and ASEs, CELF5 significantly regulated GSG1L|35696|AP and GSG1L|35698|AP (P < 0.001, R = 0.511 and = -0.492). Additionally, GSG1L|35696|AP (P = 0.006) and GSG1L|35698|AP (P = 0.007) showed a significant relationship with cancer status. Eventually, KEGG pathways related to prognosis of GBM were selected by GSVA. The primary bile acid synthesis (P < 0.001, R = 0.420) was the significant pathway co-expressed with Germ Cell-Specific Gene 1-Like Protein (GSG1L). Conclusions Based on the comprehensive bioinformatics analysis, we proposed that aberrant splicing factor CUGBP Elav-like family member 5 (CELF5) significantly, positively and negatively, regulated ASE of GSG1L, and the primary bile acid synthesis pathway might play an important role in tumorigenesis and prognosis of GBM.


INTRODUCTION
Alternative splicing (AS) is a common process for producing multiply mature RNA from pre-RNA, and it can contribute to the modification of 92-94% human genes (Wang et al., 2008). Alternative splicing events (ASEs) are regulated by splicing factors (SFs) (Lee and Abdel-Wahab, 2016). Abnormal splicing can improve tumor development by changing metabolism (Kozlovski et al., 2017). In addition, the regulatory relationship between OS-SE and SF can be involved in the tumor biology process and be regarded as the predictor for prognosis in certain cancers (Yang et al., 2019).
Glioblastoma multiforme (GBM) is the most fatal and invasive malignant tumor in the nervous system. The World Health Organization (WHO) defines GBM as the IV grade glioma. It is reported that there are three new cases of GBM per 100,000 people every year, and the two-year and five-year survival rate are 26-33% and 4-5%, respectively (Batash et al., 2017). Generally, the common effective treatments for GBM are surgery, chemotherapy, and radiation. However, due to high incidences of recurrence and progression, GBM patients often have poor prognosis (Alifieris and Trafalis, 2015;Zhou et al., 2018b). Among the reasons for poor prognosis, transcriptome alteration has been proved to be of great importance in the process of tumorigenesis and progression in rat models. However, to date, most previous studies of GBM have focused on genomic and transcriptome levels, and posttranscriptional regulatory mechanisms have been largely neglected. There are few remarkable biomarkers for prognosis, and no studies have been devoted to exploring the relationship between SFs and ASEs in GBM, which might also play a role in the tumorigenesis and prognosis in GBM.
In the current study, we utilized a Cox regression analysis to identify the overall survival of splicing events (OS-SEs) and constructed a multivariate prognosis model. In addition, the risk score generated from the multivariate model was confirmed to be an independent prognostic factor. We constructed a coexpressed network between SFs and OS-SEs. Meanwhile, to explore the relationship between the potential prognostic signaling pathways and OS-SEs, we constructed a network between these pathways and OS-SEs. Ultimately, we found a regulatory network related to the cancer status and prognosis in GBM, in which the interaction SFs and ASEs were included.

METHOD Data Extraction
This study was approved by the Ethics Committee of Tongji University School of Medicine. Clinical information and gene expression profiling in 599 primary GBM samples were available from The Cancer Genome Atlas (TCGA) database (https://tcgadata.nci.nih.gov/tcga/). In addition, Percent Spliced In (PSI) values of ASEs more than 75% in primary GBM were downloaded from TCGASpliceSeq database (https:// bioinformatics.mdanderson.org/TCGASpliceSeq/) (Ryan et al., 2016). Then, an UpSet plot was developed to demonstrate genes processed by splicing events and patterns of splicing events in GBM.

OS-SEs Identification
In data preprocessing, the missing data in PSI values was supplemented by K-Nearest Neighbor algorithm (k = 10, rowmax = 0.5, colmax = 0.8, maxp = 1500, rng.seed=362436069). Next, the ASEs with percent spliced in (PSI) values were filtered. ASEs with mean PSI values of less than 0.05 and/or standard derivations of PSI values less than 0.01 were excluded. Samples from patients without demographic information and follow-up records were also excluded. Next, the rest data were analyzed by univariate Cox regression to identify the OS-SEs. The UpSet plot was applied to demonstrate the result. In the third step, a volcano plot was used to illustrate the prognostic of OS-SEs, and the x-axis and y-axis represented z-score and -log10 (P value) in the volcano plot, respectively. Moreover, the top 20 OS-SEs in each splicing pattern: Alternate acceptor site (AA), Alternate donor site (AD), Alternate promoter (AP), Alternate terminator (AT), Exon skip (ES), Mutually exons (ME), and Retained intron (RI) was selected and demonstrated in bubble plots.

Multivariate Model Construction and Independent Prognostic Factor Identification
To avoid over fitness in the multivariate model, Lasso regression was utilized to screen the most significant OS-SEs in each splicing pattern. Then, the selected OS-SEs were integrated into the multivariate model. Receiver operator characteristic (ROC) curve was applied to access the efficiency of the prognosis model. The risk score of each sample for predicting the prognosis was calculated by the following formula: In the formula, "n" represented the number of OS-SEs in the multivariate model, "b" represented the regression coefficient of each OS-SE from multivariate model, "Risk score i " represented the risk score of No. "i" patient. Further, the median was selected to divide the risk score into high and low. Moreover, the Kaplan-Meier curve was utilized to evaluate the relationship between risk score and survival probability.

Independent Prognostic Analysis
The univariate and multivariable Cox regression analyses were utilized to access the prognostic value of risk score generated from the multivariate model. The demographics and clinical information including age, gender, and cancer status were used for model correction.

SF and OS-SEs Network Construction
Based on SpliceAid2 database (Piva et al., 2009), a dataset with 390 SFs was downloaded. Next, the relationships between 390 SFs and OS-SEs were analyzed by Pearson correlation. Relationships with a correlation coefficient > 0.400 and P < 0.001 (Pearson correlation analysis) were selected as significant SFs-OS-SEs for next step. Finally, the network including the significantly co-expressed SFs and OS-SEs was developed by Cytoscape (3.7.1) (Shannon et al., 2003).

OS-SEs Related to Cancer Status and Co-Expressed With Pathways Identification
The Mann-Whitney-Wilcoxon test and Kruskal-Wallis test were applied to identify the relationship between OS-SEs and GBM status, and the Beeswarm plots were utilized to illustrate the significance of the relationship.
Gene Set Variation Analysis (GSVA) was utilized to select the differential expression Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways in GBM. Then, prognostic KEGG pathways were identified by univariate Cox analysis based on GSVA. The Pearson analysis was applied to identify the correlation between cancer status-related OS-SEs and prognostic KEGG pathways. Based on the result, significantly co-expressing KEGG pathways and OS-SEs were selected.

Multidimensional Online Verification
In order to increase the reliability of the results produced by in silico analysis, multidimensional validation from other platforms was performed. Pathway Card (https://pathcards. genecards.org/) was utilized to find the top 5 genes related to the selected KEGG pathway. Then, The Human Protein Altas (Uhlen et al., 2015) and Genotype-Tissue Expression (GTEx) (Consortium, 2015) showed the key genes and proteins expression levels in normal tissue. PROGgeneV2 (Goswami and Nakshatri, 2014), Gene Expression Profiling Interactive Analysis (GEPIA) (Tang et al., 2017), UCSC xena (Goldman et al., 2015), SurvExpress (Aguirre-Gamboa et al., 2013), UALCAN (Chandrashekar et al., 2017), Linkedomics (Vasaikar et al., 2018), cBioportal (Cerami et al., 2012) and Oncomine (Rhodes et al., 2004) showed the genes expression level at the tissue level in GBM. Further, Cancer Cell Line Encyclopedia (CCLE) (Ghandi et al., 2019) was applied to show the gene expression at the cellular level in GBM. Eventually, String (Snel et al., 2000) illustrated the interaction network of SF, OS-SE, and the potential pathway. Additionally, an independent dataset named Chinese Glioma Genome Atlas (CGGA) (Hu et al., 2019) was used for external validation of key genes.

Statistical Analysis
A two-tailed P-value < 0.05 was considered as statistically significant. In order to control the size of the SF regulation network, correlation coefficient > 0.400 and P < 0.001 in Pearson correlation analysis were selected as a more stringent screening criteria to screen for co-expression patterns between SFs and OS-SEs. All statistical analysis was performed by R version 3.6.1 (Institute for Statistics and Mathematics, Vienna, Austria; www. r-project.org) (Package: impute, UpSetR, ggplot2, rms, glmnet, preprocessCore, forestplot, survminer, survivalROC, beeswarm).

OS-SEs Identification
The analysis process was illustrated in Figure 1. Gene expression profiling of 599 primary GBM samples available from TCGA database was analyzed. Table 1 summarizes the baseline information of the patients. We defined a pattern to show each ASE: the gene name, the TCGASliceSeq database AS ID of each ASE, and the splicing pattern were integrated like GSG1L|35696| AP, so the GSG1L was the gene-symbol, 35696 was the AS ID, and AP was the splicing pattern. A total of 21,407 ASEs in 10,101 genes were identified in patients with GBM, including 2,738 AAs in 2,505 genes, 2,301 ADs in 2,098 genes, 3,524APs in 3,285 genes, 3,728 ATs in 3,517 genes, 6,961 ESs in 6,789 genes, 233 MEs in 36 genes, and 1,922 RIs in 1,727 genes. Besides ASE genes and splicing patterns in all primary GBM shown in the UpSet plot, several genes correlated to multiply the splicing pattern ( Figure 2A). OS-SEs identified by Cox regression are shown in the UpSet plot ( Figure 2B). Generally, ES was the most significant splicing pattern in GBM. Additionally, the volcano plot of ASEs demonstrated the significant and nonsignificant OS-SEs ( Figure 3A). Moreover, DST|76557|AT, CD3D| 18990|ES, TTC13|10258|ME, SV2B|32540|RI, MAP3K13|68008| AA, ZNF302|48996|AD, and SPOCD1|1507|AP were the most significant of the top 20 OS-SEs in each splicing pattern in each bubble plot ( Figures 3B-H).

Independently Prognostic Analysis
To access the independent prognostic value of the risk score generated from the multivariable model, the univariate (HR = 1.028, 95%CI (1.018-1.039), P < 0.001) ( Figure 5A) and multivariate (HR = 1.023, 95%CI (1.013-1.034), P < 0.001) Cox regression analyses ( Figure 5B) were applied. Some other clinical variables, such as age and cancer status, were also significant. Based on the univariate and multivariate Cox regression analyses results, the risk score generated from the multivariate model was confirmed to be an independent prognostic factor in GBM.

OS-SEs Related to Status and Survival-Related Pathways Co-Analysis
185 prognostic KEGG pathways were identified based on GSVA and univariate Cox analysis. The correlation of OS-SEs and KEGG pathways were analyzed by Pearson analysis (Figure 7). Combined with the results of the significant relationship in the SFs and OS-SEs network, CELF5 (SF) was considered as a remarkable marker related to GSG1L (OS-SEs) (P < 0.001,   All in all, the most significant SF, OS-SEs, and downstream pathway were CLEF5, GSG1L|35698|AP, GSG1L|35696|AP and primary bile acid synthesis, respectively. A schematic diagram of this scientific hypothesis is shown in Figure 8.

Multidimensional Validation
With the aim of minimizing the bias of results above, multiplatforms validations were performed. Pathway Card showed that AMACR, AKR1D1, CYP27A1, CYP46A1, and CH25H were the top five genes in primary bile acid synthesis. The detail results of the human protein atlas ( Figure S1 First of all, the expression levels of CELF5, GSG1L, AMACR, AKR1D1, CYP27A1, CYP46A1, and CH25H in multiple databases were summarized in Table S2. CELF5, CYP27A1, and CYP46A1 were high-expressed, while AMACR and CH25H were lowexpressed in normal cerebral cortex in the brain ( Figure S1). GSG1L, AMACR, and CYP27A1 were high-expressed, while CELF5 was low-expressed in tumors at tissue level ( Figures S2-S9). CELF5 was low-expressed in tumor cell line; AMACR, CYP27A1, CYP46A1, and CH25H were high-expressed in tumor cell line in CCLE ( Figure S10). The interaction PPI network of CELF5, GSGS1L, AMACR, AKR1D1, CYP27A1, CYP46A1, and CH25H in the String was shown in Figure S11.

DISCUSSION
GBM was fatal and invasive. Early diagnosis for the malignant tumor was essential for the OS . The development and recurrence might aggravate the tumor and lead to poor prognosis (Alifieris and Trafalis, 2015). Research indicated that biomarkers could be used for prognosis (Zhou et al., 2018a), based on that, we discussed the role of AS in tumorigenesis and prognosis of GBM.
In the current study, the prognostic regulation network of SFs and ASEs was constructed, showing CELF5 positively and negatively regulated GSG1L|35696|AP and GSG1L|35698|AP. Additionally, GSG1L|35696|AP and GSG1L|35698|AP showed significant relationships with cancer status. Eventually, prognostic KEGG pathways were selected by GSVA, and the primary bile acid synthesis pathway was the most significant pathway co-expressed with PSI values of GSG1L|35698|AP, GSG1L|35696|AP. Therefore, we assumed that primary bile acid synthesis might be the downstream pathway of regulatory CELF5 and GSG1L in the prognosis of GBM.
CELF5 interacted with Human cytomegalovirus (HCMV) protein and regulated its genome DNA synthesis (Zou et al., 2018). HCMV participated in tumor regulation like suppressed apoptosis, migration, and invasion (Mcfaline-Figueroa and Wen, 2017;Zou et al., 2018) in GBM. HCMV could regulate the uptake   process of glutamate by decreasing the glutamate translocator (Maurice, 1970). GSG1L was inactive in the AMPA-type glutamate receptors (AMPARs) in cancer (Gu et al., 2016;Andrew et al., 2017). In addition, GBM could secrete glutamate and trigger the AMPARs, leading to cell death in neurons surrounding the tumor (Van Vuurden et al., 2009). What's more, cancer cells in glioma could form synapses with other neurons and communicate by the AMPAR, improving proliferation and growth of glioma (Venkataramani et al., 2019;Venkatesh et al., 2019).
Up to now, there has been no publication focused on the direct regulation of CELF5 and GSG1L. We detected that they are all involved in the glutamate-related processes like cell apoptosis and tumor proliferation based on reports up to date.
Some studies indicated that bile acid and its derivatives were related to several cancers, like multiple myeloma, hepatoma, GBM, and colon cancer, and bile acid could control the development of cancer by its cytotoxicity to cancer and by signaling to immune cells (Brossard et al., 2010;Ma et al., 2018).
Further, in bile acid synthesis, some products of bile acid regulated the PI3K dependent Bad pathway, causing apoptosis of neurons by glutamate (Castro et al., 2004). As we mentioned before, CELF5 could interacted with HCMV, which could regulate the translocator, and GSG1L could suppress the AMPAR. Although there was no such report of the relationship of CELF5, GSG1L, and primary acid bile synthesis, glutamate-related apoptosis might be the link.
Overall, CELF5 was the remarkable SF and GSG1L was the remarkable OS-SE related to status. CELF5 significantly regulated GSG1L. Additionally, primary bile acid was the candidate pathway in the downstream of the regulation between CELF5 and GSG1L. However, because our analysis was only an in silico analysis, it was limited by the sample size and support of the molecular mechanism. However, to minizine the systemic bias, we applied multidimension validation for CELF5, GSG1L, and primary bile acid synthesis based on multiple online databases, showing the stability of the results of this study.
With the aim of making our hypothesis more scientific, a convincing and high-level basic experimental verification will be launched in the future. Based on some studies devoted to ASEs, such as ASEs in pan-cancer, pancreatic cancer, and Renal Cancer (Calabretta et al., 2016;Qi et al., 2016;Chen et al., 2017;Couture et al., 2017). All genes in our hypothesis will be detected in a large quantity of clinical samples (tumor vs healthy; tumor vs adjacent normal tissue) by IHC to explore the differential expression. A direct mechanism between CELF5 and GSG1L will be validated by Coimmunoprecipitation and RNA immunoprecipitation. Furthermore, an engineered splicing factor will be designed to validate the splicing pattern which produces a certain splicing isoform of GSG1L. Besides, immunofluorescence staining will be utilized to explore the location of CELF5 and GSG1L. What's more, the positive or negative regulatory relationship among CELF5, splicing isoforms of GSG1L, primary bile acid synthesis signaling pathway, and GBM tumorigenesis will be validated by biological function assays like gain/loss of function and rescue assays. These function and direct mechanism assays might offer more evidence for these potential therapeutic targets and novel prognostic factors in GBM.

CONCLUSION
Based on the comprehensive bioinformatics analysis, we proposed that aberrant splicing factor CUGBP Elav-like family member 5 (CELF5) positively and negatively regulated ASE of GSG1L and the primary bile acid synthesis pathway might play an important role in tumorigenesis and prognosis of GBM. The scientific hypothesis might offer direction for subsequent biological experiments.

DATA AVAILABILITY STATEMENT
All datasets used for this study are available from the TCGA-GBM program.

ETHICS STATEMENT
This study was approved by the Ethics Committee of Tongji University School of Medicine.