A novel alternative splicing-based prediction model for uteri corpus endometrial carcinoma

Alternative splicing (AS) is a crucial mechanism that greatly increases the complexity of mammalian and viral proteomes. A systematic and comprehensive analysis of the prognostic significance of the AS profiling landscape in uterine corpus endometrial carcinoma (UCEC) is lacking. In this study, univariate and multivariate Cox regression analyses were conducted to identify candidate survival-associated AS events curated from SpliceSeq for the construction of prognostic index (PI) models. A correlation network between splicing factor-related AS events and significant survival-associated AS events was constructed using Cytoscape 3.5. In total, 28281 AS events from 8137 genes were detected in 506 UCEC patients, including 2630 survival-associated AS events. Kaplan-Meier survival analysis revealed that six of the seven PI models (AD, AP, AT, ME, RI and ALL) performed well in stratifying the prognosis of the low-risk and high-risk groups (P<0.05). Among the six PI models, PI-AT performed best, with an area under the curve (AUC) value of 0.758 from time-dependent receiver operating characteristic analysis. The correlation network implicated a potential regulatory mechanism of AS events in UCEC. The PI models based on survival-associated AS events in this study showed good prognosis-predicting ability and may be promising prognostic indicators for UCEC patients. Summary: This is the first study to systematically investigate the prognostic value of AS in UCEC. The findings of the present study support the clinical potential of AS in UCEC and shed light on the potential AS-associated molecular basis of UCEC.


INTRODUCTION
Endometrial cancer (EC), also referred to as uterine corpus endometrial cancer (UCEC), is one of the most common gynaecologic malignancies worldwide and occurs most frequently in postmenopausal women [1,2]. Symptoms arising from UCEC include postmenopausal vaginal bleeding, an enlarged uterus, lower abdominal pain and pelvic cramping, which form the basis of clinical diagnosis [3,4]. There were 569,847 newly registered cases and 311,365 deaths caused by corpus uteri cancer in 2018, placing a serious burden on public health, particularly in developing countries [5]. Precision medicine, powered by the health records and genetic data of patients, refers to the concept that health care is individually tailored on the basis of a person's genes, lifestyle and environment. Advances in genomic sequencing have made precision medicine the mainstream of current anti-cancer treatment, and we sought reliable genetic changes from the perspective of alternative splicing (AS) to improve the individualized prognosis prediction of UCEC patients [6].
Alternative splicing (AS) is a crucial mechanism that greatly increases the complexity of mammalian and viral proteomes [7,8]. Through selective removal of introns and joining of exons, mRNA isoforms with diversified functions can be generated from a single gene [9,10]. AS events occurring in cancer-related genes have a significant impact on the progression of human cancers, as evidenced by the large number of AS events reported in multiple human cancers [11][12][13]. AS occurs in a tissue-specific and disease stage-specific manner [9].
In UCEC, multiple splice variants of estrogen receptor (ER) and progesterone receptor (PR), two vital molecules that play important roles in the initiation and development of UCEC, have been discovered as the result of AS [14]. These receptor variants have been reported to affect the carcinogenesis of UCEC with distinct functions [14][15][16]. Splice variants discovered in other genes, such as synuclein gamma, also implicate AS in the tumorigenesis of EC [17]. Given the critical influence of AS events on the tumorigenesis of UCEC, we inferred that AS events might serve as novel prognostic markers for UCEC. Previous studies have indicated that one isoform of ERα, ERαΔ7, and YT521 exon6-retention mRNA were significantly correlated with the survival of UCEC patients [15,18]. Nevertheless, many AS events in UCEC remain unexplored. We therefore systematically explored the prognostic significance of AS events in UCEC based on RNA sequencing data from TCGA, in order to find promising prognostic predictors for UCEC patients.

A preview of survival-associated AS events in UCEC
In total, 28281 AS events from 8137 genes were detected in 506 UCEC patients. The numbers of AS events identified in the seven AS types are recorded in Table 1. Among the 28281 AS events from 8137 genes, ES was the predominant type, with the largest number of AS events (n=9744). The intersecting sets of genes and AS events were visualized by an UpSet plot in Figure 1, which indicated that one gene might possess up to six types of AS. With respect to the relationship between AS events and the overall survival (OS) of UCEC patients, a total of 2630 survival-associated AS events in 1752 genes were identified by univariate Cox regression analysis (P<0.05). The distribution of the 2630 survival-associated splicing events across the seven AS types is listed in Table 2. We selected the top significant survival-associated AS events (P<0.001) to investigate their enrichment in biological functions and pathways, as well as the interaction network among them. The results showed that these significant survival-associated AS events were clustered in biological processes including viral RNA genome replication, regulation of RNA splicing and spliceosomal complex assembly (P<0.01). The top three pathways enriched for these AS events were snRNP Assembly, COPI-mediated anterograde transport and Insulin receptor recycling (P<0.01) (Table 3) (Figure 2). Molecular Complex Detection (MCODE) was used to screen the modules of the protein-protein interaction network using the following parameters: degree cut-off = 2, node score cut-off = 0.2, k-core = 2, and maximum depth = 100 [19][20][21]. Protein-protein interaction network analysis from Metascape for these genes revealed that these AS events were gathered in seven MCODE components (Figure 3). Figure 4. The predicting efficiency of the seven PI models was assessed by tROC curves and Kaplan-Meier survival analysis.
As illustrated by a panel of tROC curves in Figure 5, PI-AT demonstrated the highest capacity of estimating the prognosis of UCEC patients with an AUC value of 0.758, followed by PI-RI with an AUC value of 0.719. We also used Kaplan-Meier survival analyses to appraise the prognosis-predicting ability of the seven PI models.

PI models featured by AS events for UCEC
UCEC patients were separated into low-risk and high-risk groups according to the median value of PI. The results suggested that six of the seven PI models (AD, AP, AT, ME, RI and ALL) performed well in stratifying the prognosis of the low-risk and high-risk groups (Figure 5). The survival time of UCEC patients in the low-risk group of these six PI models was significantly longer than that of UCEC patients in the high-risk group (P<0.001). According to univariate and multivariate Cox regression analysis, four PI models (PI-AP, PI-AT, PI-ME and PI-RI) stood out with superior independent prognosis-predicting value (Table 5) (all P<0.05). For all PI models, high-risk UCEC patients were more likely to show advanced clinical progression than low-risk UCEC patients, which was especially evident for the grade classification of UCEC (Table 6).

[Figure caption, partially recovered: One-to-one matches between node colors and enrichment terms are labeled on the left. Nodes that share the same cluster ID are typically close to each other; (B) Nodes in the network represent the corresponding genes of top significant survival-associated AS events. One-to-one matches between node colors and P values are labeled on the left. Enrichment terms containing more nodes tend to have a more significant P value.]

Correlation network of splicing factor-related AS events and survival associated AS events
We downloaded information on 74 splicing factors and the corresponding splicing factor-related AS events from the SpliceAid2 database and TCGA. Univariate Cox regression analysis suggested that 16 splicing factor-related AS events were significantly linked to the survival of UCEC patients (Supplementary Table 1). The correlations between the 16 splicing factor-related AS events and the 26 significant AS events from multivariate Cox regression analysis were calculated, and significant correlations are presented as a correlation network in Figure 6E (P<0.05) (Supplementary Table 1). Blue nodes (n=16) and purple nodes (n=24) represent splicing factor-related AS events and significant AS events from multivariate Cox regression analysis, respectively. Positive and negative correlations between splicing events are marked as red lines (n=68) and green lines (n=64), respectively. We also conducted Kaplan-Meier survival analysis for the 12 splicing factors corresponding to the 16 splicing factor-related AS events, after dividing UCEC patients according to the average expression value of each splicing factor. We found that four splicing factors, RBM4, ESRP1, TRA2B and SRSF2, served as significant prognostic indicators of worse survival in UCEC patients (P<0.05) (Figure 6A-D).

DISCUSSION
Existing risk stratification for UCEC patients based on morphological classification has limited power in predicting the overall survival of UCEC patients, and finding effective prognostic indicators for UCEC patients remains an unsolved problem. Accumulated evidence suggests that AS exerts a vast influence on the biological events of human cancers [22][23][24], which suggested to us that the aberrant AS profiles in UCEC may provide valuable prognostic information. Although multiple splice variants of molecules such as ER and PR have been reported in previous studies to participate in the pathogenesis of UCEC, a systematic and comprehensive analysis of the prognostic significance of the AS profiling landscape in UCEC has been lacking.
The present study is the first to investigate the global pattern of survival-associated AS events in UCEC using TCGA data. Results from univariate Cox regression analysis revealed that thousands of AS events were associated with the survival of UCEC patients (P<0.05). Subsequent functional annotation of the genes corresponding to the significant survival-associated AS events in UCEC (P<0.001) indicated that these genes were mainly involved in biological processes and pathways including viral RNA genome replication, regulation of RNA splicing, spliceosomal complex assembly, snRNP Assembly, COPI-mediated anterograde transport and Insulin receptor recycling. AS events generated from these genes might affect the initiation and development of UCEC by interfering with the above biological processes and pathways.
Bioinformatics analysis of the significant survival-associated AS events in UCEC is one of the highlights of our research. [26,27]. The differences between these two gene-expression signatures and the PIs in our study were within the range of ±0.1, which indicated that the prediction efficiency of the PIs in this study was comparable with that of gene-expression models. In the current study, UCEC cases enrolled in the prognostic analysis were restricted to those whose OS time exceeded 90 days, and the validation cohorts were composed of 506 UCEC patients from TCGA, which differs from previous similar studies. A certain amount of error was therefore inevitable, owing to the inclusion criteria for patients with prognostic data and the heterogeneity of the validation cohort. Our results offer a novel perspective on precision medicine for UCEC patients and on the molecular mechanism of UCEC tumorigenesis. Evaluation by tROC curves and Kaplan-Meier survival curves showed that building PI models based on survival-associated AS events is a feasible way to stratify UCEC patients into risk groups with different survival outcomes.
It is well known that AS may introduce nonsense-mediated mRNA decay, truncated proteins and increased or decreased miRNA binding sites, eventually changing the quality and quantity of the protein product. Additionally, splicing events in untranslated regions or non-coding RNAs might lead to abnormal gene function [28]. Cancer-specific mRNA transcripts may affect the formation and progression of human cancers by activating oncogenes or inhibiting tumour suppressor genes [29]. Splice-switching of MYO1B into an oncogenic isoform drove gliomagenesis [30]. The presence or absence of exon 7 in two splicing isoforms of MBNL1 conveyed opposite phenotypic implications in cancer [31]. Two splicing isoforms of ZNF148 exerted mutually antagonistic effects on the biological activities of colorectal cancer [32]. Among the 26 genes corresponding to the component AS events of the PI models, four genes were closely associated with UCEC. In the study of Wong YF et al., OLFM1 was found to be significantly down-regulated in endometrial cancer of Hong Kong Chinese women [33]. Oestrogen receptor α (ESR1) has a great impact on the susceptibility and prognosis of endometrial cancer [34]. Recent research discovered that five adjacent tag single-nucleotide polymorphisms at the 5' end of ESR1 denoted a lower risk of UCEC [35]. A significant correlation was established between single nucleotide polymorphisms of ERCC1 and chemosensitivity of UCEC in the study conducted by Chen L et al. [36]. Elevated GRB2 was involved in insulin-triggered oncogenic events in UCEC [37]. Apart from these four genes, other genes such as FBXL19, CSTF2, ZC3H11A, CRTC1 and MAGI3 influenced the formation and progression of human cancers with either carcinogenic or tumor-suppressive function [38][39][40][41][42]. Although none of the corresponding AS events of the 26 genes has been reported in UCEC, we conjecture that AS-induced loss-of-function of tumor suppressor genes, or gain-of-function and retain-of-function of oncogenes, may connect the 26 component AS events in the PI models to the cancer biology of UCEC.
Because splicing factors are critical regulators of splicing events, their prognostic significance and the correlation between splicing factor-related AS events and survival-associated AS events are also worth exploring. The correlation network in this study depicted the complicated interactions between splicing factor-related AS events and survival-associated AS events. Both positive and negative correlations were observed between one splicing factor-related AS event and multiple survival-associated AS events, or between one survival-associated AS event and multiple splicing factor-related AS events. For example, SERBP1_AA_3354 was negatively correlated with HNRNPA1_AA_22145 and positively correlated with HNRNPC_ES_26552. We speculated that splicing factors might execute diversified regulatory functions in mediating AS events in UCEC. Moreover, Kaplan-Meier survival analysis indicated that four splicing factors, SRSF2, TRA2B, ESRP1 and RBM4, were all associated with worse survival of UCEC patients. Of note, SRSF2 was linked with poor survival of patients with myelodysplastic syndromes, and the frequent mutation of SRSF2 could induce oncogenesis in hematopoietic cells by activating a cascade of alternative splicing [43]. Expression of TRA2B served as an independent prognostic factor for worse progression-free survival of UCEC patients in the study of Ouyang YQ et al. [44], which was in concordance with our results. However, the prognostic significance of ESRP1 and RBM4 in this study conflicted with previous reports. ESRP1 is an epithelial cell-specific alternative splicing controller involved in epithelial-mesenchymal transition (EMT) [45,46]. ESRP1 could suppress tumorigenic potential in various cancers including colorectal cancer, pancreatic cancer and ovarian cancer [47][48][49]. Similarly, RBM4 was reported to inhibit tumor progression by specifically controlling splicing related to the apoptosis, proliferation and migration of cancer cells [50]. Whether the expression of ESRP1 and RBM4 indicates good or poor clinical outcome in UCEC patients requires further investigation in future studies.
Although PI models with impressive predictive power were produced in this study, the limitations of the present research should also be pointed out. The prediction efficiency of the PIs in this study was not the best among all prognostic models to date. The functional annotation of the genes corresponding to significant survival-associated AS events was a theoretical analysis based on public databases. The regulatory network was constructed from the correlations calculated between PSI values of splicing factor-related AS events and survival-associated AS events. Experiments are warranted in future studies to validate the functional role of survival-associated AS events in UCEC and the stimulating or inhibitory influence of splicing factors on AS events.
In conclusion, we identified PI models based on survival-associated AS events for UCEC with preferable prognosis-predicting ability. Findings in this study were anticipated to provide novel options for selecting reliable prognostic indicators for UCEC patients. Furthermore, the correlation network between splicing factor-related AS events and survival-associated AS events may deepen the understanding of the carcinogenesis of UCEC.

Process of AS data curation
The TCGA data portal (https://portal.gdc.cancer.gov/) provided the RNA sequencing data of the UCEC cohort. Analysis of mRNA splicing profiles in UCEC was conducted with the aid of SpliceSeq [51], a Java program that explicitly quantifies RNA-Seq reads and identifies possible functional changes resulting from AS in the context of transcript splice graphs. We downloaded the percent spliced in (PSI) values for seven types of AS events, Exon Skip (ES), Mutually Exclusive Exons (ME), Retained Intron (RI), Alternate Promoter (AP), Alternate Terminator (AT), Alternate Donor site (AD) and Alternate Acceptor site (AA), to quantify AS events in UCEC. The PSI value is a commonly used ratio, ranging from zero to one, for scoring AS events.
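As a rough illustration of what a PSI value expresses, the sketch below computes the fraction of reads supporting inclusion of an event out of all reads that cover it. This is a simplified read-count ratio and not SpliceSeq's actual procedure, which works on splice graphs with normalized read counts; the function name and the choice to return `None` for uncovered events are assumptions made for this example.

```python
from typing import Optional


def percent_spliced_in(inclusion_reads: float, exclusion_reads: float) -> Optional[float]:
    """Simplified PSI: inclusion reads divided by all reads covering the event.

    Returns None when no reads cover the event, because the ratio is undefined.
    (Hypothetical helper for illustration; not part of SpliceSeq.)
    """
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total


# 30 reads support inclusion and 10 support exclusion.
psi_example = percent_spliced_in(30, 10)
```

A PSI of 0 means the event is never included in the observed transcripts and a PSI of 1 means it is always included, which matches the zero-to-one range described above.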

PI models featured by AS events for UCEC
Multivariate Cox regression analysis was applied to the top significant survival-associated AS events (P<0.001) selected from univariate Cox regression analysis in each AS type, for further evaluation of the prognostic value of AS events in UCEC. AS events with P<0.05 in multivariate Cox regression analysis were retained to construct the prognostic index (PI) for the corresponding AS type, which was calculated using the following formula: $\mathrm{PI} = \sum_{i=1}^{n} \beta_i \times \mathrm{PSI}_i$, where $\beta_i$ is the regression coefficient of the $i$-th retained AS event and $\mathrm{PSI}_i$ is its PSI value. To compare the efficiency of the PI models for each AS type, the survivalROC package (version 1.0.3) in R (version 3.3.0), which estimates time-dependent receiver operating characteristic (tROC) curves while accommodating censored data [53], was employed to calculate the area under the curve (AUC) value for the tROC curve of each PI model. Kaplan-Meier survival curves were also used to compare the prognostic ability of the prediction models. P values reported from all analyses were two-sided. To examine the independence between the PI models and important clinical features, we performed univariate and multivariate Cox regression analysis to compare the hazard ratios (HRs) of the PI models and important clinical features for UCEC. Furthermore, the relationship between the PI models and the clinical progression of UCEC was assessed with the Chi-square test in SPSS v.22.0.
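The PI calculation and the median-based risk grouping can be sketched as follows, assuming that PI is the sum of each retained event's PSI value weighted by its multivariate Cox regression coefficient β. The event names, coefficients and PSI values below are invented for illustration and are not taken from the study; assigning patients whose PI equals the median to the low-risk group is also an assumption, since the text does not say how ties were handled.

```python
from statistics import median


def prognostic_index(psi_values, coefficients):
    """Weighted sum of PSI values: PI = sum(beta_i * PSI_i) over retained AS events."""
    missing = set(coefficients) - set(psi_values)
    if missing:
        raise ValueError(f"missing PSI values for events: {sorted(missing)}")
    return sum(beta * psi_values[event] for event, beta in coefficients.items())


def split_by_median(pi_by_patient):
    """Label each patient 'high' if PI is above the cohort median, otherwise 'low'."""
    cutoff = median(pi_by_patient.values())
    return {
        patient: ("high" if pi > cutoff else "low")
        for patient, pi in pi_by_patient.items()
    }


# Hypothetical regression coefficients for two hypothetical AS events.
coefficients = {"EVENT_A": 2.0, "EVENT_B": -1.0}

# Hypothetical PSI values per patient.
cohort_psi = {
    "P1": {"EVENT_A": 0.5, "EVENT_B": 0.5},
    "P2": {"EVENT_A": 0.25, "EVENT_B": 0.75},
    "P3": {"EVENT_A": 1.0, "EVENT_B": 0.0},
    "P4": {"EVENT_A": 0.0, "EVENT_B": 1.0},
}

pi_by_patient = {
    patient: prognostic_index(psi, coefficients) for patient, psi in cohort_psi.items()
}
risk_groups = split_by_median(pi_by_patient)
```

In this toy cohort the median PI is 0.125, so P1 and P3 fall into the high-risk group and P2 and P4 into the low-risk group. The resulting labels are what a Kaplan-Meier comparison would then be run on.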

Correlation network of splicing factor-related AS events and survival associated AS events
Splicing factors play an indispensable role in regulating splicing events [54]. In this study, we examined the underlying molecular mechanism of AS events in UCEC by exploring the correlation network of splicing factor-related AS events and survival-associated AS events. We obtained the information on splicing factors from SpliceAid2 (www.introni.it/spliceaid.html) and downloaded the level 3 mRNA-seq expression data of the splicing factors from the TCGA data portal. Because the transcripts per million (TPM) format is well suited to the interpretation of RNA-seq data [55], raw count values were converted into TPM. We conducted univariate Cox regression analysis to assess the association between the OS of UCEC patients and the PSI of splicing factor-related AS events in TCGA. Whether a significant correlation existed between the PSI of survival-associated splicing factor-related AS events (P<0.05) and distinct AS events from multivariate Cox regression analysis (P<0.05) was assessed by the Spearman correlation test. Interactions between survival-associated splicing factor-related AS events and distinct AS events from multivariate Cox regression analysis were displayed as a correlation network in Cytoscape (version 3.5.0). Adjusted P values less than 0.05 were considered significant.
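Two of the steps in this section, the count-to-TPM conversion and the Spearman correlation between PSI vectors, can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the pipeline used in the study: gene lengths in base pairs are an assumed input that the text does not describe, the function names are hypothetical, and only the correlation coefficient is computed (the significance test and P-value adjustment are omitted).

```python
from math import sqrt


def counts_to_tpm(counts, lengths_bp):
    """Convert raw counts to TPM: length-normalize, then scale rates to sum to one million."""
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("all counts are zero; TPM is undefined")
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical counts and gene lengths for two genes in one sample.
tpm = counts_to_tpm({"G1": 100, "G2": 300}, {"G1": 1000, "G2": 3000})

# Hypothetical PSI vectors for two AS events across four patients.
rho = spearman_rho([0.1, 0.4, 0.6, 0.9], [0.8, 0.7, 0.3, 0.2])
```

In the network described above, each pair of events with a significant coefficient would become an edge, colored by the sign of the correlation.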

Statistical analysis
The statistical analysis for this study has been detailed in a previous study [56].