Role of global aberrant alternative splicing events in papillary thyroid cancer prognosis

Background: Alternative splicing events have been increasingly reported for anomalous perturbations in various cancers, including papillary thyroid cancer (PTC). Methods: Integration analysis of RNA sequencing and clinical information were utilized to identify survival associated splicing events in PTC. Then, several prognosis-related splicing events were submitted to develop moderate predictors for survival monitoring by using least absolute shrinkage and selection operator model. In addition, several biomedical computational algorithms were conducted to identify pathways enriched by genes with prognostic splicing events and construct regulatory network dominated by splicing factors. Results: Survival analysis in 496 PTC patients indicated that TNM stage, tumor stage, distant metastasis and tumor status were significantly correlated with PTC patients' progression-free interval. 2799 splicing events were identified as prognostic molecular events. Functional enrichment analysis suggested that prognostic splicing events are associated with several energy metabolism-related processes. Based on these prognostic events, several prognostic signatures were developed. The final prognostic signature acted as an independent prognostic factor after adjusting for several clinical parameters. Interestingly, splicing regulatory network was constructed to display potential regulatory mechanisms of splicing events in PTC. Conclusions: Our analysis provides the status of splicing events involved in the progression and may represent an underappreciated hallmark of PTC.


INTRODUCTION
Recently, progress using high-throughput sequencing technologies has spurred cancer genome research. Strikingly, the limited number of protein-coding genes makes it difficult to account for the abundant proteomic phenotypes, especially for tumors [1,2]. Alternative splicing (AS) is a major physiological phenomenon that allows for transcript variants in mammalian cells and for subsequent reprogramming of protein diversity for better environmental fit [3]. Given the indispensable functional impact of AS in genomic analysis, it is not surprising that disruptions of AS often lead to aberrant cellular homeostasis and are linked to cancer. Characterization of the AS landscape in cancers via reliable big data to identify biomarkers provides a wealth of insight into cancers with excellent prognostic value. Hence, increasing evidence has indicated that AS is actively involved in the initiation, progression and prognosis of cancers [4,5]. Additionally, cancer researchers realize the significant clinical utility of AS and its potential as a useful therapeutic signaling target [6][7][8]. In general, AS is a complex and tightly regulated process orchestrated by limited splicing factors (SFs) [9]. Therefore, recent analyses of cancer genomes have predominantly focused on evaluating the clinical significance of AS events and SFs in tumors, their potentially pathogenic impact for downstream pathways and the development of their regulatory network.
Thyroid cancer is the most common endocrine neoplasia, where papillary thyroid cancer (PTC) represents more than 90% of all thyroid cancers [10]. In recent decades, the sharp increase in PTC morbidity has aroused substantial concern [11,12]. Although PTC is indolent in most cases, high-incidence patients develop distant metastasis and lymph node metastasis, which often lead to high mortality. Therefore, it is imperative to screen for biomarkers to characterize PTC recurrence and metastasis for effective prognostic monitoring. AS AGING is exploited by tumor cells, allowing the cells to live by sustaining a series of malignant behaviors. AS is emerging as an important diagnostic and prognostic signature to further understand tumorigenesis and to develop new targets for precision therapy in PTC [13].
To elucidate the global set of AS events, their contribution to the onset and development of PTC and how they affect molecular signaling to exert their biological function, a comprehensive analysis based on large clinical samples is urgently needed. RNA sequencing (RNA-seq) data generated by The Cancer Genome Atlas (TCGA) have supported data at the genome-wide level to identify the prognostic value of AS. Cancer-specific AS events could be effective biomarkers for clinical monitoring; however, dysregulation of splicing patterns on a transcriptomewide scale has been less well studied until recently in PTC.
To unravel the pattern of global aberrant AS and its clinical implications in PTC, we focused on global AS patterns with complete clinical information from the TCGA database. From the perspective of AS, prognostic risk score systems were constructed based on AS events to predict the prognosis of PTC. Furthermore, we study how biological processes are affected by these AS evens. To assess the regulatory relationships between AS events and SFs, an evaluation of the regulatory network in silico tools is also presented.

Cohort characteristics
We processed TCGA splice-seq files and clinical information of PTC patients. A total of 496 patients were included in the present analysis. The general clinical characteristics are summarized in Table 1.
Before exploring the prognostic values of AS events, univariate Cox hazard analyses were performed to assess the relationship between the clinical features and clinical outcome of PTC patients (Table 1). American Joint Committee on Cancer (AJCC) TNM stage (hazard   Figure 1B). Thus, one gene might have two or more AS events that were markedly related to the PFI of PTC patients. The UpSet plot vividly revealed that ES was the most common prognosis-related event, and a gene could have up to three prognosis-related events ( Figure 1C).

Molecular characteristics of survival-associated AS events
The distributions of AS events significantly correlated with patient survival are displayed in Figure 2A. The 20 most significant prognosis-related AS events are also shown ( Figure 2B-H). Among them, there were only 18 prognostic M events. To reveal the molecular characteristics of genes with survival-associated AS events, several bioinformatics analyses were conducted. First, a PPI network was constructed to demonstrate the relationships among these genes. RHOA, SRC, PXN and PTK2 ranked at the core in the network ( Figure 3). According to the functional annotations of clusterProfiler, "electron transport chain", "ribonucleoside triphosphate metabolic process" and "intracellular receptor signaling pathway" were the three most significant biological process terms ( Figure 4A). "Focal adhesion", "cellsubstrate junction" and "cell-substrate adherens junction" were the three most significant cellular component terms ( Figure 4B). For molecular function, "cadherin binding", "cell adhesion molecule binding" and "oxidoreductase activity, acting on NAD(P)H, quinone or similar compound as acceptor" were three most enriched categories ( Figure 4C). More importantly, we found that the "thermogenesis" pathway was correlated with these genes most significantly ( Figure 5).

Prognostic signatures for PTC patients
By using the least absolute shrinkage and selection operator (LASSO) Cox analysis following univariate Cox, we developed seven types of prognostic signatures based on AA, AD, AP, AT, ES, ME and RI ( Figure 6, Table 2). Furthermore, we selected the top 20 most significant survival-associated AS events in the seven types to construct the final prognostic signature (Table  3). Interestingly, we found that the seven prognostic signatures could predict the clinical outcome of PTC patients ( Figure 7). ROC curves validated the performance of prognostic signatures in prognosis prediction ( Figure 8). The final prognostic signature was the most ideal predictor ( Figure 9A). This signature could distinguish the PTC patients with distinct clinical outcome very well ( Figure 9B). After multivariate adjustment by clinical factors, the prognostic signature remained a moderate and independent prognostic indicator (HR=5.809, 95%CI:1.669-20.211, P<0.001; Figure 9C).

Survival-associated SF-AS network
AS events are mainly orchestrated by SFs, which often bind to pre-mRNAs and regulate RNA splicing via influencing exon selection and splicing site. Therefore, exploration of the SF-AS regulatory network is imperative in PTC. Survival analyses based on TCGA data suggested that 12 SFs possess the ability to predict the PFI of PTC patients ( Figure 10A). Next, correlation analyses between the expression of SFs and PSI value of the most significant AS events (P<0.0001) were conducted ( Figure 10B).

DISCUSSION
Currently, scientific research on the role of AS events in PTC still has many unanswered questions owing to the dearth of available large-sample public AS profiles and the paucity of systematic analysis referring to their clinical significance and deep molecular function. These bottlenecks have prevented cancer researchers from   AGING effectively recognizing the widespread applicability of AS events in PTC. Exploration of AS patterns broadens our vision and our understanding in traditional transcriptome molecular biomarkers. In this project, we adopted several biomedical computational approaches, which integrate the AS event profiles and clinical information of PTC patients to mine prognosis-related AS and construct splicing prognostic signatures that could stratify PTC patients into subgroups with distinct survival outcomes. Moreover, the SF-AS network could provide further insights into regulatory mechanisms in patients with PTC from the perspective of splicing.  AGING PTC are characterized as an indolent disease, effective biomarkers that predictive the clinical outcome is limited. And the effectiveness prophylactic central compartment lymph node dissection in PTC is still controversial. The prophylactic dissection should be avoided to reduce surgical complications after thyroidectomy [14,15]. Hence, it is imperative to precise prediction instead of potential "overtreatment".
In recent years, next-generation sequencing technology has extensively promoted the investigation of wholegenome analyses, including genome splicing exploration. Previously, several studies conducted SpliceSeq analyses to generate alterative splicing profiles for some types of cancer, as well as to construct prognostic signatures for cancer prognosis monitoring, including non-small cell lung cancer [16], ovarian  AGING cancer [17], bladder cancer [18], and gastrointestinal cancer [19]. To the best of our knowledge, we are the first group to integrate splicing profiles and TCGA clinical factors of PTC patients together to comprehensively investigate the prognostic value of AS. This computational bioinformatics pipeline could provide novel insights into the clinical value and potential regulatory mechanisms of AS at the genomewide level. Previously, several studies have proposed transcriptomic signatures associated with the carcinogenesis and aggressiveness of PTC [20,21]. The present in-depth study further explored alterations of transcriptomes used as prognostic predictors and could broaden our horizons in the clinical significance of transcriptomic signatures.
Given the multitude of AS events impacted by their own pre-mRNAs, the downstream functional impact is partly used to describe the molecular function of AS alteration events. In the PPI network analysis, RHOA, SRC, PXN and PTK2 were the hub genes. Notably, RHOA and SRC have both been identified as important molecules involved in the biological process of PTC. For example, phosphatidylcholine-specific RHOA has been identified as indispensable in the thyrotropininduced activation of phospholipase D in thyroid cells [22]. Src inhibitors have been proposed to be very useful in suppressing the growth of PTC [23]. Srcmediated inhibition in advanced PTC with BRAF and RAS mutations holds great promise for improving the clinical outcomes [24]. In this topic, we provided a potential novel function that is different from the traditional biological function. These findings also pave the way for future clinical applications. Functional enrichment analyses suggest that, in PTC, genes with prognosis-related AS events are associated with several energy metabolism-related pathways and therefore may provide a selective advantage to cancer cells through reprogramming of the energy metabolism. Tumor cells fuel wild growth that requires adjustments of energy metabolism [25]. The thyroid gland plays an irreplaceable role in body metabolism, owing to the various hormones produced by the thyroid. Multiple studies have reported that serine/glycine, glutamine metabolism-related protein expression is different according to the thyroid cancer subtype [26,27]. Quite surprisingly, precision metabolism target therapies of thyroid cancer have been investigated in recent decades [28][29][30]. Our findings proposed a panel of AS events that exert their biological functions in metabolic alterations of PTC.
The highlight of the current study was that we proposed prognostic signatures based on AS events for monitoring the prognosis of PTC patients. Owing to the favorable overall survival of PTC patients, PFI was selected as the endpoint. Recently, some multimolecule-based signatures in PTC have been proposed. Bisarro Dos Reis M et al. generated a prognostic scoring system in well-differentiated thyroid carcinoma AGING 21 DNA methylation probes [31]. Cai WY et al. also developed a prognostic signature by combining two mRNAs and two long noncoding RNAs for predicting the progression and prognosis of PTC [20]. Among patients with PTC, there remains a huge demand for additional tools to improve clinical management. Furthermore, different molecular biomarker-based signatures could provide different perspectives. Here, we used the LASSO Cox regression model to screen out a set of AS events to facilitate clinical transfer. Prognostic signatures we proposed displayed ideal performance in predicting the clinical outcome of patients. However, another independent cohort should be used to validate the efficiency.
An enormous number of AS events in organism cells are orchestrated by limited SFs [32]. The alteration spectrum of AS events that occur in multiple tumor types highlights splicing factors as an important mechanism of splicing deregulation in cancer [33]. Alterations of SFs in PTC are increasingly considered as independent molecules involved in carcinogenesis and progression via various mechanisms [34][35][36]. Splicing correlation network analysis has also clearly revealed larger regulated nodes, suggesting the prominent positions they hold in the SF-AS network. NOVA1 was highly connected in the network could contribute to the prognosis induced by splicing events. NOVA1 has been reported to play a significant role in carcinogenesis [37][38][39]. However, the role of NOVA1 in PTC has not been explored. Nevertheless, our algorithm suggested deregulated AS events as a hallmark of PTC. However, several limitations inevitably influenced the results' reliability. First, PTC are characterized by an AGING indolent disease course. As a consequence, a significant proportion of patients considered to be disease-free could develop a disease recurrence in the subsequent follow-up. Second, another independent validation cohort should be used to validate in the future. Third, more in vivo and in vitro functional experiments are clearly needed to further explore the impact of dysregulated AS events and SFs in cancer development.
In summary, the current study established a detailed phenomenon base of prognosis-associated AS events in PTC patients, which is valuable in deciphering the functional contribution of AS events in PTC. These findings should facilitate the ongoing effort in developing novel genomic models for clinical cancer management. Moreover, the further identification of prognostic splicing factors and construction of the SF-AS network will pave the way for further exploration of splicing-related mechanisms.

Data acquisition
TCGA SpliceSeq [40] is a data portal that provides AS profiles across 33 tumors based on TCGA RNA-seq data. SpliceSeq evaluates seven types of splice events, including AA, AD, AP, AT, ES, ME, and RI. To quantify AS events, TCGA SpliceSeq processed the percent spliced in (PSI) value for cancer research analysis, which indicates the inclusion of a transcript element divided by the total number of reads for that AS event. Alterations in PSI values range from 0 to 100 (%), which suggests a shift in splicing events. AS events with a PSI value of more than 75% in thyroid carcinoma cohort samples were downloaded from the TCGA SpliceSeq database. The AS events with standard diversion < 1 were removed.
Clinical information of PTC patients was also downloaded and extracted from the pan-cancer atlas database of TCGA [41]. The PFI was used as the endpoint for survival analysis owing to the limited number of events for overall survival. More importantly, PFI records more endpoint events during the follow-up period. Only pathologically confirmed PTC patients with both follow-up and AS event data were included for our analysis. The same TCGA ID was used to integrate clinical information and AS events data.

Survival analysis
In the survival analysis, the follow-up periods ranged from 91 days to 5423 days after removing patients with PFIs of less than 90 days. Univariate Cox analysis was conducted to assess the relationships between the PSI value (from 0 to 100) of each AS event and the PFI of PTC patients (P < 0.05). LASSO method is a popular method for the regression of high-dimensional predictors [42]. LASSO has been extended for use in Cox regression survival analysis and is ideal for highdimensional data. We selected the LASSO Cox regression model to determine the ideal coefficient for each prognostic feature and to estimate the deviance likelihood via 1-standard error (SE) criteria. The coefficients and partial likelihood deviance were calculated with the "glmnet" package in R.

Functional annotation
Functional annotation for the genes with survivalassociated AS events was performed by the bioinformatics tool "clusterProfiler" to comprehensively explore the dysregulated functional relevance of these genes [43]. Gene ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) were used to assess the functional categories. GO and KEGG terms with a P-value and q-value both smaller than 0.05 were considered significant categories.

Prognostic signature construction
The top 20 most significant AS events in univariate Cox analysis were submitted to LASSO regression Cox analysis to develop prognostic signatures based on seven types of AS events. Finally, prognostic signatures for PFI prediction were calculated by multiplying the PSI values of prognostic indictors and the coefficient assigned by LASSO Cox analysis. The evaluation of the splicing-based prognostic signature as an independent predictor was conducted by integrating the following clinical parameters into the multivariable Cox regression analysis: age (≥50 and <50), gender (male or female), AJCC TNM stage (stage III/IV or stage I/II), tumor stage (T3-T4 or T1-T2), lymph node metastasis (positive or negative), distant metastasis (positive or negative), neoplasm focus type (multifocal or unifocal) and tumor status (with tumor or tumor-free).

SF-AS regulatory network
A compendium of 404 splicing factors was obtained from a previous study [44]. The expression profiles of SF genes were curated from the TCGA dataset. The count value of SF level-3 mRNA data was downloaded and converted to log2(count+1) for further univariate Cox analysis. We selected axes between the expression value of SFs and PSI values of prognosis-related AS events to construct the SF-AS regulatory network according to the following conditions: P value less than 0.05 and Pearson correlation coefficient more than 0.3. Then, we built the correlation plots via Cytoscape version 3.6.1.