Identification of RNA-binding protein SNRPA1 for prognosis in prostate cancer

Prostate cancer is one of the deadliest cancers in men. RNA-binding proteins play a critical role in human cancers; however, whether they have a significant effect on the prognosis of prostate cancer has yet to be elucidated. In the present study, we performed a comprehensive analysis of RNA sequencing and clinical data from the Cancer Genome Atlas dataset and obtained differentially expressed RNA-binding proteins between prostate cancer and benign tissues. We constructed a protein-protein interaction network and Cox regression analyses were conducted to identify prognostic hub RNA-binding proteins. SNRPA1 was associated with the highest risk of poor prognosis and was therefore selected for further analysis. SNRPA1 expression was positively correlated with Gleason score and pathological TNM stage in prostate cancer patients. Furthermore, the expression profile of SNRPA1 was validated using the Oncomine, Human Protein Atlas, and Cancer Cell Line Encyclopedia databases. Meanwhile, the prognostic profile of SNRPA1 was successfully verified in GSE70769. Additionally, the results of molecular experiments revealed the proliferative role of SNRPA1 in prostate cancer cells. In summary, our findings evidenced a relationship between RNA-binding proteins and prostate cancer and indicated the prognostic significance of SNRPA1 in prostate cancer.


INTRODUCTION
Prostate cancer (PCa) is one of the leading urogenital malignancies worldwide. PCa ranked first in incidence and, following lung cancer, had the second highest mortality rate in American men, accounting for approximately 1 in 5 newly diagnosed cases in 2019 [1]. Numerous men with metastatic PCa who receive androgen deprivation therapy develop castration resistance, which is associated with a 5-year mortality rate of over 80% [2,3]. The pathogenesis of PCa is complex and involves copious genetic aberrations [4]. Therefore, further understanding of the molecular dysfunction and identification of significant biomarkers are critical for early diagnosis and better prognosis in PCa.
RNA-binding proteins (RBPs) mediate the interactions of various RNAs and the formation of ribonucleoproteins to control post-transcriptional regulation of gene expression [5]. Additionally, they play an essential role in regulating the metabolism of RNA, such as splicing, translation, and localization [6]. To date, over 1500 RBPs have been identified in humans [7]. However, the specific biological functions of most RBPs have yet to be fully elucidated. Due to the regulatory roles of RBPs, recent studies have focused on RBP dysfunction in cancer initiation and progression [8,9]. Reports suggest that Musashi RNA Binding Protein 1 (MSI1) was overexpressed in several tumors of the central nervous system by regulating the Notch signaling pathway [10,11]. Moreover, the overexpression of RNA-binding motif protein 3 (RBM3) facilitated proliferation and drug resistance via the β-catenin pathway in colon cancer cells [12,13]; however, it indicated a favorable prognosis in breast cancer [14]. Additionally, eukaryotic translation initiation factor 4E (eIF4E) overexpression led to the development of B cell lymphomas and facilitated lymphomagenesis [15,16]. Moreover, the inhibition of eIF4E in human tumor xenografts significantly induced apoptosis and suppressed tumor growth [17]. Based on these findings, it is important to conduct systematic research on the role of RBPs in carcinogenesis.
Whether specific RBPs play a critical role in the pathogenesis of PCa has yet to be elucidated. In this study, we performed an integrated analysis of RNA sequencing data from The Cancer Genome Atlas (TCGA) database and identified hub RBP genes related to PCa prognosis. The biological functions and clinical traits of RBPs have been successfully identified and validated in multiple databases and molecular experiments. Therefore, the aim of this study was to enhance the current knowledge by providing novel information regarding PCa-related RBPs and potential biomarkers to better predict the development and prognosis of PCa.

Analysis of differentially expressed RBPs in PCa
A flow diagram of the procedure undertaken in this study is illustrated in Figure 1. After screening for duplicate entries, 495 tumor and 52 normal samples including the mRNA expression profiles of 1472 RBPs were selected from TCGA for further analysis. After processing with the R package "Limma", a total of 186 significantly differentially expressed RBPs, including 82 downregulated and 104 upregulated genes, were identified (|log2FC| > 0.5 and adjusted p value < 0.0001). A heatmap presenting the expression profiles of RBPs is depicted in Figure 2.

Identification of biological functions of RBPs
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were conducted to reveal the biological roles of RBPs. The GO and KEGG profiles are presented in Figure 3A, 3B. RBPs were mainly involved in biological processes including RNA splicing via multiple pathways and mRNA metabolism (Figure 3C). In addition, cellular components significantly associated with RBPs were ribonucleoprotein granules and ribosomes. Moreover, RBPs were significantly enriched in molecular function related to the catalytic and binding activity of RNA. KEGG pathway analysis revealed that RBPs mainly participate in RNA transport, surveillance, degradation, and ribosome activities (Figure 3D).

PPI network and identification of hub RBPs
To identify RBPs that play a key role during the pathogenesis of PCa, significant RBPs were analyzed using the STRING database. Figure 4A visualizes the protein-protein interaction (PPI) network including 160 nodes and 599 interactions. Significant RBPs were ranked according to degree using the Cytoscape plug-in cytoHubba. Finally, a gene module comprising 45 hub RBPs was identified (Figure 4B). RRS1, SNRPA1, ELAVL2, and BOP1 were among the genes in this module.

Prognostic analysis of hub RBPs
Of the 45 hub RBPs, 7 candidate genes were related to prognosis with regards to disease-free survival (DFS) in 489 patients, determined by both the log-rank test and univariate Cox regression methods (p value < 0.05). After stepwise multivariate Cox regression analysis, SNRPA1, DDX39B, and ESRP2 remained significant in the model and were regarded as prognosis-related RBPs (p value < 0.05; Supplementary Table 1). The survival curves demonstrated that a higher expression of SNRPA1 and DDX39B and lower expression of ESRP2 were significantly related to worse DFS (p = 0.00551, 0.00011, and 0.0194, respectively; Figure 5A-5C). Furthermore, the expression profiles of SNRPA1 and DDX39B remained significantly associated with overall survival (OS) (p = 0.00955 and 0.0022, respectively; Figure 5D, 5E). This suggests that the abovementioned RBPs are suitable prognostic markers of PCa. SNRPA1 had the highest hazard ratio (HR) value and has not previously been reported in PCa. Based on this, SNRPA1 was selected for further analysis.

Clinical significance of SNRPA1
To reveal the clinical relevance of SNRPA1, we first compared its mRNA expression levels in tumor and normal samples. SNRPA1 was highly expressed in PCa (p < 0.0001; Figure 6A). Additionally, SNRPA1 expression was positively associated with Gleason score (p < 0.0001; Figure 6B). It appeared that SNRPA1 expression increased as the Gleason score increased from 6 to 10. Thereafter, the relationship between pathological TNM stage and SNRPA1 was explored. As depicted in Figure 6C, the T stage demonstrated a similar trend to that of the Gleason score; higher expression of SNRPA1 corresponded with a more advanced T stage (p = 0.033). Finally, samples in the N1 stage had higher expression of SNRPA1 compared with those in the N0 stage (p = 0.011; Figure 6D). Since fewer than 5 samples exhibited distant metastasis, the correlation between SNRPA1 and M stage was not evaluated. Based on these findings, SNRPA1 showed clinicopathological significance in PCa.

External validation of SNRPA1 across multiple databases
To further verify the effect of SNRPA1 in PCa, the expression and prognostic profiles of SNRPA1 were evaluated using the Oncomine, Human Protein Atlas (HPA), Gene Expression Omnibus (GEO), and Cancer Cell Line Encyclopedia (CCLE) databases. The Oncomine database contained 4 studies where SNRPA1 expression was significantly higher in PCa tissue than in normal prostate gland tissue (2.2-fold to 2.4-fold increase; p < 0.05; Figure 7A). Similarly, the HPA database demonstrated that SNRPA1 was strongly positive in PCa tissue and moderately positive in normal tissue (antibody HPA045622; Figure 7B). Moreover, the clinical data of 94 patients in GSE70769 were analyzed. Consistent with the prognostic results of DFS and OS from TCGA, increased expression of SNRPA1 in the GSE70769 cohort was significantly associated with worse DFS (p = 0.0137; Figure 7C). Finally, the CCLE database revealed that SNRPA1 was highly expressed in PCa compared with other tumor types, with the exception of hematological malignancies, and that it was not differentially expressed across PCa cell lines (Supplementary Figure 1A, 1B).

External validation of SNRPA1 using clinical specimens and molecular experiments
In addition to verifying the effects of SNRPA1 across multiple datasets, the expression levels of SNRPA1 in four pairs of PCa and normal samples were detected by western blotting. Consistent with the results from other datasets, the expression of SNRPA1 at the protein level was significantly higher in the tumor group than in the normal group (Figure 8A, 8B).
Next, we conducted molecular experiments in vitro. After shRNA transfection targeting SNRPA1, the expression levels of SNRPA1 significantly decreased in both CWR22Rv1 and C4-2b cells (Figure 8C, 8D). SNRPA1 inhibition also decreased tumor cell migration and colony formation (Figure 8E-8H). Finally, cell proliferation was significantly inhibited when SNRPA1 was downregulated in CWR22Rv1 and C4-2b cells (Figure 8I). These results suggest that SNRPA1 promotes malignant behavior in PCa cells, consistent with its association with poor prognosis.

GSEA of SNRPA1
Gene Set Enrichment Analysis (GSEA) was conducted using samples with high and low expression levels of SNRPA1 to examine enriched pathways (Figure 9). In the hallmark gene set, the upregulation of SNRPA1 was significantly involved in pathways associated with DNA repair, mTORC1 signaling, MYC, and E2F targets. However, the downregulation of SNRPA1 significantly corresponded to signaling pathways related to TGF-β signaling, Notch signaling, epithelial-mesenchymal transition, and KRAS signaling, which are mainly involved in tumorigenesis.

DISCUSSION
Globally, PCa is one of the deadliest urogenital tumors in men [18]. Therefore, identifying key genes that can be used as biomarkers is necessary for early diagnosis and prognosis, especially in castration resistance. Many studies have evaluated gene expression profiles in PCa, such as miRNAs, lncRNAs, and autophagy-related genes [19,20]. However, the systematic evaluation of the role of RBPs in PCa has yet to be reported. Additionally, only a few RBPs have been explored in depth in relation to the pathogenesis of cancer. In this study, we first examined differentially expressed RBPs between PCa and normal samples using RNA sequencing data from TCGA. After analyzing the biological functions of significant RBPs, survival analyses were used to identify prognosis-related RBPs, after which SNRPA1 was selected for further study. The clinical traits of TCGA data demonstrated that SNRPA1 had a positive correlation with Gleason score and TNM stage; higher SNRPA1 expression was related to worse prognosis. Additionally, the expression and prognostic profiles of SNRPA1 were successfully validated by GEO, Oncomine, HPA, and CCLE data. Finally, molecular experiments that downregulated SNRPA1 expression evidenced inhibitory functions in PCa cells. These results provide novel information thereby enhancing our understanding of the role of RBPs in PCa development and prognosis.
Using the threshold of |log2FC| > 0.5, we identified differentially expressed RBPs, which were associated with GO terms and KEGG pathways. We found that RNA splicing was the most significant biological process related to RBPs. Changes in mRNA splice patterns play a key role in the pathogenesis of PCa [21]. Gene function could be altered by mRNA isoform switching in PCa through the use of alternative promoters via the androgen receptor (AR). For example, TSC2, a normal tumor suppressor gene in PCa, facilitates cell proliferation under the androgen-driven switch in the mRNA isoform [22]. Additionally, the alternative splicing pattern of the TMPRSS2-ERG fusion gene decreased the skipping of two exons associated with more clinically advanced PCa [23]. Moreover, AR mRNA splicing has an important effect on castration resistance, which could promote PCa cell growth when androgen concentrations are low [24]. AR-V7, a common AR splice variant, increased substantially as patients progressed to castration-resistant prostate cancer [25]. Moreover, the results of cellular component analysis focused on ribonucleoprotein granules, which are involved in biosynthesis. RBPs perform functions mainly by forming ribonucleoprotein complexes with RNA targets as well as transporting, supervising, and degrading RNA [26]. Our KEGG pathway analysis revealed that all of these functions were enriched.
Significant RBPs in the PPI network with a medium confidence score were further screened and 45 hub genes were identified. The expression of MBNL1 isoforms lacking exon 7 inhibits cell viability and induces DNA damage, suggesting that MBNL1 could be a negative regulator in PCa [27]. Lee et al. [28] reported that GNL3 was associated with low DFS and harbored SNPs related to oncogenesis in PCa. Therefore, GNL3 could be a novel metastasis susceptibility gene in PCa. Liang et al. [29] identified RPL22L1 as a diagnostic and prognostic biomarker in PCa as it promoted PCa cell proliferation and invasion in vitro. These hub RBPs were subjected to the log-rank test as well as univariate and multivariate Cox regression analyses, from which SNRPA1, DDX39B, and ESRP2 were identified. The survival curves demonstrated that all three of these RBPs had significant prognostic profiles with regard to DFS. SNRPA1 and DDX39B remained significantly correlated with OS, although more than 90% of patients in this cohort were still alive at the endpoint. Research has evidenced that ESRP2 is highly expressed in primary PCa and associated with disease progression by androgen-dependent splicing switches [30]. Additionally, DDX39B contributes to the generation of AR-V7 in PCa [31]. Therefore, knockdown of DDX39B could dramatically and selectively downregulate AR-V7 expression. Compared with ESRP2 and DDX39B, SNRPA1 indicated a higher risk of poor prognosis based on Cox regression analysis and had not previously been reported in PCa. Thus, we focused on SNRPA1 in PCa for further study.
Firstly, we examined the clinical relevance of SNRPA1. SNRPA1 was significantly associated with Gleason score and pathological TNM stages in PCa patients. Furthermore, it appeared that SNRPA1 expression level was positively correlated with Gleason score. Similarly, higher mRNA expression of SNRPA1 was significantly associated with a more advanced TNM stage. Further, SNRPA1 profiles were validated across multiple databases. Consistent with TCGA data, SNRPA1 was highly expressed in PCa samples based on four studies from Oncomine and on HPA data. Meanwhile, SNRPA1 showed a similar prognostic capacity in the GEO data, and the prognostic profile was well validated. Moreover, western blotting of clinical specimens demonstrated a higher expression level of SNRPA1 in PCa tissue. Furthermore, in vitro studies revealed that the inhibition of SNRPA1 could significantly decrease PCa cell migration, proliferation, and colony formation. These results indicate SNRPA1 plays a proliferative role in PCa. SNRPA1, known as small nuclear ribonucleoprotein polypeptide A', belongs to the spliceosome family and is responsible for processing pre-mRNA into mature mRNA [32]. Additionally, SNRPA1 is one of the key players in the regulation of pluripotency-specific spliceosome assembly [33]. Wu et al. [34] evidenced that the loss of SNRPA1 caused insufficient mRNA splicing and resulted in the failure of spermatocyte differentiation. In cancer research, SNRPA1 bound to the insertion allele of rs386772267 related to pancreatic cancer and influenced gene expression associated with RNA processing and decay [35]. Zeng et al. [36] discovered that SNRPA1 is widely expressed in colorectal cancer cell lines and that its downregulation inhibits cell proliferation. Similarly, SNRPA1 was highly expressed in hepatocellular carcinoma and correlated with the clinical stage and OS in hepatocellular carcinoma patients [32]. In our study, we conducted GSEA of SNRPA1 in PCa. The results indicated that high SNRPA1 expression was associated with DNA repair, mTORC1 signaling, MYC, and E2F targets, which are also mainly involved in the pathogenesis of PCa [37-39]. Based on these results, we propose that the novel RNA-binding protein SNRPA1 plays an essential role in the pathology and prognosis of PCa.
This study has some limitations. Firstly, it did not explore the specific signaling pathways related to SNRPA1 in PCa, although functional enrichment and GSEA were conducted. Furthermore, more in-depth molecular experiments, as well as additional clinical data for prognostic analysis, are needed to characterize SNRPA1 in PCa more fully.
In summary, we performed a comprehensive analysis of the role of RBPs in PCa based on RNA sequencing and clinical data. After screening using interaction and survival analysis, SNRPA1 was identified as a significant hub gene related to the prognosis of PCa. Finally, the expression and prognostic profiles were evaluated with clinical traits and well validated across multiple databases and molecular experiments. This is the first study to examine the role of RBPs in PCa. Therefore, our results provide novel information to improve our understanding of PCa development and prognosis.

Data acquisition and processing
RNA sequencing transcriptomic data of PCa, comprising 498 tumor and 52 adjacent normal tissues as well as clinical information, were downloaded from TCGA database (https://portal.gdc.cancer.gov/) in May 2020. This dataset includes the mRNA expression profiles of 1472 RBPs, which were used in this study [7]. Thereafter, differentially expressed RBP genes between tumor and normal tissues were identified using the "Limma" package [40] in R software (Version 3.6.2). Differentially expressed genes (DEGs) with thresholds of |log2FC| > 0.5 and adjusted p value < 0.0001 were selected for further analysis.
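The thresholding step can be illustrated with a short sketch. This is not the authors' pipeline: it assumes that per-gene log2 fold changes and raw p values have already been computed (for example by a moderated t-test), and it assumes Benjamini-Hochberg adjustment, which the text does not state explicitly. The gene names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(
    stats: Dict[str, Tuple[float, float]],
    lfc_cutoff: float = 0.5,       # |log2FC| threshold quoted in the text
    adj_p_cutoff: float = 1e-4,    # adjusted p value threshold quoted in the text
) -> Dict[str, str]:
    """stats maps gene -> (log2FC, raw p value); returns gene -> 'up'/'down'."""
    genes = list(stats)
    adjusted = benjamini_hochberg([stats[g][1] for g in genes])
    selected = {}
    for gene, adj_p in zip(genes, adjusted):
        log2fc = stats[gene][0]
        if abs(log2fc) > lfc_cutoff and adj_p < adj_p_cutoff:
            selected[gene] = "up" if log2fc > 0 else "down"
    return selected


# Hypothetical toy input: (log2FC, raw p value) per gene.
toy_stats = {
    "GENE_A": (1.20, 1e-9),   # passes both thresholds, upregulated
    "GENE_B": (-0.80, 2e-7),  # passes both thresholds, downregulated
    "GENE_C": (0.30, 1e-8),   # fold change too small
    "GENE_D": (0.90, 0.02),   # adjusted p value too large
}
degs = select_degs(toy_stats)
```

Applying both criteria jointly is what keeps genes with a small but highly significant shift, or a large but noisy shift, out of the downstream network analysis.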

Functional enrichment analysis
To reveal the biological functions of significant RBPs, GO, including molecular function, cellular component, and biological process, and KEGG pathway enrichment analyses were conducted by R package "clusterProfiler" [41] with thresholds of adjusted p value < 0.05 and q value < 0.05. The results were visualized using R package "GOplot" [42].
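Over-representation analyses of this kind are commonly based on a hypergeometric test. The sketch below shows that calculation for a single term under stated assumptions: the background is taken to be the 1472 RBPs and the query the 186 differentially expressed RBPs, whereas the term size and overlap are hypothetical, and the background actually used in the analysis is not specified in the text.

```python
from math import comb


def hypergeom_enrichment_p(
    universe_size: int, term_size: int, query_size: int, overlap: int
) -> float:
    """P(X >= overlap) when query_size genes are drawn without replacement
    from a universe in which term_size genes carry the annotation."""
    upper = min(term_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Universe and query sizes come from the text; term size and overlap are
# hypothetical values chosen only to illustrate the call.
p_term = hypergeom_enrichment_p(
    universe_size=1472, term_size=120, query_size=186, overlap=40
)
```

In a full analysis this p value would be computed for every term and then corrected for multiple testing before applying the adjusted p value and q value cut-offs.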

Screening of hub RBPs based on PPI network
Significant RBPs were submitted to the STRING database (http://string-db.org) [43] to identify the interactions between nodes. After disregarding disconnected nodes and screening using an interaction score > 0.4, selected RBPs were further visualized by constructing the protein-protein interaction (PPI) network using Cytoscape software (Version 3.7.1) [44]. RBPs with degrees ≥ 10 were identified as hub RBPs using the cytoHubba plug-in [45].
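The degree-based hub selection can be sketched as follows. This is a simplified stand-in for the STRING, Cytoscape, and cytoHubba workflow, assuming the interactions are available as (protein, protein, score) triples with scores on a 0 to 1 scale; the node names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, float]


def hub_nodes(
    edges: Iterable[Edge], min_score: float = 0.4, min_degree: int = 10
) -> List[Tuple[str, int]]:
    """Keep edges above the score cut-off, then return nodes whose degree
    meets min_degree, sorted by decreasing degree and then by name."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for a, b, score in edges:
        if a != b and score > min_score:
            neighbors[a].add(b)
            neighbors[b].add(a)
    # Nodes with no surviving edge never enter the dictionary, which mirrors
    # the removal of disconnected nodes.
    degree = {node: len(nbrs) for node, nbrs in neighbors.items()}
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(node, deg) for node, deg in ranked if deg >= min_degree]


# Hypothetical toy network: one protein with 11 partners, plus one edge that
# falls below the score cut-off and one that does not.
toy_edges: List[Edge] = [("HUB", f"P{i}", 0.7) for i in range(11)]
toy_edges += [("P0", "P1", 0.2), ("P0", "P2", 0.9)]
hubs = hub_nodes(toy_edges)
```

Using a set of neighbors per node means that duplicate edges between the same pair of proteins are counted only once.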

Prognostic analysis
To identify the prognostic value of hub RBPs, log-rank tests and univariate Cox regressions were executed for survival analysis with clinical data from TCGA [46]. Hub RBPs with p values < 0.05 (in both methods) were regarded as prognosis-related candidate RBPs and included in the multivariate Cox model using stepwise regression. Finally, prognosis-related RBPs, with p values < 0.05, were identified. Thereafter, Kaplan-Meier survival curves were plotted for groups of patients stratified by the median value or optimal cut-off value of the expression level of each prognosis-related RBP. The primary endpoint was disease-free survival (DFS) and the secondary endpoint was overall survival (OS). P values < 0.05 were considered statistically significant.
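The median-split comparison can be illustrated with a minimal two-group log-rank test. This sketch covers only the log-rank step, not the Cox regressions or the optimal cut-off search, and the expression values, follow-up times, and event indicators are hypothetical.

```python
from math import erfc, sqrt
from statistics import median
from typing import List, Sequence, Tuple

Sample = Tuple[float, int]  # (follow-up time, 1 = event observed, 0 = censored)


def split_by_median(
    expression: Sequence[float], times: Sequence[float], events: Sequence[int]
) -> Tuple[List[Sample], List[Sample]]:
    """Split patients into high and low groups at the median expression."""
    cutoff = median(expression)
    high: List[Sample] = []
    low: List[Sample] = []
    for x, t, e in zip(expression, times, events):
        (high if x > cutoff else low).append((t, e))
    return high, low


def logrank_test(group_a: List[Sample], group_b: List[Sample]) -> Tuple[float, float]:
    """Return (chi-square statistic, p value) of the two-group log-rank test."""
    data = [(t, e, 0) for t, e in group_a] + [(t, e, 1) for t, e in group_b]
    event_times = sorted({t for t, e, _ in data if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)                 # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)    # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e)           # events at t
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi_square = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return chi_square, erfc(sqrt(chi_square / 2.0))


# Hypothetical toy cohort: high expressers relapse earlier.
expression = [9.1, 8.7, 8.9, 6.2, 5.8, 6.0]
times = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
events = [1, 1, 1, 1, 1, 0]
high, low = split_by_median(expression, times, events)
chi_square, p_value = logrank_test(high, low)
```

The statistic compares, at every event time, the observed number of events in one group with the number expected if both groups shared the same hazard.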

Clinical significance in TCGA
To validate the clinical correlation of prognostic RBPs, their expression profiles were compared between tumor and adjacent normal tissues. Additionally, the relationships between clinical information, including Gleason score and TNM stage, and the expression levels of prognostic RBPs were analyzed. The Wilcoxon test and Kruskal-Wallis test were used to make comparisons between two groups or multiple groups, respectively. P values < 0.05 were considered statistically significant.
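For the two-group case, the rank-based comparison can be sketched as a Wilcoxon rank-sum (Mann-Whitney) test. The version below uses the tie-corrected normal approximation without continuity correction, which is an assumption made for brevity; statistical software may instead use an exact test for small samples. The input values are hypothetical.

```python
from collections import Counter
from math import erfc, sqrt
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic of group a, two-sided p value)."""
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must contain at least one value")
    n = n1 + n2
    pooled = list(a) + list(b)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    tie_term = sum(c ** 3 - c for c in Counter(pooled).values())
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0.0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2.0) / sqrt(variance)
    return u1, erfc(abs(z) / sqrt(2.0))


# Hypothetical expression values for normal and tumor samples.
normal_expr = [1.0, 2.0, 3.0]
tumor_expr = [4.0, 5.0, 6.0, 7.0]
u_stat, p_two_sided = rank_sum_test(normal_expr, tumor_expr)
```

Because the test uses only ranks, it makes no assumption about the distribution of expression values, which is the usual reason for preferring it over a t-test here.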

External validation across multiple databases
To further verify the expression profiles of prognostic RBPs, we first used the Oncomine database (http://www.oncomine.com) [47] to compare the transcriptional expression between PCa and normal prostate gland tissues using the Student's t-test. Thereafter, immunohistochemical data were obtained from the HPA database (https://www.proteinatlas.org/) [48] to compare the levels of protein encoded by genes. Immunohistochemistry results for tumor and normal tissues were obtained with the same antibody. Another dataset, GSE70769 [49], including the clinical data of 94 patients, obtained from the GEO database (https://www.ncbi.nlm.nih.gov/geo) was used to validate the predictive capability of prognostic RBPs with DFS as the endpoint. Finally, the CCLE database (http://www.broadinstitute.org/ccle/home) [50] was used to reveal the expression of prognosis-related RBPs in multiple solid tumors and PCa cell lines.

GSEA analysis
To explore the potential signaling pathways underlying the gene signature between different expression levels of prognostic RBPs, we conducted GSEA with the hallmark ("hallmark.all.symbols.gmt") gene set collection based on TCGA [51]. Nominal (NOM) p values < 0.05 and false discovery rate (FDR) q values < 0.25 were considered statistically significant.
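The core quantity in GSEA is the enrichment score, a weighted running sum over a ranked gene list. The sketch below computes that score for one gene set, assuming genes have already been ranked by some metric contrasting high and low expression groups; the permutation step that yields the NOM p value and FDR q value is omitted, and the gene names and scores are hypothetical.

```python
from typing import List, Set, Tuple


def enrichment_score(
    ranked: List[Tuple[str, float]], gene_set: Set[str], weight: float = 1.0
) -> float:
    """Running-sum enrichment score for a list sorted by decreasing score.

    Steps up by a weighted amount at each gene in the set and down by a
    constant amount otherwise; returns the extreme deviation from zero.
    """
    in_set = [gene in gene_set for gene, _ in ranked]
    n_hit = sum(in_set)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    hit_total = sum(
        abs(score) ** weight for (_, score), hit in zip(ranked, in_set) if hit
    )
    if hit_total == 0.0:
        raise ValueError("all genes in the set have a zero ranking score")
    running = 0.0
    best = 0.0
    for (_, score), hit in zip(ranked, in_set):
        if hit:
            running += abs(score) ** weight / hit_total
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list: positive scores mean higher in the high-expression group.
ranked_genes = [
    ("G1", 3.0), ("G2", 2.0), ("G3", 1.0),
    ("G4", -1.0), ("G5", -2.0), ("G6", -3.0),
]
es_top = enrichment_score(ranked_genes, {"G1", "G2"})
es_bottom = enrichment_score(ranked_genes, {"G5", "G6"})
```

A positive score indicates that the set is concentrated at the top of the ranking (here, associated with high expression), and a negative score indicates concentration at the bottom.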

Identification of SNRPA1 in molecular experiments
The PCa cell lines CWR22Rv1 and C4-2b were cultured in RPMI1640 medium supplemented with 10% fetal bovine serum. Short hairpin RNAs (shRNAs), targeting SNRPA1 or the shRNA negative control, were transfected into CWR22Rv1 and C4-2b cells. Protein samples were extracted from patients with PCa and shRNA-transfected cells. After electrophoresis and incubation with primary antibodies against SNRPA1 (ab128937; Abcam) and vinculin (#13901, Cell Signaling Technology) as well as secondary antibodies, the expression levels of target proteins were analyzed with an ECL kit and exposure system (Bio-Rad Laboratories).
To reveal the role of SNRPA1 in PCa cells, a colony formation assay was conducted with CWR22Rv1 and C4-2b cells in 6-well plates. After 2 weeks, cell colonies were fixed with 4% paraformaldehyde, stained with 0.2% crystal violet, imaged using microscopy, and counted using ImageJ software. A scratch assay was performed by scratching straight lines with a 200-μL pipette tip into the monolayer of CWR22Rv1 and C4-2b cells cultured in 6-well plates. After 48 hours, the cells were imaged using microscopy and the scratch widths were measured to determine the migration and invasion of cells. Finally, cell proliferation was measured with a cell counting kit-8 (C0037, Beyotime) according to the manufacturer's instructions, and assessed by measuring absorbance at 450 nm.
Data analysis was conducted using ImageJ software (1.50i) and Statistical Package for Social Science (SPSS 23.0). P values < 0.05 were regarded as statistically significant. Results were visualized as figures using GraphPad Prism 8.0. This study was approved by the Ethics Committee of Tongji Medical College, Huazhong University of Science and Technology.

CONFLICTS OF INTEREST
No potential conflicts of interest are to be disclosed by authors.