Pol3Base: a resource for decoding the interactome, expression, evolution, epitranscriptome and disease variations of Pol III-transcribed ncRNAs

Abstract RNA polymerase III (Pol III) transcribes hundreds of non-coding RNA genes (ncRNAs), which are involved in a variety of cellular processes. However, the expression, functions, regulatory networks and evolution of these Pol III-transcribed ncRNAs are still largely unknown. In this study, we developed a novel resource, Pol3Base (http://rna.sysu.edu.cn/pol3base/), to decode the interactome, expression, evolution, epitranscriptome and disease variations of Pol III-transcribed ncRNAs. The current release of Pol3Base includes thousands of regulatory relationships between ∼79 000 ncRNAs and transcription factors, obtained by mining 56 ChIP-seq datasets. By integrating CLIP-seq datasets, we deciphered the interactions of these ncRNAs with >240 RNA binding proteins. Moreover, Pol3Base contains ∼9700 RNA modifications located within thousands of Pol III-transcribed ncRNAs. Importantly, we characterized the expression profiles of ncRNAs in >70 tissues and 28 different tumor types. In addition, by comparing these ncRNAs between human and mouse, we revealed about 4000 evolutionarily conserved ncRNAs. We also identified ∼11 403 tRNA-derived small RNAs (tsRNAs) in 32 different tumor types. Finally, by analyzing somatic mutation data, we investigated the mutation map of these ncRNAs to help uncover their potential roles in diverse diseases. This resource will help expand our understanding of the potential functions and regulatory networks of Pol III-transcribed ncRNAs.

Recent studies have characterized multiple Pol III-associated transcription factors (TFs) that regulate the transcription of Pol III-transcribed ncRNAs (6–10). Importantly, the development of chromatin immunoprecipitation followed by sequencing (ChIP-seq) has made it possible to delineate the genome-wide profiles of Pol III-associated TFs (11–14). In addition, UV cross-linking and immunoprecipitation coupled to high-throughput sequencing (CLIP-seq) is a technology developed to profile RNA-RBP (RNA-binding protein) interactions genome-wide (15) and is useful for investigating the functions and mechanisms of Pol III-transcribed ncRNAs. Recently, transcriptome-wide mapping of RNA modifications has made it possible to identify RNA modifications on Pol III-transcribed ncRNAs (16,17). Moreover, the tremendous amount of small RNA sequencing (sRNA-seq) data generated by consortium projects such as ENCODE (18) and TCGA (19) provides new opportunities to understand the expression and function of Pol III-transcribed ncRNAs. Therefore, it is necessary to integrate these sequencing data to explore the dynamic expression, functions, regulatory networks and clinical implications of Pol III-transcribed ncRNAs in physiological and pathological processes.
D280 Nucleic Acids Research, 2022, Vol. 50, Database issue
In this study, we developed Pol3Base (http://rna.sysu.edu.cn/pol3base/ or http://biomed.nscc-gz.cn/DB/Pol3Base/) for decoding the interactome, expression, evolution, epitranscriptome and disease variations of Pol III-transcribed ncRNAs from multi-omics sequencing data (Figure 1). In Pol3Base, we performed a large-scale integration of Pol III-transcribed RNAs in human and mouse, and deciphered their regulatory relationships with dozens of TFs in various cells. Pol3Base also illustrates the global map of RNA modifications in Pol III-derived ncRNAs. By combining CLIP-seq and sRNA-seq data, we investigated the association between Pol III-derived ncRNAs and >200 RBPs and detected tRNA-derived small RNAs (tsRNAs) in 32 different tumor types. Importantly, we elucidated the expression profiles of these ncRNAs in 75 normal tissues and in 28 cancer types and their adjacent tissues. We further characterized the mutation map of Pol III-transcribed ncRNAs in 29 different diseases and cancers by analyzing somatic mutation data. Notably, we probed the common features and evolutionary conservation of these ncRNAs between human and mouse, which may provide valuable guidance for studying their functions in different species. Pol3Base provides a variety of web modules and graphic visualizations for investigating the potential functions and mechanisms of Pol III-derived ncRNAs (Figure 2).

Association analysis of Pol III-derived ncRNAs and RBPs
RBP binding sites of human and mouse were curated from starBase (30), ENCODE (22) and POSTAR2 (31) (Supplementary Table S2). Peaks from the same dataset were merged with BEDTools (32) and then intersected with Pol III-derived ncRNAs in a strand-specific manner. A peak and an ncRNA were considered an interacting pair if the overlap length was >20% of the peak length.
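The overlap criterion above can be illustrated with a minimal Python sketch. It is not the BEDTools-based pipeline used for Pol3Base; the `Interval` type, the function names and the BED-style half-open coordinates are assumptions made for illustration only.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical genomic interval (BED-style: 0-based start, exclusive end)."""
    chrom: str
    start: int
    end: int
    strand: str


def overlap_length(a: Interval, b: Interval) -> int:
    """Number of overlapping bases; 0 if chromosome or strand differ."""
    if a.chrom != b.chrom or a.strand != b.strand:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def is_interacting(peak: Interval, ncrna: Interval, min_fraction: float = 0.2) -> bool:
    """True if the overlap covers more than min_fraction of the peak length."""
    peak_len = peak.end - peak.start
    if peak_len <= 0:
        return False
    return overlap_length(peak, ncrna) / peak_len > min_fraction
```

For example, a 100-nt peak that shares 50 nt with an ncRNA on the same strand passes the threshold, whereas the same peak on the opposite strand does not.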

Expression profiles of Pol III-transcribed ncRNAs
sRNA-seq datasets of 75 cell lines and tissues were retrieved from ENCODE, and RNA-seq datasets of 28 tumor types were curated from TCGA (19). featureCounts (v1.6.0) (36) was applied to the BAM files to quantify transcript counts. The raw counts were then converted to Fragments Per Kilobase of transcript per Million mapped reads (FPKM) values to represent transcript expression. Expression values across normal tissues or tumors were normalized by z-score or mean.
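As a rough illustration of the two quantities mentioned above, the following Python sketch computes FPKM from a raw count using the standard definition and then z-scores a list of expression values. The function names are hypothetical, and the use of the population standard deviation in the z-score is an assumption; the text does not state which variant was used.

```python
from statistics import mean, pstdev


def fpkm(count: float, length_bp: int, total_mapped: int) -> float:
    """Standard FPKM: fragments per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (length_bp * total_mapped)


def z_scores(values: list[float]) -> list[float]:
    """Z-score each value across samples (population standard deviation assumed)."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # All samples identical: no variation to standardize.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]
```

With these definitions, 100 fragments on a 1000-nt transcript in a library of one million mapped reads correspond to an FPKM of 100.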

tsRNAs produced from Pol III-transcribed tRNAs in tumors
tsRNAs were downloaded from the tRF2Cancer database (37) and classified into six types (tiRNA-5, tiRNA-3, tRF-5, tRF-3, tRF-i and tRF-1) according to the position of the cleavage site. Cleavage sites were selected if they showed a significant enrichment of small RNAs, with P-value < 0.01 (binomial distribution). All these tsRNAs were mapped to the Pol III-derived tRNAs.
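The text does not specify the null model behind the binomial test, so the following Python sketch only shows the general shape of such a test. The background probability `background_p`, the function names, and the one-sided upper-tail formulation are all assumptions and should not be read as the tRF2Cancer procedure.

```python
from math import comb


def binomial_sf(k: int, n: int, p: float) -> float:
    """Upper-tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_enriched_site(reads_at_site: int, total_reads: int,
                     background_p: float, alpha: float = 0.01) -> bool:
    """True if reads ending at a site exceed the assumed background rate at level alpha."""
    return binomial_sf(reads_at_site, total_reads, background_p) < alpha
```

This direct summation is only suitable for small read counts; for realistic sequencing depths a numerically stable routine such as `scipy.stats.binom.sf(k - 1, n, p)` would be a more practical choice.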

Evolutionary conservation of Pol III-transcribed ncRNAs in different species
All human and mouse Pol III-derived ncRNA sequences were extracted from the corresponding genome and then aligned to each other with blastn (v2.12.0) (38). 'maxLen' was defined as the longer of the ncRNA length and the reference length. Aligned pairs whose overlapped length was greater than 90% of maxLen were considered conserved candidates. After further filtering with E-value <1e-5 and accuracy (exact matches / Identities) > 85%, evolutionarily conserved ncRNAs were classified into three types: (i) 'query-full' means the whole ncRNA was

Database and web interface implementation
All datasets were processed and stored in a MySQL database management system. The database query and user interfaces were developed using PHP and JavaScript. The query result tables are based on jQueryUI and DataTables, a highly flexible tool for sorting and filtering search results. The diagrams in the web pages are implemented with Highcharts.

The annotation and identification of regulatory relationships between TF and Pol III-transcribed ncRNAs
To explore the regulatory relationships between TFs and Pol III-transcribed ncRNAs, the gene loci of the ncRNAs were intersected with the binding sites of all Pol III-associated TFs (Supplementary Table S1). In total, we identified 51 097 ncRNAs that are regulated by 11 TFs from 10 cells in human (Table 1), and 8234 ncRNAs regulated by 4 TFs from 4 cells in mouse (Table 2).
On the TF-RNA browser page, users can explore any items of interest, such as different Pol III transcription factors and different experiments, to retrieve TF-RNA relationships. For convenience, a main search box in the right panel offers a quick search by ncRNA symbol or other items. Moreover, for each TF-bound Pol III-transcribed ncRNA, we provide a link to a new page showing detailed information, including genomic locus, gene name, type and the fragments bound by the TF. We also offer a search tool on the 'Tool' page, where users can input a gene location, gene sequence or gene name to check whether it hits an annotated Pol III-associated ncRNA in the database. In addition, we provide the consensus sequences of the Pol III-associated TFs on the 'Motif' page.

Exploration of the interactome between Pol III-derived ncRNAs and RBPs
Accumulating evidence has demonstrated that RNA-RBP interactions play crucial roles in RNA metabolism, including transcription, processing, function and degradation (39,40). However, the relationships between Pol III-derived ncRNAs and RBPs have not been systematically investigated. Therefore, we intersected tens of millions of binding sites of 283 RBPs for human and mouse with the Pol III-derived ncRNAs and identified over 20 000 interacting pairs covering 102 339 RBP binding sites of 184 RBPs for human, and >10 000 sites of 54 RBPs for mouse. On the 'RNA-RBP' page, users can browse the Pol III-derived ncRNA-RBP interactions by ncRNA type. This page also provides a quick search module for querying an ncRNA of interest by its RNA symbol.

Decoding the tsRNAs derived from Pol III-transcribed tRNAs in diverse tumor types
tRNA-derived small RNAs (tsRNAs), which participate in various physiological and pathological processes and function as key players in the occurrence and development of tumors, are generated from mature or precursor Pol III-transcribed tRNAs (41). To help users better investigate the biological roles of Pol III-transcribed tRNAs, we curated about 5000 tsRNAs in 32 tumor types. Finally, we obtained 4336 Pol III tRNA-derived tsRNAs and annotated them with a series of information, including the corresponding tsRNA ID, tumor specificity, genomic coordinate, type and sequence.

Exploring the expression profiles of Pol III-transcribed ncRNAs in diverse normal tissues and tumor types
The quantitative expression of ncRNAs in specific tissues or cells is often used to study the function of ncRNAs in biological processes. To support this for Pol III-transcribed ncRNAs, we provide two webpages that quantify their expression in normal cells or tissues and in tumors. Two normalized expression values, the mean value and the z-score, are available for assessing the relative expression of these ncRNAs in different tissues or tumor types. First, the 'Expression' page allows users to explore the expression profiles of these ncRNAs in 75 cells and normal tissues, and the corresponding detail page displays the actual expression values of the ncRNAs and their variance. In addition, the 'Pan-Cancer' page includes two sub-pages: the 'ncRNA case' sub-page is used to investigate the expression profiles of these ncRNAs in 28 tumor types and their adjacent tissues, and the 'cancer case' sub-page shows the expression profile of a given ncRNA in all samples of a specific tumor type.

Mutation map of Pol III-transcribed ncRNAs in various diseases
We further elucidated the map of mutation residues on these ncRNAs in different cancers and diseases. We collected 6 mutation types with about 10 million residues and mapped them to Pol III-transcribed ncRNAs. Finally, we obtained 10 529 mutation residues on 10 610 ncRNAs, including 14 552 Subs, 329 Ins, 605 Del and 3 DelIns.

Evolutionary conservation of Pol III-transcribed ncRNAs in mammals
Previous studies suggested that the common features of Pol III-transcribed ncRNAs affect their structure and downstream functions, among which the role in the assembly and function of RNPs is evolutionarily conserved (42,43). In addition, we deciphered the interspecies evolutionary conservation of these ncRNAs by aligning the ncRNAs of each species to the other's transcriptome. After filtering with E-value < 1e-5 and accuracy >85%, we finally identified 2528 and 1402 evolutionarily conserved ncRNAs for human and mouse, respectively, which were classified into three types: 'query-full', 'ref-full' and 'part'.
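The filtering thresholds described for the conservation analysis (aligned length >90% of the longer sequence, E-value <1e-5, accuracy >85%) can be sketched in Python as follows. The `Hit` record and its field names are hypothetical, and computing accuracy as exact matches divided by aligned length is an assumption about how the ratio is defined. The assignment to the 'query-full', 'ref-full' and 'part' types is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical summary of one blastn alignment between two ncRNAs."""
    query_len: int    # length of the query ncRNA
    ref_len: int      # length of the reference ncRNA
    aligned_len: int  # length of the aligned region
    matches: int      # number of exactly matching positions
    evalue: float     # blastn E-value


def is_conserved(hit: Hit, min_cov: float = 0.90,
                 max_evalue: float = 1e-5, min_acc: float = 0.85) -> bool:
    """Apply the coverage, E-value and accuracy thresholds to one alignment."""
    max_len = max(hit.query_len, hit.ref_len)
    if max_len == 0 or hit.aligned_len == 0:
        return False
    coverage = hit.aligned_len / max_len
    accuracy = hit.matches / hit.aligned_len  # assumed definition of accuracy
    return coverage > min_cov and hit.evalue < max_evalue and accuracy > min_acc
```

Using the longer of the two sequences as the denominator means that a short ncRNA fully contained in a much longer one is rejected, since the alignment covers only a small fraction of the longer sequence.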

DISCUSSION AND CONCLUSIONS
By integrating a large set of ChIP-seq, CLIP-seq, sRNA-seq and epitranscriptome data, functional annotations and public resources, Pol3Base provides the most comprehensive transcription, expression, epitranscriptome, interaction and mutation profiles of Pol III ncRNAs in human and mouse. Currently, only tRNA databases (e.g. GtRNAdb (44), tRNAdb (45), T-psi-C (46), tRNADB-CE (47)) are available. Pol3Base is the first database for decoding the interactome, expression, evolution, epitranscriptome and disease variations of Pol III-transcribed ncRNAs. The advances of Pol3Base are as follows. (a) Pol3Base is the first database to provide comprehensive transcriptional regulatory networks of Pol III-transcribed ncRNAs by analyzing transcription factor binding maps identified from high-throughput ChIP-seq datasets. (b) We integrated and analyzed 240 CLIP-seq datasets to explore the binding patterns of RBPs on Pol III-transcribed ncRNAs, which may help biologists investigate the functions of these ncRNAs. (c) We constructed the 'modification' module to provide a quick overview of RNA modifications on Pol III-transcribed ncRNAs by analyzing epitranscriptome sequencing data. These results will help reveal the post-transcriptional regulation of Pol III-transcribed ncRNAs by RNA modifications. (d) We investigated the expression of Pol III-transcribed ncRNAs across cancer tissues and cell lines. We also performed a pan-cancer analysis of Pol III-transcribed ncRNAs based on expression data from TCGA small RNA-seq experiments of 28 different tumor types and normal tissues. This will yield numerous differentially expressed or cell/tissue-specific ncRNAs for functional studies by bench biologists. (e) Pol3Base is also the first to systematically characterize somatic mutations on Pol III-transcribed ncRNAs. This will help researchers discover disease-related ncRNAs for further functional validation. (f) We built the 'Evolution' module to illustrate the evolutionary conservation of Pol III-transcribed ncRNAs in different mammals and to facilitate functional exploration of these ncRNAs. (g) Pol3Base provides a variety of interfaces and graphic visualizations to facilitate the analysis and exploration of the functions and mechanisms of Pol III-transcribed ncRNAs. In all, Pol3Base offers rich content and convenient access, and will become increasingly important for the study of the Pol III transcriptome.

FUTURE DIRECTIONS
We will continue to improve the server performance of Pol3Base and build an automatic pipeline for storing and analyzing new high-throughput data generated by CLIP-seq, ChIP-seq, sRNA-seq and Ribo-seq technologies, in order to decipher the biological functions and mechanisms, regulatory networks and translational potential of Pol III-transcribed ncRNAs. We will also develop new tools to integrate more annotation data, high-throughput sequencing data and additional species to further expand this resource. We will maintain and update the resource every 3 months or whenever new datasets are released in public databases.

DATA AVAILABILITY
Pol3Base is freely available at http://rna.sysu.edu.cn/pol3base/ or http://biomed.nscc-gz.cn/DB/Pol3Base/. The Pol3Base data files can be downloaded and used in accordance with the GNU Public License and the license of primary data sources.