Ouroboros resembling competitive endogenous loop (ORCEL) in circular RNAs revealed through transcriptome sequencing dataset analysis

Emerging evidence indicates that Circular RNAs (circRNAs) exert post-transcriptional regulation of gene expression. A subclass of circRNA was found enriched with miRNA target sites. This evidence suggests that this kind of circRNA functions as natural miRNA sponge. Noticing the potential impacts of circular RNA research, we were motivated to identify novel circRNAs as well as putative circRNA-miRNA interactions through retroactive sourced transcriptome sequencing samples. Through the analysis in 465 RNA-seq runs and 22 reports published in recent years, putatively circRNA sponged miRNA that had been experimentally verified targeting circRNA host gene were found. From this observation, supporting evidence of the competitive endogenous relationship of circRNAs and miRNAs targeting circRNA host genes can be observed. Given the self-regulation and self-induction nature of these circRNAs, this kind of hypothetical phenomenon was hereby called Ouroboros Resembling Competitive Endogenous Loop (ORCEL) in circular RNAs. The fact that miRNA sponge circRNA originated from region miRNA target sites enriched regions, while genes encoded from these regions are conserved to be miRNA targets rationalize the existence of ORCEL.


Background
More than 50 years have passed since H. Harris deduced that most nuclear RNAs are likely to be non-proteincoding in 1959 [1]. Existence of functional noncoding RNAs has become convention knowledge. Thanks to the dramatically expanded scope of transcriptomics research with high throughput sequencing technology developed in recent years [2], it is possible now to more accurately investigate the expression of non-coding RNA. Differential expressed long non coding RNAs have been found and reported in almost a weekly basis [3]. Among this trend of digging into retroactive sourced experimental data, circular RNAs emerge on stump eventually. Circular RNAs (circRNAs) represent a type of regulatory noncoding RNA whose head 3′ and tail 5′ ends covalently bond together to result in a circular form. The circular form was verified with electron microscope in 1979 [4]. In 2012, Salzman et al. [5,6] developed an algorithm to detect scrambled exons in RNA-Seq datasets, and reported that circular RNA isoforms are actually predominant in many human gene isoforms. Later, an improved version of the algorithm with exon splicing site AU/AC searching was applied to find the fact that circRNAs serving as natural microRNA "sponges", which enriched with miRNA targeting site and serving competitive endogenous RNAs (ceRNAs) [7,8]. A circRNA named CDR1as [9] expressed in human and mouse brain was shown to negatively regulate miR-7 in a posttranscriptional manner [7,9,10]; this mechanism appears to be evolutionarily conserved [7]. With the regulation potential of miRNAs, circRNAs became widely interested in the research field [11]. Circular RNA can be enriched within the sample through treating samples with RNase R before conducting RNA-Seq [12,13].
In the following years, extended identification of cir-cRNAs in mouse [14], fly [15] and other animals [16] suggests that circRNA ubiquity is evolutionallyconserved. The reported circRNAs are not results of singular case. These experiment results tend to be reproducible. Further evidence indicates that human circRNA expression exhibits tissue specificity, and now tens of thousands of circRNAs have been found and reported across human tissues [14,[17][18][19][20][21][22].
Aware of the significance of circRNAs research, in 2015, we constructed a database called CircNet [23] to not only collect the published data, but also extend the scope of reported circRNAs and provide resources to aid in field.
Circular RNA back-spliced junction sites were identified within these samples. To acquire the expression patterns of circRNAs within these samples, a pipeline was developed to annotate the sequence of the circRNAs. With the putative sequences of circRNAs, miRNA target prediction was conducted on these sequences. Transcripts abundance within the samples was conducted through the transcript deconvolution algorithm [41].
Through analyzing the collect circRNA, combined with the annotation and miRNA sponge prediction, we found that in 728 human circRNA host genes, the generated circRNAs were found to have high affinity to sponge miRNAs which were reported to target these genes. The calculated P-value of the circRNA and miRNA pairs are lower than 0.005. From this observation, existence of competitive endogenous loop of circRNAs and their host gene can be observed. Given the self-regulation and self-induction nature of these circRNAs, it was hereby named Ouroboros Resembling Competitive Endogenous Loop (ORCEL) in circular RNAs.

Results
Through the expression profiling of circRNA, combined with the previously reported back spliced junction sites, we found 2747 circRNAs originated from 2693 back spliced junction sites that meet the defined threshold. From the analysis, we found that in 728 human circRNA host genes, including 909 circRNA transcripts, the generated circRNAs were found to have high affinity to sponge miRNAs. Meanwhile these miRNAs were reported to target the circRNA originated genes.

Annotation of circRNA sequence
From our analysis, we found 2693 back splice junction sites that meet the defined threshold on human genome. The combined amount of peer review reports and RNA-Seq samples in which these junction sites were found or reported ranges from 10 to 126. The miRNA sponge CDR1as [9] was found in over 100 of our collected samples. A detail list of these circRNAs, as well as their corresponding SRPBM and FPKM values within the samples can be found in Additional file 1. The data was also updated into our public database CircNet [23] in late 2016.

Self-regulation and self-induction nature of ORCEL
With the observation of most circRNAs originated from circularization of coding gene exons, it was deducted that circRNA biogenesis competes with pre-mRNA splicing [14]. As an isoform originated from part of gene locus, expression of circRNAs correlates with their originated genes. While on the other hand, the posttranscriptional roles of miRNAs to most of the genes has become a conventional knowledge [42]. Result of our analysis suggest the existence of competitive endogenous loop of circRNAs and their host genes. In the sequences of 909 circRNA transcripts from 728 human genes, we found enriched miRNA targeting sites. Echo to the recent evidences suggesting that one circRNA can sponge multiple different miRNAs [34], we found 1112 miRNAs that can be sponged by these circRNAs. Meanwhile these miRNAs were also found to target the circRNA originated genes. Hence a looping regulative relationship among circRNA, circRNA originated gene and miRNA targeting the gene was found. Given the self-regulation and self-induction nature of these cir-cRNAs, the phenomenon was hereby named Ouroboros Resembling Competitive Endogenous Loop (ORCEL) in circular RNAs. The term was inspired by Friedrich August Kekulé and his famous discovery of benzene ring. A complete list of ORCEL is available in Additional file 2.

Enrichment analysis of the ORCEL genes
An enrichment analysis of the 728 ORCEL genes was conducted through DAVID [43]. In the result of this analysis we found that many ORCEL genes participate in important KEGG [44] pathways such as Ubiquitin mediated proteolysis, Pathways in cancer, Focal adhesion and Progesterone-mediated oocyte maturation, as summarized in Table 1. In the 32 genes participated in the hsa05200: Pathways in cancer, 115 miRNAs and 45 circRNA transcripts were found participated in the ORCEL, as illustrated in the network of Fig. 1. The network was generated through Cytoscape [45].
From the network illustrated in Fig. 1, it can be noticed that among the 32 genes enriched in the pathways in cancer, only the ORCEL of HIA1 doesn't involve miR-NAs targeting multiple other genes involved in the pathway. A complete table of the enrichment can be found in Additional file 3. In the network illustrated in Fig. 2, for the 19 genes participated in the hsa04120: Ubiquitin mediated proteolysis, 52 miRNAs and 22 circRNA transcripts were found participated in the ORCEL. It is also worth mentioned that 61 ORCEL genes were enriched in the GO term GO:0046907~intracellular transport and 121 ORCEL genes were enriched in GO:0031974~membrane-enclosed lumen. The ORCEL phenomenon can potential be correlated to intracellular transport.

Discussion
In this study, putative roles of circRNAs serving as endogenous miRNA sponges were investigated through analysis of transcriptome sequencing data sets and published results. With the head-tail junction structure, circular RNAs are more stable than other kind of long noncoding RNAs. Hence circRNAs are hypothetically easier to acuminate in cells. The longer half-lives of circRNAs allow the presence of miRNA target sites increase within the cells. Through statistical analysis of the abundance of the miRNA target seeds on circRNA sequences, putative circRNA serving as miRNA sponges were identified. Recent years experiment results suggest certain threshold of miRNA target sites needs to be reached for the ceRNAs to have physiological effects [46,47], henceforth the analysis was focused on the abundance of miRNA target sites. The application of FPKM of the putative circRNA sequences, and SRPBM of the back spliced junction sites as thresholds should increase the prudence of the circRNA sequence prediction in this study. With the further compliance with results of recent year studies, results of our analysis suggest that only a certain subset of expressed circRNAs potentially serve as nature miRNA sponges. Among the miRNAs predicted to be sponged by these identified circRNAs, many had been experimentally verified to target the circRNA source genes. From these observations we hypothesize that genes targeted by miRNAs tend to be conserved with enriched miRNA targeting site in the coding region. Cir-cRNAs coded from these regions can henceforth sponge the miRNAs when overexpressed. This phenomenon was hereby named Ouroboros Resembling Competitive Endogenous Loop (ORCEL) in circular RNAs. The term was inspired by Friedrich August Kekulé and his famous discovery of benzene ring [48]. Given the observation of this phenomena in genes involved in cancer pathways, validation of this hypothetical phenomena shall significantly impact the research prospects in medical science. ORCEL can potentially serving as a kind of control mechanism to resist miRNAs overdose. The fact that miRNA sponge circRNA originated from region miRNA target sites enriched regions, while genes encoded from these regions are conserved to be miRNA targets rationalize the existence of ORCEL.

Conclusions
Through the bioinformatics analysis it was found that for certain subset of circRNAs, putatively sponged miRNA had been experimentally verified targeting circRNA host gene. From this observation, the existence of competitive endogenous loop of circRNAs and their host gene can be observed. Given the self-regulation and self-induction nature of these circRNAs, this kind of phenomenon was hereby called Ouroboros Resembling Competitive Endogenous Loop (ORCEL) in circular RNAs.

Methods
The data analysis process of this research is summarized in Fig. 3. First, to identify circRNA, transcriptome sequencing data sets were obtained from the NCBI Sequence Read Archive (SRA). The back-spliced junction sites in each RNA-seq sample were identified using a circRNA discovery pipeline adapting the scripts provided on circBase [7,49], which was referred as find_circ [50]. Detected back-spliced junction sites, along with the collected junction sites from previous reports, were further compared with the hg19 human genome annotation from RefSeq to annotate circRNA isoform sequence. The annotated sequence were then applied in the prediction of circRNA-miRNA interactions. Occurrence of miRNA target seeds in circRNA isoforms were examined and normalized by isoform length. The significance of interactions was evaluated by referring to the background distribution of miRNA seeds in all transcripts and only circRNA-miRNA The ORCEL genes enriched in KEGG pathways with significant P values interactions with P-values < 0.005 were collected. Expression profiling of the circRNAs within each of the samples collected from SRA was conducted in two different approaches: normalized counts of reads spanning the back spliced junction sites SRPBM [13] and normalized counts of reads aligned on the annotated sequences of circRNAs in units of FPKM. Only the circRNAs with estimated expression level over the threshold and found in multiple researches or samples were analyzed in this study.
In addition, 465 transcriptome sequencing data sets were collected from NCBI Sequence Read Archive [23,40]. The back-spliced junction sites in each RNA-seq sample were identified using a circRNA discovery pipeline referred as find_circ [7,49,50]. We apply the criteria defined in the pipeline hence the detected junction sites met same standards as those in the previous reports, as described in the Memczak et al. 2013 study.
Annotation of circRNA full sequence The method was described in our previous study [23].
The method was further applied on the updated data from reports between year 2014 and 2016. To acquire the full length nucleotide sequence from RNA-  seq reads, back-spliced junction sites were compared with the hg19 human genome annotation as obtained from UCSC genome browser and RefSeq [51,52]. Given the results of recent research, multiple cir-cRNA isoforms might originate from the same backspliced junction site [6,53]. Hence we annotated multiple circRNA isoforms for one back-spliced junction site. The annotation was conducted following the guideline: (1) For the back spliced junction sites locate on exact "head" and "tail" locus of exons from same transcript from RefSeq [51,52], all the flanking exons of the transcript were considered as part of the same circRNA.
SRPBM ¼ Reads count Â 10 9 Read lenth Â Mapped reads ð2Þ (2) For those back spliced junction sites flanked multiple isoforms from RefSeq [51,52], existence of multiple isoform of circRNA was assumed. (3) For those circRNAs associated with these backspliced junction sites having small misalignments to exon locations, flanked exons and a small portion of intron sequence locating in the head or tail locations were considered as parts of the isoforms. (4) For the junction sites that were found to be located in intergenic positions while others, despite overlapping with certain genes, localized to their antisense strands, the entire flanked sequence was considered as the sequence of the circRNA.
The resulted sequence annotation was took for expression analysis and miRNA target search. The annotation along with the gene transcripts was recorded in gtf format for the expression profiling.

Identification of potential miRNA sponges
Developed from results of our previous study [23,54], to find the potential miRNA and circRNA interactions, we conducted a statistical analysis on the amount of miRNA binding sites on the annotated circRNAs sequences.
The miRNA target sequences deemed typical: 6mer, 7mer-A1, 7mer-m8 and 8mer sequences [42] were extracted from miRBase [55]. Perfect complementarity sites were found on the annotated circRNA as well as gene transcripts sequences through iterative searching. To normalize the number of occurrences of these sites by the length of the transcripts, the following formula was used: Where the 'N' is the length of the seed. N = 6 for 6mer, N = 7 for 7mers and 8 for 8mer. With this formula, four frequency numbers can be acquired from each pair of circRNA and miRNA. To distinguish circRNA from Fig. 3 Overview of the data analysis process in this research. The general view of the process of ORCEL discovery is illustrated in this figure linear isoforms, frequency values were also calculated for gene transcripts mRNA. We calculated all the frequency value of the circRNAs as well as the linear isoforms pairing to miRNAs, and then converted the Z-score of the normal distribution into one tail P-value through survival function.
The circRNA-miRNA pair with P-value < 0.005 was considered high regulatory potential between the circRNA and miRNA. The miRNAs and experimentally verified gene targets were collected from miRTarBase [54].

Expression profiling of circRNAs
As previously described, the abundance of the circRNA within the collected samples were estimated through the transcript deconvolution algorithm of the Cufflinks pipeline [41]. To further increase the prudence of circRNA detection within the transcriptome, the normalized counts of sequence reads spanning the back spliced junctions were considered.

The normalized count of reads on back spliced junctions
To normalize the amount of the normalized sequence reads spanning the junction sites, a concept of spliced reads per billion mapping (SRPBM) was applied [13]. Amount of reads mapped onto hg19 human genome was acquired through the tool STAR [56]. The equation applied to calculate SRPBM is as illustrated in Eq. 2. The junction sites with the value of SRPBM larger than 1.0 were selected.

Transcript deconvolution of circRNAs
RNA-seq aligner STAR [56] was applied to realign the sequence reads from the 465 RNA-seq samples on human genome. With the forth-mentioned gtf file containing annotated exon locus of circRNAs and mRNAs, and the bam files generated from STAR, we estimated the abundance within the sample of the annotated sequence through Cufflinks [41]. The resulted transcripts with FPKM over 1.0 were selected.

Co-occurrence analysis of circRNA
From the result of recent year comparison study [50], we deducted that inconsistency between different cir-cRNA detection tools and false discovery of highly abundant circRNA could occur in the result of our analysis. Hence in addition to the two values of estimated abundance of circRNA, we further applied the following conditions: The amount of previous peer review reports in which the back spliced junction sites were reported.
The amount of samples among the 465 collected samples in which the back spliced junction sites were found meeting the criteria defined in find_circ [7,49,50].
Only the circRNAs with the combined amount of these two values over 10 were considered in the analysis of this report.