1 Introduction

Circular RNAs (circRNAs) were first reported in the 1970s, when they were observed by electron microscopy as circular structures in the eukaryotic cytoplasm. Interest in this noncoding RNA was short-lived because circRNAs were then believed to be low in abundance. They were often assumed to be splicing noise or byproducts of splicing [1,2,3,4], and it was thought that microRNAs (miRNAs) may outcompete their activation [5]. However, circRNAs have reemerged as an active research topic in areas such as agriculture [6, 7] and medicine following the discovery of their roles in gene regulation [1, 8,9,10,11]. circRNAs have been identified through advances in RNA sequencing (RNA-seq) and bioinformatics analyses, and the resulting data have been pooled to establish databases for predicting circRNA downstream targets. These databases provide guidelines for the prediction and evaluation of circRNAs and help to clarify their biogenesis and roles.

2 Current Knowledge of circRNA Functions

2.1 Bona Fide Disease Biomarkers and Therapeutic Targets

circRNA expression is tissue specific, indicating that circRNAs harbor biological functions and can be used as biomarkers to distinguish cell origin and tumor types [4, 12]. circRNAs can adopt a protective, tumor-suppressive role but can also exhibit oncogenic properties. Currently, only a few circRNAs and their interacting molecules and/or pathways have been identified, namely, cZNF292 in glioma [13], circ-ITCH in lung cancer [14], and CDR1as (or ciRS-7) in myocardial infarction (MI) [15]. Circ-ITCH downregulation has been observed in esophageal squamous carcinoma (ESC). Circ-ITCH overexpression suppresses tumor growth by sponging miR-7, miR-17, and miR-124, which raises ITCH levels; ITCH in turn promotes the ubiquitination and degradation of phosphorylated Dvl2, ultimately repressing the Wnt/β-catenin pathway [16]. In lung cancer, circ-ITCH expression in tumor tissues was significantly decreased compared to the adjacent non-tumor tissues (n = 78) [14]. Ectopic expression of circ-ITCH results in increased expression of ITCH, its parental tumor-suppressive gene, which subsequently inhibits lung cancer cell proliferation. circ-ITCH sponges the oncogenic miR-7 and miR-214, thereby increasing ITCH expression and suppressing activation of Wnt/β-catenin signaling [14]. cZNF292 is crucial for tube formation in glioma, and cZNF292 silencing reduced cell proliferation and cell cycle progression in the human glioma U87MG and U251 cell lines [13]. Here, cell cycle progression was arrested at the S/G2/M phase through the Wnt/β-catenin signaling pathway and several related genes, including PRR11 (proline rich 11), CCNA (cyclin A), p-CDK2 (phosphorylated cyclin-dependent kinase 2), VEGFR1/2 (vascular endothelial growth factor receptor 1/2), p-VEGFR1/2, and EGFR (epidermal growth factor receptor) [13].

In addition to their roles as biomarkers, circRNAs are involved in many cellular processes, including cellular energetics, cell death and senescence, genome stability, angiogenesis, invasion and metastasis, inflammation, and metabolism [17]. In a study of lead-induced neurotoxicity, circ-Rar1 acted as a sponge of miR-671 and promoted apoptosis by inducing caspase-8 and p38 expression [18]. circRNAs also regulate the epithelial-to-mesenchymal transition (EMT). For example, circHIPK3 played a crucial role in miR-558 sponging, leading to inhibition of migration, invasion, and angiogenesis [19]. circHIPK3 is highly expressed in liver cancer, and its depletion reduces the proliferation of liver cancer cells [20]. A similar effect has been observed in other cancer cells; for example, Huh7, HCT116, and HeLa cell proliferation was significantly reduced upon knockdown of circHIPK3 [20]. Likewise, circPVT1 upregulation in gastric cancer (GC) is postulated to affect GC tissue proliferation by acting as a sponge of miR-125 [21]. circRNA profiling has thus provided useful insights for classifying circRNA biomarkers in cancer.

Expanding on the potential role of circRNAs as disease biomarkers, several studies have explored the function and implications of circRNAs in metabolic and non-cancer diseases. One such circRNA is ciRS-7, a well-characterized circRNA that acts as a miR-7a/b sponge or inhibitor in brain tissues and islet cells [22]. Hypoxia treatment led to increased expression of Cdr1as and miR-7a in mice with myocardial infarction (MI), causing an increase in the cardiac infarct size [15]. Cdr1as overexpression in mouse cardiomyocytes (MCM) promotes apoptosis, an effect that is reversed by miR-7a overexpression. Sp1 transcription factor (SP1) and poly (ADP-ribose) polymerase (PARP) were both targeted by miR-7a and induced apoptosis under hypoxia treatment [15]. Activation of myosin VIIA and Rab-interacting protein (Myrip) and paired box 6 (Pax6) was observed downstream of miR-7 [23], indicating that ciRS-7 is a potential diagnostic marker in metabolic disease, namely, diabetes. However, further studies are required to clarify the functional roles of ciRS-7 and to establish whether it truly has any implications in the progression of diabetes.

2.2 circRNAs in Agriculture

In humans, circRNA biogenesis is dependent on RNA polymerase II transcription and the back-splicing reaction of the precursor mRNA (pre-mRNA). In plants, circRNAs are localized in the nucleus or other organelles, such as the mitochondria and chloroplast, or outside the cells, as with viroids [6]. The functional assessment of circRNAs in plants is still in its infancy. However, circRNAs may be involved in plant developmental processes and in responses to abiotic and biotic stress, and, as in humans, they also serve as miRNA sponges. circRNA expression is tissue and organelle dependent, indicating that circRNAs play a role in plant organelle development [7, 24], and these expression patterns can be distinguished in RNA-seq datasets. There is evidence that circRNAs also contribute to the response to nutrient deficiency in Oryza sativa (rice) and barley [25, 26]. The described regulatory pathways indicate that a number of circRNAs may be involved in cellular metabolism, hormonal signaling, intracellular protein sorting, respiration, and carbohydrate metabolism [25]. An RNA-seq dataset of Arabidopsis thaliana showed that circRNAs are clustered in the chloroplast, suggesting that circRNAs may play a vital role in plant photosynthesis [27]. circRNAs are thus emerging as an important component of plant RNA biology. In-depth studies of how circRNAs act in plant development are crucial, as they may benefit the future agriculture sector.

3 Challenges to Current circRNA Investigation

3.1 Challenges in Regulating circRNA Biogenesis

It is accepted that circRNAs act as miRNA and RNA-binding protein (RBP) sponges, regulate gene transcription and expression, and can serve as templates for protein/peptide translation [28]. circRNAs are covalently closed loops and therefore lack the 5′ cap and 3′ poly(A) tail seen in canonical linear mRNA. They are commonly formed from pre-mRNA by back-splicing, in which the 3′ end of a downstream exon is joined to the 5′ end of an upstream exon, forming the covalently circularized RNA [1].

Back-splicing is promoted by inverted repeat sequences or Alu repeat elements in the introns flanking the circularized exons; base-pairing between these repeats brings the splice site at the 3′ end of the downstream exon into close proximity with the splice site at the 5′ end of the upstream exon [12]. circRNAs are mainly derived from exonic coding regions and can appear in three different forms: exonic circRNAs (e-circRNAs), intronic circRNAs (ciRNAs), and exon-intron circRNAs that retain introns (EIciRNAs, also written EIcircRNAs).

circRNAs have become an active research topic, and a large body of work now addresses their mechanisms, with solid evidence emerging from in vivo and in vitro studies. circRNAs are regulated by multiple mechanisms and serve various functions. To date, four functional roles have been reported for circRNAs. First, circRNAs act as miRNA sponges or competing endogenous RNAs. A single circRNA may sponge multiple miRNAs, thereby abrogating miRNA binding to the target mRNA(s) [29]. Second, circRNAs bind and sequester proteins. For example, circularized muscleblind (circMbl) is highly abundant in Drosophila and humans [4, 30, 31]. circMbl contains the start-site coding sequence and binding sites for MBL1, which signifies that MBL1 is recruited to and bound by circMbl; this indicates the capacity of circRNAs to sequester proteins and, through association with ribosomes, to initiate translation in Drosophila [31]. It has been suggested that circRNA protein binding can also inhibit translation by acting as a competing counterpart, as observed for cytoplasmic circ-FOXO3 (forkhead box O3), where interactions with ID1 (inhibitor of DNA binding 1, HLH protein) and E2F1 (E2F transcription factor 1) [32] deplete their activity in the cytoplasm; circ-FOXO3 has also been implicated in the promotion of cell cycle arrest in NIH3T3 mouse embryo fibroblast cells [33]. The formation of circRNA–protein complexes thus plays an inhibitory role in cellular and biological functions. However, there is a crucial need for in-depth exploration of the physical interactions of circRNAs with multiple target proteins to understand their actual mechanisms and pathways. Third, circRNAs regulate mRNA splicing or transcription. EIciRNAs are localized in the nucleus, implying that they participate in alternative splicing and gene transcription [12]. Linear transcripts that retain introns are typically exported from the nucleus and degraded in the cytoplasm [34]. By contrast, intron-containing circRNAs (EIciRNAs and ciRNAs) remain in the nucleus, indicating their involvement in mRNA transcriptional regulation [4]. Nevertheless, ciRNAs do not yet have well-defined roles or biological processes. Lastly, circRNAs can be translated. circRNA modification by N6-methyladenosine (m6A) promotes the initiation of protein translation [35]. Translating ribosomes have been associated with circRNAs, known as ribo-circRNAs, via ribosome footprinting. Ribo-circRNAs harbor a conserved termination codon, and translation initiation occurs by utilizing the start codon of the target mRNA bound to membrane-associated ribosomes [3]. Important evidence from in vitro and in vivo studies has demonstrated the translational activity of circRNAs. The protein encoded by Drosophila mbl has been detected via mass spectrometry, indicating that the translational mechanism occurs in normal settings [3]. circRNAs have also been implicated as drivers of protein synthesis. An internal ribosome entry site (IRES) was stably integrated in a circRNA minigene to induce expression in vitro, and western blotting and flow cytometry showed that it activated the expression of a number of putative proteins. This is further strong evidence that circRNAs may also function as mRNAs to drive protein synthesis [10].

Studies on the involvement of circRNAs in various diseases are still in the infancy stage, where expression has been identified but further functional and population validation are still lacking [15, 36,37,38,39]. Some key functions of circRNAs have been identified; however, the emerging circRNA–miRNA–mRNA interactions require further validation. In addition, the studies of circRNAs in cancer have not focused on the mechanisms of initiation, progression, and metastasis. In cancers, most of the studies involved identification of circRNAs, including lung, colorectal, breast, and bladder cancers as well as glioma [4]. Currently, several circRNAs have been identified to be upregulated or downregulated in cancers. The analysis of circRNAs poses several challenges including in terms of experimental design, techniques to be used, and most importantly, the bioinformatics tools to be used. The current knowledge regarding the regulation of circRNA biogenesis is limited, which renders the studies on circRNAs more difficult.

3.2 Challenges in Experimental Design and Technique

circRNAs can be identified using several approaches, including RNA-seq, microarray, and reverse transcription (RT)-PCR [17, 40, 41]. As circRNAs are novel products of regulated alternative splicing processes, detecting splicing through RNA-seq is the most favorable method. However, it has several limitations: circRNAs can be lost during RNA library preparation steps, including RNA purification, RNA or cDNA size selection, and RNA fragmentation. For purification, rRNA-depleted libraries are much better than poly(A)-selected or poly(A)-depleted libraries because they retain the circRNAs in a sample. Because circRNAs lack a poly(A) tail, a poly(A) enrichment step significantly reduces circRNAs, making them difficult to detect by RNA-seq. With regard to size, random priming is preferable because it is unbiased and allows small circRNAs to be detected as well. The RNA needs to be fragmented prior to adaptor ligation or priming in order to distinguish between small RNAs and circRNAs. In addition, RNA-seq protocols can introduce technical artifacts during ligation and RT. The ligation step can produce chimeric complementary DNAs (cDNAs) and might thus generate low levels of artifactual circRNAs [42]. Reverse transcriptase, on the other hand, can produce extensive template-switching artifacts, in which two different RNA molecules are joined together, affecting the discovery of novel RNA isoforms [43, 44]. RT can also lead to strand displacement, which biases the quantitative measurement of circRNAs [45]. RT is highly dependent on good RNA quality, and most importantly, the right library preparation method should be selected [46]. Other techniques that can be used to detect circRNAs include microarray and RT-quantitative (Q) PCR [17]. Most genome-wide studies use microarray for detecting circRNAs, but these studies have covered only a limited number of cancer types and their adjacent normal tissue samples. Moreover, microarray cannot identify novel circRNAs. RT-QPCR is commonly used to validate, in a larger sample size, the circRNAs identified in the discovery phase. Currently, most circRNA studies have been performed retrospectively; thus, conducting large prospective clinical trials is important for confirming circRNAs as disease biomarkers. The details regarding the challenges in bioinformatics are discussed in the next section.

3.3 Challenges in Bioinformatics Analysis

circRNA identification in single-end (SE) RNA-seq data is performed by aligning the reads to the back-splice junctions. Many algorithms are available to identify circRNAs from sequence data; however, their performance differs [47]. Five circRNA prediction algorithms, i.e., circRNA_finder, CIRCexplorer, find_circ, CIRI, and MapSplice, have different false-positive and false-negative circRNA detection rates [40]. The ability of the algorithms to identify a circRNA based on the splice site distance also varies. CIRI and circRNA_finder can identify circRNAs with very closely spaced splice sites (<100 bp), unlike the other algorithms; however, most of the circRNAs found were false positives [40]. On the other hand, CIRCexplorer and MapSplice produce the most reliable lists of circRNAs, with low false-positive rates, but they took 2–3 days to complete individual dataset predictions [40]. Hence, identifying circRNAs with high specificity and sensitivity using a single algorithm is challenging, and using more than one algorithm for circRNA identification is advisable [41]. More importantly, individual validation is crucial for novel circRNAs with regard to the type of algorithm used, as false-positive results are common [40]. The technical challenges to studying circRNAs are discussed in the next section.

4 Comparisons Between Current circRNA Investigation Methods

4.1 Comparison Between RNA Preparation Methods for Identification of circRNAs

As circRNAs are closed loops of RNA and lack open ends, it is quite difficult to identify and validate their presence. To date, there are three main methods of screening for the presence of circRNAs in biological samples: microarray, RNA-seq, and bioinformatics analyses of readily available data (Table 28.1) [48, 49]. Nevertheless, before proceeding with any analysis, it is important to enrich the circRNA population and remove linear RNAs that may skew the population [24, 50]. Total cellular RNA contains a wide array of RNA species and artifacts, and to avoid false positives, samples are usually treated with RNAse R [24], which digests linear RNAs of >7 nucleotides but spares circRNAs because they lack the 3′ end that the enzyme requires [24]. For microarray, the only available platform is Arraystar. This platform utilizes circRNA information from six published databases [24, 51,52,53,54,55]. Currently, this is the most straightforward approach for identifying circRNAs from human samples because it does not require extensive bioinformatics analysis. A more comprehensive approach to identifying circRNAs is sequencing the transcriptome population. Current RNA-seq library preparations vary greatly, particularly in terms of biochemical preparation [41]. RNA purification is among the first steps in RNA-seq and can greatly influence the circRNA population [56]. For example, poly (A) selection in RNA-seq library preparation can heavily deplete the circRNA population, as circRNAs do not contain poly (A) tails [41]. The size selection generally used in RNA-seq excludes RNAs of <200 nucleotides [24]. This would exclude smaller circRNAs (even if present) unless a small RNA-seq library kit is used [24]. Therefore, the most reliable, cost- and time-effective method for identifying circRNAs at the moment is microarray.

The qRT-PCR approach is commonly used to validate the presence of circRNAs [57]. Other methods such as droplet digital PCR (ddPCR), fluorescence in situ hybridization, and northern blotting are also used [57]. For validation, it is imperative to determine the sequence of the circRNA and identify the junction joining its 5′ and 3′ ends [58]. Divergent primers can then be designed to flank the junction sequence, and the amplicon can be confirmed by sequencing [58, 59]. This avoids the primers amplifying random linear splicing events [59]. ddPCR is a new technology that enables higher sensitivity in nucleic acid quantification [60]. It is more reliable and accurate than qRT-PCR for determining the quantity of circRNAs, as it eliminates the effect of rolling-circle RT products [60]. Northern blotting is another commonly used method for detecting circRNAs [61, 62]. As with qRT-PCR primers, the probes are designed to target the junction sequence, and they are detected on blots. Sometimes, northern blotting is preferred because the species targeted is known, thus reducing false positives [47, 62]. Nevertheless, it remains an unpopular choice because the procedure is time-consuming and laborious [62]. Fluorescence in situ hybridization can also be used to detect circRNAs [63, 64]. It utilizes fluorescence-coupled probes targeting the junctional sequence [63, 64]. All four methods have been proven to be able to detect circRNAs, but some are preferred over others. Table 28.1 compares the advantages and disadvantages of each method.
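
The junction-targeting idea behind divergent primers and junction probes can be illustrated with a minimal Python sketch. It assumes the circRNA is supplied as the linear 5′ to 3′ sequence of its circularized exon(s); the function names and the toy sequence are hypothetical and are not taken from any of the cited tools or protocols.

```python
def backsplice_junction(circ_seq: str, flank: int = 20) -> str:
    """Return the sequence spanning the back-splice junction.

    circ_seq is the linear 5'->3' sequence of the circularized exon(s).
    In the circle, its 3' end is joined to its 5' end, so the junction
    reads: last `flank` bases followed by first `flank` bases.
    """
    if flank <= 0 or flank > len(circ_seq):
        raise ValueError("flank must be between 1 and len(circ_seq)")
    return circ_seq[-flank:] + circ_seq[:flank]


def spans_junction(amplicon: str, circ_seq: str, flank: int = 10) -> bool:
    """True if the amplicon (or probe) contains the junction with
    `flank` bases of sequence on each side of it."""
    return backsplice_junction(circ_seq, flank) in amplicon


# Hypothetical 20-nt circularized exon, for illustration only.
toy_exon = "ATGGCCTTAGACGTTCAGGA"
junction = backsplice_junction(toy_exon, flank=5)
print(junction)
```

An amplicon or probe that passes `spans_junction` cannot be produced from the linear transcript alone, because the junction sequence does not occur in the linear order of the exon; this is the property that divergent primers exploit.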

Table 28.1 Pros and cons of different methods to identify and validate circular RNAs

4.2 Comparison Between Bioinformatics Tools for Identifying circRNAs

CircRNA identification in RNA-seq data analysis was not performed until the use of splicing analysis [41, 65]. Therefore, numerous bioinformatics pipelines and algorithms have been developed for circRNA detection [51, 53, 66,67,68,69,70,71,72,73,74,75], including SE or paired-end (PE) RNA-seq data (Table 28.2). These tools or algorithms use different and unique combinations of identification strategies based on the reference genome and annotations, mapper of choice, back-splicing junction (BSJ) identification methods, and other filtering options [41, 76]. Typically, the flow of the bioinformatics analysis starts from the removal of mapped reads, followed by the mapping of splice junction sites from the unmapped reads to identify the circRNAs [76, 77]. Among these tools, the majority, except segemehl [68], use external aligners such as Bowtie [78, 79], Burrows-Wheeler Aligner (BWA) [23, 80], or Spliced Transcripts Alignment to a Reference (STAR) [81] to filter out the sequencing reads aligned to the reference genome (mapped reads) [41, 76].

Table 28.2 Comparison between circRNA identification and prediction tools

To identify circRNAs from unmapped reads, two methods or algorithms are used [76]. The first method uses the split-alignment approach that focuses on BSJ reads to identify circRNAs. The algorithm is based on the fact that BSJ reads are aligned to the reference sequences in reverse, particularly to the established GU-AG splicing sequences flanking the splice sites [76]. The split-alignment approach can be gene annotation dependent, where the BSJs are identified in between annotated exons, or gene annotation independent, where the BSJs are identified from alignment to a reference genome without gene annotation. There is a slight difference between the tools (Table 28.2) in terms of executing this approach. Segemehl [68], MapSplice [69], CIRCexplorer [53], circRNA_finder [74], and double conjugated clustering (DCC) [75] create an algorithm that aligns the unmapped reads to spliced sites for detecting BSJ sequences. By contrast, UROBORUS [66] and find_circ [51] remove mapped reads and extract the first and last 20-bp anchor sequences from the unmapped reads before identifying BSJ sequences from the mapping of these anchors. CIRI [71, 73] performs local alignment using BWA-MEM to identify paired chiastic clipping (PCC) for de novo detection of BSJ sequences and filters systematically to eliminate false positives. The second method is a pseudo-reference approach that creates a pseudo-sequence database or de novo prediction of BSJs by combining the reference genome with the corresponding gene annotation to identify circRNAs (Table 28.2) [76]. Posttranscriptional exon shuffling (PTES)Finder [67] and non-co-linear transcripts (NCL)Scan [70] create putative circRNA sequences based on mapping information of the exon-to-exon junction sequences after alignment to the reference genome. However, Known and Novel IsoForm Explorer (KNIFE) [72] creates all potential exon-to-exon junction sequences from gene annotations before the alignment. 
Additionally, KNIFE uses mRNA forward-spliced junction reads to construct a forward-spliced sites database [72] and also for further removal of candidate reads aligned to both databases to improve accuracy.
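
The anchor-based variant of the split-alignment approach can be sketched in a few lines of Python. This is a toy illustration under strong assumptions (exact matching, a single strand, a reference held in memory as a string) and is not the implementation used by find_circ or UROBORUS; the function name and sequences are hypothetical.

```python
from typing import Optional, Tuple


def find_backsplice(read: str, reference: str,
                    anchor_len: int = 20) -> Optional[Tuple[int, int]]:
    """Return (start, end) of a candidate circle if the two terminal
    anchors of an unmapped read align in reversed order, else None.

    For a collinear read, the head anchor maps upstream of the tail
    anchor. For a read crossing a back-splice junction, the head anchor
    (end of the downstream exon) maps downstream of the tail anchor
    (start of the upstream exon).
    """
    if len(read) < 2 * anchor_len:
        return None
    head, tail = read[:anchor_len], read[-anchor_len:]
    head_pos = reference.find(head)   # exact match; -1 if absent
    tail_pos = reference.find(tail)
    if head_pos == -1 or tail_pos == -1:
        return None
    if tail_pos < head_pos:
        # Outer bounds of the region covered by the two anchors.
        return (tail_pos, head_pos + anchor_len)
    return None


# Toy reference: a 15-nt exon flanked by filler sequence.
exon = "GATTACAGGCTCAAG"
reference = "TTTTT" + exon + "CCCCC"
bsj_read = exon[-5:] + exon[:5]          # read crossing the junction
print(find_backsplice(bsj_read, reference, anchor_len=5))
print(find_backsplice(reference[3:13], reference, anchor_len=5))
```

Real tools extend the anchors to locate the exact breakpoint and then apply further filters; the sketch only shows why reversed anchor order signals a back-splice.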

Previous comparison studies have shown that the prediction results of most circRNA algorithms overlap little and that no single gold-standard algorithm is sensitive and specific enough for all data [40, 77, 82, 83]. Using rRNA-depleted libraries with or without RNAse R treatment, up to 76% and 28% of false positives were detected in rRNA-depleted libraries and rRNA-depleted RNAse R–treated libraries, respectively [40]. Testing these algorithms with simulated data improved sensitivity and specificity, although there was a trade-off between the two, in which the algorithms with the highest specificity also had the lowest sensitivity, and vice versa [40]. These discrepancies reflect the fact that no two tools share an exact analysis pipeline; each adopts its own combination of selection criteria and filtering.
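
The trade-off described above can be made concrete with a small Python sketch that scores a prediction set against a simulated truth set. Because true negatives are not well defined for genome-wide circRNA candidates, the sketch reports precision as a stand-in for specificity; this substitution, the function name, and the toy identifiers are assumptions made for illustration.

```python
def evaluate(predicted: set, truth: set) -> dict:
    """Compare predicted circRNAs with a simulated truth set."""
    tp = len(predicted & truth)   # correctly predicted circRNAs
    fp = len(predicted - truth)   # predicted but not simulated
    fn = len(truth - predicted)   # simulated but missed
    sensitivity = tp / (tp + fn) if truth else 0.0
    precision = tp / (tp + fp) if predicted else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}


# Hypothetical identifiers: c* are simulated circRNAs, x* are artifacts.
truth = {"c1", "c2", "c3", "c4"}
strict_tool = {"c1", "c2"}
lenient_tool = {"c1", "c2", "c3", "c4", "x1", "x2", "x3", "x4"}

print(evaluate(strict_tool, truth))
print(evaluate(lenient_tool, truth))
```

In this toy case the strict tool makes no false calls but recovers half of the simulated circRNAs, whereas the lenient tool recovers all of them at the cost of as many false positives.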

One such difference is the implementation of gene annotation, which is often preferred for its improved accuracy and sensitivity for BSJ detection in comparison to pseudo-reference or de novo prediction [82, 83]. Annotation-dependent algorithms only detect uniquely mapped reads and require canonical splice signals [41, 76]; thus this can cause a blind spot for detecting circRNA isoforms (non-canonical signals). In other words, the pseudo-reference approach offers more flexibility for detecting non-canonical splice signals, but at the cost of high false positives. Even so, most of the non-canonical signals are small; thus many pseudo-reference algorithms choose to implement GU-AG splicing sequences as a default filter to control the false discovery rate (FDR) [41, 76], therefore potentially undermining the detection of non-canonical splice signals. Importantly, this dependency on annotation may create a blind spot in the algorithms for detecting novel circRNAs.
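
A GU-AG filter of the kind mentioned above can be sketched as follows. The sketch assumes a plus-strand candidate given in 0-based, end-exclusive genomic coordinates and a DNA reference (so GU appears as GT); the function name and the toy sequences are hypothetical.

```python
def has_canonical_signal(genome: str, start: int, end: int) -> bool:
    """Check the splice signals flanking a plus-strand circRNA candidate
    that occupies genome[start:end].

    The upstream flanking intron should end with the acceptor AG, and
    the downstream flanking intron should begin with the donor GT
    (GU in RNA). Minus-strand candidates are not handled here.
    """
    if start < 2 or end + 2 > len(genome):
        return False  # not enough flanking sequence to decide
    acceptor = genome[start - 2:start]
    donor = genome[end:end + 2]
    return acceptor == "AG" and donor == "GT"


canonical = "CCCAG" + "ATGGCC" + "GTAAA"      # AG ... exon ... GT
non_canonical = "CCCAC" + "ATGGCC" + "GTAAA"  # acceptor is AC
print(has_canonical_signal(canonical, 5, 11))
print(has_canonical_signal(non_canonical, 5, 11))
```

Applying such a filter by default discards every candidate with non-canonical flanks, which is how it lowers false positives and, at the same time, hides genuine non-canonical circRNAs.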

Another important point is the type of mapper used in the sequence mapping. find_circ [51], CIRI [71, 73], KNIFE [72], and PTESFinder [67] use the most standard reference aligners (Bowtie and BWA), which are not specifically designed to note splice sites. Excluding KNIFE [72], these algorithms implement the split-alignment approach or 20-bp anchor filters to control sensitivity, FDR, and specificity. Meanwhile, splice-aware mappers such as TopHat [84, 85], STAR [81], and Novoalign have been optimized and updated with the current reference genome, thus making them more convenient. One disadvantage is that splice-aware mappers have more restricted reporting modes, filtering, and arguments and may not be able to detect circRNAs in non-canonical signals [41, 76].

Besides that, the sensitivity and accuracy of these algorithms also depend on the recovery of unbalanced BSJ reads [41, 76]. Standard 20-bp anchor sequence filtering cannot capture the very short segments of unbalanced BSJ reads because they carry little information for detection [41, 76]. By contrast, algorithms such as CIRI [71, 73] can detect unbalanced BSJ reads owing to a dynamic alignment algorithm, which can identify balanced BSJ sequences to control the FDR. The detection of unbalanced BSJs using balanced BSJ reads may require an additional step to improve the accuracy, especially for low-abundance circRNAs. The usage of multiple seed matching improves the recovery of unbalanced BSJs [73]. In that study [73], CIRI created two putative genomic regions from short segments corresponding to BSJs and forward-spliced sequences. Then, it divided the short segments or potential unbalanced BSJ reads into multiple short seeds and counted how often those seeds matched each genomic region [73]. By comparing the matching counts and seed locations, the exact position of the unbalanced BSJs or short segments could be determined [73]. This evidence therefore indicates that CIRI is a preferable algorithm for those who need to identify low-abundance circRNAs.
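
The seed-counting idea can be sketched in Python as follows. This is a deliberately simplified illustration, not CIRI's implementation: it uses exact substring matching, ignores seed locations, and the function names, seed length, and sequences are all hypothetical.

```python
def seed_hits(segment: str, region: str, seed_len: int = 4) -> int:
    """Count how many overlapping seeds of the segment occur in the region."""
    seeds = [segment[i:i + seed_len]
             for i in range(len(segment) - seed_len + 1)]
    return sum(1 for seed in seeds if seed in region)


def assign_segment(segment: str, bsj_region: str, linear_region: str,
                   seed_len: int = 4) -> str:
    """Assign a short read segment to the back-splice or forward-splice
    candidate region by comparing seed match counts."""
    bsj = seed_hits(segment, bsj_region, seed_len)
    linear = seed_hits(segment, linear_region, seed_len)
    if bsj > linear:
        return "back-splice"
    if linear > bsj:
        return "forward-splice"
    return "ambiguous"


# Toy example: the 6-nt segment matches only the back-splice candidate.
print(assign_segment("ACGTTG", "GGACGTTGCC", "GGACGAAGCC"))
```

Splitting the short side of an unbalanced read into several seeds gives more than one chance to place it, which is why a segment too short for a 20-bp anchor can still be assigned.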

4.3 Comparison of circRNA Databases and Biological Function Tools

Besides circRNA identification tools, other computational tools have been developed to investigate the biological function of circRNAs by focusing on databases, differential expression, quantification, visualization, and length assembly. Currently, the available circRNA databases are circBase (http://www.circbase.org/) [86], circRNADb (http://reprod.njmu.edu.cn/circrnadb) [14], circ2Traits (http://gyanxet-beta.com/circdb/) [87], and CircNet (http://circnet.mbc.nctu.edu.tw/) [88]; they are all freely accessible online databases. circBase is a database of eukaryote circRNAs with evidence of their expression [86]. It supports the genome assemblies of Homo sapiens (hg19), Mus musculus (mm9), Caenorhabditis elegans (ce6), Latimeria chalumnae and L. menadoensis (latCha1), and D. melanogaster (dm3) [86]. circRNADb [14] contains 32,914 annotated e-circRNAs with information on the circRNA genetic sequences, exon splicing, IRES, open reading frames (ORF), and references [14]. circ2Traits [87] reports the associations of circRNAs and disease by estimating the likelihood of a circRNA being associated with miRNA in the disease and the network prediction of these associations between circRNAs, miRNAs, protein-coding genes, and long non-coding RNAs [87]. CircNet accounts for circRNA isoforms and incorporates a novel naming system for antisense circRNAs or ciRNAs and the location of BSJ sites on well-annotated exons [88]. CircNet also provides information on novel circRNAs, integrated miRNA–circRNA networks, expression profiles, genomic annotations, and circRNA isoform sequences [88].

A few application tools have been developed for investigating the biological functions of circRNAs. Both Sailfish-cir [39] and CircTest [75] have been used to identify circRNA differential expression or quantification. Sailfish-cir [39] estimates circRNA expression based on BSJ reads and other related reads from the data, while CircTest [75] estimates circRNA expression based on the BSJ read counts distributed in the samples. An additional algorithm in CIRI, CIRI-AS, which refers to alternative splicing, can be used for investigating the internal structure and alternative splicing of circRNAs [89]. Similarly, three other tools have been developed to characterize and visualize circRNA function and structure. Full Circular RNA Characterization from RNA-seq (FUCHS) [90] was developed to detect the known exon-skipping events within circRNAs, whereas CircPro [91] can be used to identify the protein-coding potential of circRNAs. CircView [92] can be used to aid visualization and understanding of circRNA functions, particularly regulatory elements such as miRNA response elements (MREs) and RBP binding sites. For understanding circRNA interaction with proteins and miRNAs, CircInteractome is a user-friendly, web-based tool for exploring circRNA interactions based on their potential binding sites [93]. However, more such tools are needed to further explore circRNAs, their biogenesis, and functions.

5 Suggestions on Methods to Investigate circRNAs

5.1 Suggestions for Improvement

For identifying circRNAs, technology providers should create more microarray platforms. This will help reduce bias and create more opportunities to include other types of newly discovered circRNAs. For RNA-seq, the biochemical steps in the library preparation protocol should be established and validated [41]. A proper guideline on enriching circRNAs and removing artifacts and linear splicing events should be implemented. For circRNA validation, both microarray and RNA-seq have their advantages and disadvantages, but perhaps two or three methods can be used in combination to confirm the presence of circRNAs. There should also be minimal requirements or criteria for publishing circRNA results, for example, whether the circRNAs were enriched using enzymatic reaction and how the primers/probes targeting the junctional sequences were designed and validated.

In terms of the bioinformatics pipelines for circRNA identification and prediction, there is no single perfect and accurate algorithm or analysis pipeline that has been developed so far [41, 76]. circRNA detection in RNA-seq reads requires careful consideration, as most tools depend on BSJ reads that have significant read density bias, and some of the sequence reads at the exon boundaries are usually associated with sequencing errors [2, 10, 37]. Therefore, these can lead to false-positive alignments at BSJ reads which will not be able to represent a truly expressed circRNA.

Due to the high rate of inconsistency between the bioinformatics tools discussed above, a validation approach is needed to assess the accuracy and performance of these algorithms. One such method is using simulated data to assess the sensitivity, specificity, and limitations [41] and to identify the trade-off values, false positives, and false negatives so that the user can identify the tool that best suits their needs. An example of simulated data is Benchmarker for Evaluating the Effectiveness of RNA-Seq Software (BEERS) [94], which simulates human and mouse PE RNA-seq data (Illumina) with differential expression of genes, splicing events, sequencing errors, variants, and insertions/deletions (indels). It is important to note that experiment-generated data are more complex than simulated data, and it is unknown how similar simulated data are in comparison to experimental data [41]. Following that, a study evaluating these algorithms found that CIRI [71, 73], CIRCexplorer [53], and KNIFE [72] showed more balanced performance but that their sensitivity, specificity, and cost had a few shortcomings [77]. To improve these algorithms, a recent study suggested that combining prediction algorithms and selecting the commonly identified circRNAs resulted in reduced false positives and improved the accuracy [82], particularly for CIRI, find_circ, and UROBORUS, which benefited the most. An improved algorithm that can eliminate the need for strict post-filtering is required to provide more flexibility in exploring novel circRNAs.
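
Combining algorithms and keeping the commonly identified circRNAs can be expressed as a simple vote across tools. The Python sketch below assumes that each tool's output has already been converted to a set of (chromosome, start, end, strand) tuples in a shared coordinate convention; the function name, the threshold, and the toy calls are hypothetical.

```python
from collections import Counter


def consensus_calls(calls_by_tool: dict, min_tools: int = 2) -> set:
    """Keep circRNAs reported by at least `min_tools` different tools.

    calls_by_tool maps a tool name to an iterable of hashable circRNA
    identifiers, e.g. (chrom, start, end, strand) tuples.
    """
    counts = Counter(call
                     for calls in calls_by_tool.values()
                     for call in set(calls))  # one vote per tool
    return {call for call, n in counts.items() if n >= min_tools}


# Toy calls; coordinates are invented for illustration.
a = ("chr1", 100, 500, "+")
b = ("chr1", 900, 1400, "+")
c = ("chr2", 50, 300, "-")
d = ("chr3", 10, 90, "+")
calls = {"CIRI": {a, b, c}, "find_circ": {b, c, d}, "UROBORUS": {c}}

print(sorted(consensus_calls(calls, min_tools=2)))
print(sorted(consensus_calls(calls, min_tools=3)))
```

Raising `min_tools` trades sensitivity for fewer false positives, so the threshold is a choice that depends on whether the aim is discovery or a high-confidence list.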

Rapid identification of circRNAs has revealed that circRNAs are more diverse than expected, which could mean that some circRNAs have been overlooked. Identifying these novel circRNAs is by nature difficult due to the many confounding factors. A reference- or annotation-free algorithm would be able to identify such circRNAs, but at the expense of a higher FDR [41, 76]. This is because circRNAs derived from novel and unannotated splice sites are less likely to be true positives, and reference-free algorithms show accuracy reduced by 13–27% [82]. Interestingly, the performance of CIRI, a reference-free algorithm, is comparable to that of annotation-based algorithms [82], suggesting that it is currently the most reliable algorithm. Therefore, developing a reference-free algorithm with greater sensitivity and accuracy is the current challenge to improving circRNA investigation.

6 Conclusions and Perspectives

It is apparent that circRNAs are now coming of age and are being rediscovered as important regulators of the cellular system. The biological functions of circRNAs include miRNA sponging, protein binding and sequestration, regulation of mRNA splicing and transcription, and translation into proteins. circRNAs are involved in many cellular processes leading to cancer and non-cancer diseases. In cancer, circRNAs are important for regulating cell death and senescence as well as proliferation, migration, invasion, and inflammation. In non-cancer diseases, such as diabetes, circRNAs are important players in the metabolic and inflammation pathways. Nevertheless, there is still much uncharted territory when it comes to understanding the role of circRNAs. Most current circRNA investigations depend on RNA-seq data, and many experimental designs and library preparations are tailored for this purpose. This may introduce bias when circRNA populations are pooled for identification and prediction analysis. Moreover, most if not all of the algorithms that have been developed for identifying and predicting circRNAs show little agreement with one another, and their sensitivity and specificity require improvement. Given the widespread existence of circRNAs in the human genome, it is possible that current technologies in circRNA investigation overlook other circRNA isoforms or low-abundance circRNAs that could be biologically meaningful. In conclusion, circRNAs have promising potential as disease biomarkers and regulators; however, understanding of their biological functions and interactions with other known regulators is currently limited and requires further study.