Proteins translated from circular RNAs by cap-independent mechanisms and their roles in carcinomas

Circular RNAs (CircRNAs) are covalently closed RNAs that are widely expressed in eukaryotes. CircRNAs are involved in a variety of physiological and pathological processes, but their regulatory mechanisms are not fully understood. With advances in RNA deep-sequencing technology and improved algorithms, some CircRNAs have been found to encode proteins through cap-independent mechanisms and to participate in tumor initiation and progression. Building on an overview of CircRNAs, this paper summarizes their translation mechanisms and the methods used to study them, and reviews recent progress on CircRNA translation in oncology. It aims to provide new ideas for tumor diagnosis and treatment based on CircRNA translation.


Introduction
According to the latest global cancer burden data released by the International Agency for Research on Cancer (IARC) of the World Health Organization, an estimated 19.29 million new cancer cases and 9.96 million cancer deaths occurred worldwide in 2020. Malignant tumors seriously threaten human health. Recent advances in biotechnology have deepened our understanding of the complex genetic and non-genetic heterogeneity within individual tumors. Integrating proteomics with genomics may give the most accurate picture of the activity of individual genes. Molecular diagnosis and treatment can help improve the poor prognosis of malignant tumors [1]. Therefore, the molecular mechanisms underlying the initiation and progression of malignant tumors warrant in-depth investigation.
CircRNAs are single-stranded, covalently closed RNA molecules produced by non-canonical back-splicing events [2]. They play important roles in the initiation and progression of human malignant tumors [3,4]. Because they lack typical mRNA features, CircRNAs have generally been considered a subtype of non-coding RNA (ncRNA) [5,6]. However, some of their characteristics suggest translational potential. For example, most CircRNAs are composed of exon sequences, are mainly located in the cytoplasm, and can CircRNAs in cells [12][13][14]. New studies have also questioned the translational capacity of CircRNAs. Stagsted et al. identified and characterized a new subset of CircRNAs, AUG circRNAs, which encompass the annotated translational start codon of protein-coding host genes. Through cross-species analysis, extensive ribosome profiling analysis, proteomic analysis, and experiments on a selected group of AUG circRNAs, they found no evidence that these CircRNAs are translated [15]. CircRNA translation therefore remains controversial and merits further exploration. Recently, more studies based on ribosome profiling and improved mass spectrometry have shown that some CircRNAs can be translated. CircRNAs can undergo cap-independent translation via the internal ribosome entry site (IRES), N6-methyladenosine (m6A), or a distinctive rolling circle amplification (RCA) mechanism, and the proteins they produce may be involved in tumorigenesis. CircRNAs also have important biological functions in development. The discovery of CircRNA translation has substantially expanded the scope of genomics and proteomics and offers a new perspective for improving tumor diagnosis and treatment.

CircRNA overview
CircRNAs are a class of RNA with a distinctive structure and functions that remain incompletely understood. In 1976, Sanger et al. first discovered CircRNAs in viroids [16]. In 1979, Hsu et al. observed CircRNAs in eukaryotic cells by electron microscopy [17]. Two models of CircRNA biogenesis have been proposed: "exon skipping" and "direct back-splicing circularization". Both involve back-splicing catalyzed by the spliceosome [18,19]. Because of the closed-loop structure formed by back-splicing, CircRNAs lack a 5′ cap and a poly(A) tail, which makes them more resistant to exonuclease degradation than linear RNAs [10]. According to their genomic origin and splicing pattern, CircRNAs are usually divided into three types: intronic CircRNAs (CiRNAs) [20], exonic CircRNAs (EcircRNAs) [21], and exon-intron CircRNAs (EIciRNAs) [22]. Recently, new forms of CircRNAs have been reported, including fusion CircRNAs (f-circRNAs) [23,24], read-through CircRNAs (rt-circRNAs) [25,26], and CircRNAs derived from mitochondrial DNA (mecciRNAs) [27] (Table 1). Intron-containing CircRNAs are retained in the nucleus, whereas EcircRNAs are exported to the cytoplasm or into exosomes [28,29]. CircRNAs were initially regarded as meaningless, low-abundance byproducts of splicing [30]. With advances in sequencing technology and bioinformatics, it has become clear that thousands of genes produce CircRNAs and that a single gene can generate different CircRNAs through alternative circularization [10,31]. CircRNAs are found widely across eukaryotes, and their expression is often highly conserved between species [32,33]. A growing number of studies show that CircRNAs play important roles in the occurrence, development, and prognosis of various diseases, especially tumors [34,35] (Table 2). CircRNAs are also associated with neurodevelopment [36], autoimmune responses [37], and infertility [38].
Given their abundance, high stability, tissue-specific expression, and wide distribution in body fluids, CircRNAs have considerable potential as biomarkers for the liquid biopsy of human diseases [39][40][41]. Known CircRNAs can act as miRNA sponges [5,42,43] (Fig. 1a), interact with proteins [44] (Fig. 1b), and regulate the expression of their parental genes [22] (Fig. 1c), thereby participating in disease initiation and progression. Recently, some CircRNAs have been found to be efficiently translated into detectable peptides that play regulatory roles in disease. Translation is thus a hidden function of CircRNAs, and its underlying mechanisms remain largely unclear and require in-depth exploration.

Table 1
Type | Origin | Subcellular location | Function | Ref.
EIciRNAs | Introns and exons | Nucleus | Promote transcription of the host gene by interacting with U1 small nuclear ribonucleoprotein (snRNP) | [22]
f-circRNAs | Linear fusion transcripts derived from genome rearrangements (chromosomal translocations) | Cytoplasm and nucleus | Contribute to cell transformation, promote cell viability and drug resistance after treatment, and show tumor-promoting properties in in vivo models | [23]
rt-circRNAs | Coding exons of two adjacent, similarly oriented genes (read-through transcription) | Cytoplasm | To be elucidated; may be a mechanism of gene regulation in specific environments or of protein complex evolution | [26]
mecciRNAs | Mitochondrial genome | Inside and outside mitochondria | Act as molecular chaperones that help fold nuclear-encoded proteins and facilitate their entry into mitochondria | [45]

CircRNA translation mechanism
In eukaryotic cells, mRNA translation generally requires eukaryotic translation initiation factor 4F (eIF4F), a complex composed of the helicase eIF4A, the cap-binding subunit eIF4E, and the scaffold protein eIF4G. Through its interaction with eIF3, eIF4G recruits the 43S pre-initiation complex (PIC), which consists of the 40S ribosomal subunit loaded with eIF1, eIF1A, eIF3, eIF5, and the eIF2·GTP·Met-tRNAiMet ternary complex [64,65]. Translation begins when eIF4F recognizes the 7-methylguanosine (m7G) cap at the 5′ end of the mRNA and recruits the 43S complex. This cap-dependent pathway [66,67] is the main, but not the only, mechanism of eukaryotic translation initiation. Under certain conditions, such as stress, cells also use cap-independent translation [68][69][70]. Because CircRNAs lack a 5′ cap, their translation relies on cap-independent mechanisms.

IRES
The IRES is a structured RNA element, typically located in the 5′ untranslated region of mRNA [71], that can recruit ribosomes directly and thus initiate translation independently of the cap. IRES-mediated translation was first discovered in viruses [72,73]; cellular IRESs share little sequence or structural similarity with one another and appear more diverse and complex than viral IRESs. In eukaryotic cells, IRES-mediated translation can act as an emergency mechanism that ensures basic protein requirements are met during stress [74,75], and it is likely to be triggered by viral infection, tumors, or other diseases [76,77]. A high-throughput screen of IRES elements in the human genome indicated that about 10% of human mRNAs contain IRES elements [78]. In 2016, Chen et al. established CircRNADb, based on 32,914 human EcircRNAs. In their preliminary analysis, about half of these CircRNAs were predicted to contain an ORF, and the VIPS method, which is based on RNA structural similarity, predicted that about half of the ORF-containing CircRNAs also contain an IRES [79]. Other studies showed that a large number of AU-rich motifs (0-10nt) have IRES-like activity and can initiate CircRNA translation [80]. These findings suggest that IRES-mediated cap-independent translation is both important and widespread. At present, the IRES is one of the most widely accepted mechanisms for initiating CircRNA translation [14,81]. However, how IRESs mediate ribosome assembly remains poorly understood. Some studies suggest that IRESs can be recognized directly by the non-canonical eIF4G protein eIF4G2. Unlike eIF4G, eIF4G2 contains eIF4A- and eIF3-binding regions but lacks a binding site for the cap-binding subunit eIF4E [82,83]. Therefore, even without a 5′ cap, an IRES on a CircRNA can assemble the eIF4 complex and drive translation of the downstream ORF (Fig. 2a).
Functional studies showed that IRES activity depends on specific RNA structures, which allow the 40S subunit to assemble at sites other than the 5′ end [66]. A class of proteins called IRES trans-acting factors (ITAFs) help IRESs recruit ribosomes to initiate translation through various mechanisms [71,84,85].

Rolling circle amplification
Studies in which specially designed CircRNAs were synthesized in vitro and transfected into eukaryotic cells showed that CircRNAs can be translated by a rolling circle mechanism, analogous to rolling circle amplification (RCA) by a polymerase [98]. The lengths of these CircRNAs were multiples of three, and they contained an AUG start codon but no IRES and no in-frame stop codon, so they carry an infinite open reading frame (iORF) [99]. In theory, therefore, once translation begins, the elongating ribosome can travel around the circle many times and produce a high-molecular-weight protein (Fig. 2c). A recent study showed that RCA of CircRNAs also occurs in vivo and proposed that a programmed -1 ribosomal frameshift can bring an out-of-frame stop codon into frame and thereby terminate iORF translation [100]. This simple but inefficiently initiated RCA mechanism may therefore be another route of CircRNA translation [101,102].
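The iORF condition described above can be checked mechanically. The Python sketch below is illustrative only; its function names are hypothetical and it is not taken from the cited studies. It reads codons around the circle, wrapping across the back-splice junction, and reports whether any AUG opens a reading frame that never meets a stop codon. It assumes the standard stop codons (UAA, UAG, UGA) and ignores ribosomal frameshifting, which, as noted above, can terminate iORF translation in cells.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}  # standard genetic code (assumption)


def codon_at(circ: str, pos: int) -> str:
    """Return the 3-nt codon starting at pos, wrapping across the back-splice junction."""
    n = len(circ)
    return "".join(circ[(pos + k) % n] for k in range(3))


def codons_per_cycle(n: int) -> int:
    """Codons read before the ribosome returns to its starting position and frame.

    This takes lcm(n, 3) nucleotides: one lap when n is a multiple of 3,
    otherwise three laps (n codons).
    """
    return n // 3 if n % 3 == 0 else n


def has_infinite_orf(circ: str) -> bool:
    """Hypothetical helper: True if some AUG opens a circular frame with no stop codon."""
    circ = circ.upper().replace("T", "U")
    n = len(circ)
    for start in range(n):
        if codon_at(circ, start) != "AUG":
            continue
        frame = {codon_at(circ, start + 3 * i) for i in range(codons_per_cycle(n))}
        if not frame & STOP_CODONS:
            return True
    return False


print(has_infinite_orf("AUGGCC"))  # True: AUG GCC AUG GCC ... never stops
print(has_infinite_orf("AUGUAA"))  # False: UAA is in frame
```

In the toy 6-nt circle, each lap adds two residues (Met-Ala), which is the repetitive, ever-lengthening product that rolling circle translation predicts.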

Research methods of circRNA translation
The earliest report of a CircRNA serving as a translation template came from a 1986 study of a viral nucleic acid [8]. However, the translational capacity of CircRNAs has long been controversial, so different methods are required to study the structural and functional characteristics of circular RNAs and to demonstrate their coding potential in vivo. An ORF is essential for translation, and long, evolutionarily conserved ORFs are more likely to be translated [103,104]. For CircRNAs, however, short open reading frames (sORFs), usually shorter than 300 nt, seem to be important [105][106][107], and ORFs spanning the back-splice junction are also a notable feature of CircRNA-encoded peptides [81]. Guided by these features and by the candidate translation mechanisms, current research covers reading frame prediction, prediction of translation initiation elements, conservation analysis, translatomics, proteomics, and other approaches, combining bioinformatic prediction with experimental validation.
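Reading frame prediction is usually the first computational step. The following Python sketch is for illustration only (the names are hypothetical, and it is not the algorithm of any database or tool cited here). It scans a CircRNA sequence for AUG-initiated ORFs, lets them read through the back-splice junction, and flags the two features highlighted above: junction-spanning ORFs and sORFs shorter than 300 nt. It assumes the standard genetic code and treats the input as the linearized CircRNA sequence, with its last base joined to its first.

```python
from dataclasses import dataclass

STOP_CODONS = {"UAA", "UAG", "UGA"}  # standard genetic code (assumption)
SORF_MAX_NT = 300  # sORF cut-off quoted in the text (< 300 nt)


@dataclass
class CircOrf:
    start: int            # 0-based position of the AUG on the linearized CircRNA
    length_nt: int        # ORF length in nucleotides, stop codon included
    spans_junction: bool  # True if the ORF reads across the back-splice junction

    @property
    def is_sorf(self) -> bool:
        return self.length_nt < SORF_MAX_NT


def find_circ_orfs(circ: str, min_aa: int = 10) -> list[CircOrf]:
    """Hypothetical helper: AUG-initiated ORFs on a CircRNA, allowed to wrap past the junction.

    An ORF whose end (in unwrapped coordinates) lies beyond len(circ) has crossed
    the junction. Frames with no stop codon at all (infinite ORFs) are not reported.
    """
    circ = circ.upper().replace("T", "U")
    n = len(circ)

    def codon(pos: int) -> str:
        return "".join(circ[(pos + k) % n] for k in range(3))

    # Codons read before the ribosome is back in its starting frame at the same AUG.
    max_codons = n // 3 if n % 3 == 0 else n
    orfs = []
    for start in range(n):
        if codon(start) != "AUG":
            continue
        for i in range(max_codons):
            pos = start + 3 * i
            if codon(pos) in STOP_CODONS:
                end = pos + 3  # exclusive end, unwrapped coordinates
                if i >= min_aa:  # i codons precede the stop, i.e. i amino acids
                    orfs.append(CircOrf(start, end - start, spans_junction=end > n))
                break
    return orfs


orfs = find_circ_orfs("UAGCAUGCCC", min_aa=1)
print(orfs)  # [CircOrf(start=4, length_nt=9, spans_junction=True)]
```

A junction-spanning ORF exists only in the circular isoform, so the peptide sequence it encodes across the junction is normally absent from the product of the linear host mRNA. That is why this feature is useful when seeking evidence of CircRNA-specific translation.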
Recently, specialized databases that integrate multiple methods and data types have been developed. Ribo-CIRC analyzed 3,168 publicly available Ribo-seq and 1,970 matched RNA-seq datasets, covering 314 studies in 21 species, and established a comprehensive database of translatable CircRNAs that are computationally predicted and experimentally supported [108]. TransCirc is another recently established comprehensive database [109]. It contains information on more than 300,000 CircRNAs and compiles multi-omics evidence supporting CircRNA translation from the published literature. By combining direct and indirect evidence, TransCirc predicts the potential of all CircRNAs in the database to encode functional peptides.
After computational prediction and evaluation, CircRNA translation should be verified experimentally. Ribosome profiling (Ribo-seq), the deep sequencing of ribosome-protected mRNA fragments, is a powerful tool for monitoring protein translation genome-wide in vivo. It can reveal the regulation of gene expression underlying complex biological processes, important aspects of the protein synthesis machinery, and even new proteins [110,111]. Sequencing of ribosome footprints provides a precise record of ribosome positions at the moment translation is halted, revealing which transcripts are being translated. The overall density of ribosome footprints reflects the rate of translation on each transcript, allowing a direct and quantitative measure of the rate at which the cell produces each protein [112]. Polysome profiling exploits the high sedimentation coefficient of ribosomes: polysomes are separated by sucrose density gradient centrifugation, and analysis of the resulting fractions can evaluate the coding potential of ncRNAs and directly determine the translation efficiency of CircRNAs genome-wide [113,114]. Ribosome profiling and polysome profiling are complementary methods for genome-wide translation analysis. Proteomics can be used to discover and directly detect micropeptides encoded by CircRNAs, and mass spectrometry is the common method for identifying and analyzing these micropeptides [115,116].
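Two quantities underlie the footprint-based reasoning above: whether a read covers the back-splice junction, which the linear host mRNA lacks, and how footprint density compares with transcript abundance. The Python sketch below is a simplified illustration, not the pipeline of any cited study; the function names, the 5-nt minimum overhang, and the toy counts are all hypothetical. It computes translation efficiency as a ratio of counts per million and ignores the different effective junction lengths of short footprints and longer RNA-seq reads, footprint periodicity, and multi-mapping reads.

```python
def spans_junction(read_start: int, read_len: int, circ_len: int, min_overhang: int = 5) -> bool:
    """Hypothetical check: does a read aligned to the circle cover the back-splice junction?

    read_start is the 0-based position of the read's first base on the linearized
    CircRNA; a junction read runs off the 3' end and continues at position 0.
    At least min_overhang nt are required on each side of the junction.
    """
    left = circ_len - (read_start % circ_len)  # bases before the junction
    right = read_len - left                    # bases after wrapping to position 0
    return left >= min_overhang and right >= min_overhang


def translation_efficiency(ribo_count: int, rna_count: int, ribo_total: int, rna_total: int) -> float:
    """Ribo-seq counts per million divided by RNA-seq counts per million for one feature."""
    ribo_cpm = ribo_count * 1e6 / ribo_total
    rna_cpm = rna_count * 1e6 / rna_total
    return ribo_cpm / rna_cpm


# Toy 30-nt footprints on a hypothetical 500-nt CircRNA: (start, length)
footprints = [(480, 30), (100, 30), (490, 30), (497, 30)]
junction_hits = sum(spans_junction(start, length, 500) for start, length in footprints)
print(junction_hits)  # 2
print(translation_efficiency(50, 100, 10_000_000, 20_000_000))  # 1.0
```

Normalizing to counts per million corrects for sequencing depth; because both counts refer to the same feature, a separate length normalization would cancel in the ratio.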

CircRNA translation peptides in tumors
So far, only a few translatable CircRNAs have been identified. Regarding the function of CircRNA-encoded proteins, Zhao et al. proposed the protein decoy ("bait") hypothesis: a protein encoded by a CircRNA competes with the homologous protein isoforms encoded by linear splice variants for binding partners, thereby interfering with the normal function of those isoforms [117]. CircRNA dysregulation is common in tumors, including basal cell tumors, and unbalanced expression of CircRNA-encoded proteins may contribute to tumor initiation and progression [25]. Given the abnormal expression of CircRNAs in human malignant tumors, CircRNA-encoded proteins may provide new specific targets for tumor diagnosis and treatment.

Glioblastoma (GBM)
At present, more protein-coding CircRNAs have been found in gliomas than in other tumors, possibly because CircRNAs are enriched in neuronal tissues [118]. In GBM, CircFBXW7 encodes a novel 21-kDa protein, FBXW7-185aa, through an IRES [119]. CircFBXW7 and FBXW7-185aa are expressed at lower levels in GBM than in adjacent tissues. CircFBXW7 expression is positively correlated with overall survival in patients with GBM, and FBXW7-185aa reduces c-Myc stability by antagonizing USP28-mediated c-Myc stabilization, leading to cell cycle arrest and reduced proliferation of GBM cells. Circ-SHPRH is abundantly expressed in normal human brain but expressed at low levels in GBM. Circularization of Circ-SHPRH generates the tandem stop codons "UGA UGA". Using this overlapping genetic code, Circ-SHPRH can both start and stop translation, producing a novel functional protein, SHPRH-146aa [120]. SHPRH-146aa is a tumor suppressor that protects full-length SHPRH protein from ubiquitin-proteasome degradation, thereby promoting the turnover of proliferating cell nuclear antigen (PCNA) in vivo. Like Circ-SHPRH, CircAKT3, which is expressed at low levels in GBM, uses overlapping start-stop codons to encode a novel 174-amino-acid protein, AKT3-174aa [121].
AKT3-174aa competitively interacts with phosphorylated PDK1, reduces phosphorylation of AKT at Thr308, and negatively regulates the intensity of PI3K/AKT signaling. Overexpression of AKT3-174aa reduces the proliferation, radiation resistance, and tumorigenicity of GBM cells. Zhang et al. also found that Circ-LINCPINT, which is expressed at low levels in GBM, encodes an 87-amino-acid peptide, PINT87aa [122]. This peptide interacts directly with the polymerase-associated factor complex (PAF1c) to inhibit the transcriptional elongation of multiple oncogenes.
In about 50% of GBMs, activated epidermal growth factor receptor (EGFR) signaling drives tumorigenesis. Circ-E-Cad, which is highly expressed in GBM, encodes a previously undescribed secreted E-cadherin variant (C-E-Cad) through an IRES, and this protein provides an additional mechanism of EGFR activation [123]. C-E-Cad binds the CR2 domain of EGFR and activates EGFR through its unique 14-amino-acid C-terminus, thereby maintaining the tumorigenicity of glioma stem cells. In GBM treatment, inhibiting C-E-Cad significantly enhances the antitumor activity of anti-EGFR strategies. Liu et al. found that Circ-EGFR, which is upregulated in GBM, is translated through RCA into a novel multimeric protein, rtEGFR [100]. rtEGFR interacts directly with EGFR, maintains its membrane localization, and reduces its endocytosis and degradation. In brain tumor-initiating cells, reducing rtEGFR expression weakens tumorigenicity and enhances anti-GBM effects. Aberrant activation of the Hedgehog pathway is another important factor in GBM development. Recent studies found that Circ-SMO encodes a novel protein, SMO-193aa, through IRES-mediated translation [124]. This peptide interacts directly with SMO, a key component of the Hedgehog pathway, enhancing SMO cholesterol modification, releasing SMO from inhibition by the Patched transmembrane receptor, and thereby activating SMO. Inhibiting SMO-193aa expression weakens Hedgehog signaling in cancer stem cells and suppresses their self-renewal and proliferation in vitro and their tumorigenicity in vivo. These findings indicate that SMO-193aa is an oncogenic protein that promotes GBM development.
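The "overlapping start-stop codons" described above for Circ-SHPRH can be made concrete with a simple string check. The Python snippet below is illustrative only (the helper names are hypothetical). It locates the AUG start codon and the UGA stop codons inside the tandem motif UGAUGA and counts the nucleotides they share. It says nothing about how ribosomes actually select these codons on the circle.

```python
START, STOPS = "AUG", {"UAA", "UAG", "UGA"}  # standard genetic code (assumption)


def codon_hits(seq: str, codons: set[str]) -> list[tuple[int, str]]:
    """(position, codon) for every occurrence of the given codons, in all three frames."""
    return [(i, seq[i:i + 3]) for i in range(len(seq) - 2) if seq[i:i + 3] in codons]


def shared_nt(a: int, b: int) -> int:
    """Nucleotides shared by two codons that start at positions a and b."""
    return max(0, 3 - abs(a - b))


motif = "UGAUGA"  # tandem motif reported for Circ-SHPRH
starts = codon_hits(motif, {START})
stops = codon_hits(motif, STOPS)
print(starts)  # [(2, 'AUG')]
print(stops)   # [(0, 'UGA'), (3, 'UGA')]
print([shared_nt(starts[0][0], p) for p, _ in stops])  # [1, 2]
```

The AUG at position 2 shares its first base with the upstream UGA and its last two bases with the downstream UGA, so a single six-nucleotide motif contains both the start signal and the stop signals.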

Colon cancer
Zheng et al. found that CircPPP1R12A is significantly upregulated in colon cancer tissues and contains a small ORF (216 nt) that encodes a previously unidentified functional protein, CircPPP1R12A-73aa [125]. This protein promotes the growth and metastasis of cancer cells by activating the Hippo-YAP signaling pathway. In contrast, CircFNDC3B, which is expressed at low levels in colon cancer cell lines and tissues, encodes a novel protein, CircFNDC3B-218aa, through an IRES [126]. CircFNDC3B-218aa, but not CircFNDC3B itself, inhibits the proliferation, invasion, and migration of colon cancer cells. Further research showed that CircFNDC3B-218aa acts as a tumor suppressor by inhibiting Snail expression, thereby upregulating FBP1 and inhibiting the epithelial-mesenchymal transition. Zhi et al. found that CircLgr4, which is highly expressed in colon tumors, encodes a peptide. The CircLgr4-derived peptide interacts with and activates LGR4, which further activates Wnt/β-catenin signaling and drives the self-renewal of colorectal cancer stem cells [127].

Breast cancer
As mentioned earlier, CircFBXW7, which encodes FBXW7-185aa, has an antitumor effect in gliomas. Ye et al. found that in triple-negative breast cancer (TNBC), CircFBXW7, which is expressed at low levels, also encodes FBXW7-185aa. This protein inhibits the proliferation and migration of TNBC cells by increasing the abundance of FBXW7 and inducing c-Myc degradation [128]. Li et al. reported that about 30% of clinical TNBC specimens express CircHER2, a circular RNA derived from the HER2 gene, so some TNBCs are not truly "HER2-negative". CircHER2 encodes a novel protein, HER2-103, whose amino acid sequence is largely identical to the CR1 region of HER2 [129]. This protein promotes EGFR/HER3 homo- and heterodimerization, sustains AKT phosphorylation, and drives downstream malignant phenotypes. Pertuzumab significantly reduces the tumorigenicity of CircHER2/HER2-103-positive TNBC cells in vivo but has no significant effect on CircHER2/HER2-103-negative TNBC cells.

Hepatocellular carcinoma (HCC)
CircARHGAP35, derived from exons 2 and 3 of the ARHGAP35 gene, is highly expressed in HCC tissues. Studies found that CircARHGAP35 promotes HCC, whereas its parent gene, ARHGAP35, suppresses cancer. Mechanistic studies showed that CircARHGAP35 contains a 3,867-nt ORF spanning the splice junction, which encodes a novel 1,289-aa protein [130]. The N-terminal sequence of this protein overlaps with that of the protein encoded by the parent gene ARHGAP35, and only a short C-terminal segment is unique. The protein retains four FF domains but lacks the RhoGAP domain. It localizes to the nucleus and interacts with TFII-I to promote the proliferation and migration of HCC cells. Circβ-catenin is also highly expressed in HCC tissues. It uses the same start codon as linear β-catenin mRNA, together with a new stop codon created by circularization, and is translated via an IRES in its sequence into a novel 370-amino-acid β-catenin isoform, β-catenin-370aa [131]. β-catenin-370aa competitively interacts with glycogen synthase kinase 3β (GSK3β), preventing GSK3β from binding full-length β-catenin. This competition inhibits GSK3β-induced phosphorylation and degradation of β-catenin, stabilizes full-length β-catenin, and activates the Wnt signaling pathway. Wnt pathway activation is associated with the occurrence and poor prognosis of liver cancer.

Gastric cancer
CircMAPK1 is downregulated in gastric cancer tissue and inhibits the proliferation and invasion of gastric cancer cells in vitro and in vivo. Studies found that CircMAPK1 encodes a novel 109-amino-acid protein, MAPK1-109aa, through an IRES [132]. Functional experiments confirmed that CircMAPK1 exerts its tumor-suppressive effect through MAPK1-109aa. Mechanistically, MAPK1-109aa competitively binds MEK1 and inhibits MAPK1 phosphorylation, thereby inhibiting the activation of MAPK1 and its downstream factors in the MAPK pathway. The MAPK cascade is a key signaling pathway that regulates cell proliferation, survival, and differentiation and is involved in the development of various tumors [133].

Conclusion and outlook
CircRNAs can be translated into proteins through cap-independent mechanisms. This blurs the boundary between coding RNA and ncRNA and shows that translation in eukaryotic cells is far more complex than previously understood. Most CircRNA-encoded proteins identified to date exert antitumor or tumor-promoting effects through different signal transduction pathways. This indicates the importance of CircRNA-encoded proteins in tumor initiation and progression and supports their potential value for drug development and clinical application. As theoretical and clinical research deepens, CircRNAs and their encoded proteins may be applied to tumor diagnosis and treatment in several ways. Examples include combining these peptides or proteins with classic anticancer drugs, vaccination with synthetic peptides or with viral vector vaccines encoding the relevant peptide sequences, and developing new drugs against targets identified in these mechanisms. However, most CircRNAs do not appear to associate with ribosomes. Moreover, cap-independent translation initiation is inefficient, so the abundance of CircRNA-encoded polypeptides is limited. Research on CircRNA translation is still in its infancy. Many questions, including the detailed mechanisms and regulation of initiation, elongation, and termination, remain unanswered and require further research.