Secondary structures in RNA synthesis, splicing and translation

Introduction
mRNAs are essential molecules in the cell, as they are key to extracting the information stored in DNA. Although the function of an mRNA molecule is primarily determined by its nucleotide sequence, some of its properties are determined by secondary structures. Secondary structures are distinct features, including hairpins, long-range interactions, G-quadruplexes, R-loops and pseudoknots, that form through interactions between non-adjacent nucleotides. Their presence can affect various processes involving the mRNA, including synthesis, splicing and translation. Secondary structures are dynamic and can be modulated by multiple proteins, in particular RNA binding proteins (RBPs), and because they cannot be predicted solely from the primary sequence they are challenging to study. Nevertheless, several assays are available for both in vitro and in vivo profiling. In this Review, we summarize these methods, provide an overview of some of the elucidated and putative functional roles of mRNA secondary structures, and discuss their impact on disease. We discuss the consequences of secondary structure formation for splicing and translation, with particular focus on G-quadruplexes, hairpins and long-range interactions, and we examine the mechanisms by which secondary structures contribute to the regulation of mRNA splicing and translation initiation.

RNA secondary structure formation
In RNA, intra- and intermolecular long-range interactions, including hairpins, pseudoknots, and G-quadruplexes, are commonly observed. Hairpins are composed of a hybridized stem and a single-stranded loop (Fig. 1a and b) and can contain mismatches and bulges. Pseudoknots contain nested stem-loop structures, with half of one stem intercalated between the two halves of another stem. G-quadruplex formation is driven by the inherent propensity of guanines to self-assemble, in the presence of monovalent cations, into planar structures known as G-quartets [1]. Each G-quartet is composed of four guanine nucleotides that interact with each other through Hoogsteen hydrogen bonds. Consecutive runs of guanines (G-tracts) may lead to the formation of consecutive G-quartets that can stack with each other to form G-quadruplex structures (Fig. 1c). Biophysical properties such as the length of the intervening loops between consecutive G-tracts influence their formation dynamics. In addition, G-quadruplexes can be intramolecular or intermolecular. During transcription, dynamic hybrid structures between DNA and nascent RNA transcripts can be formed, such as R-loops (Fig. 1d) [2]. R-loops are three-stranded hybrid structures in which an RNA molecule invades and hybridizes with one DNA strand, while displacing the other. The size of R-loops can range from <100 base pairs to >2000 base pairs [3]. Formation and stabilization of R-loops is particularly favorable when the non-template strand is G-rich, but it can also be promoted by DNA supercoiling, the presence of DNA nicks, and the formation of G-quartets [3,4]. R-loop formation, as well as the formation of DNA and RNA G-quadruplexes and other secondary structures, affects transcript elongation rates and can have kinetic repercussions on co-transcriptional events involved in RNA processing, such as alternative splicing [5,6]. A number of methods that probe RNA structures have been developed.
Methods such as selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE)-seq [7] and parallel analysis of RNA structure (PARS) [8] were able to identify RNA structures in vitro, while more recent methods can deduce structures in vivo [9,10]. For instance, RNA in situ conformation sequencing (RIC-seq) [11] is a powerful new method that enables global detection of intra- and intermolecular RNA-RNA interactions, such as duplexes and long-range loop-loop interactions. Cross-linking immunoprecipitation high-throughput sequencing (CLIP-seq) enables the investigation of protein interactions with RNA molecules [12] from which many variant technologies have emerged. RNA G-quadruplexes can be characterized transcriptome-wide [13,14] using rG4-seq, which is a modified sequencing method that stalls at RNA G-quadruplexes, enabling identification of RNA G-quadruplexes in vitro, and RNA G-quadruplexes have also been visualized in cellulo using a specific antibody [15]. Moreover, researchers have developed small molecules, such as carboxypyridostatin, a cyanine dye called CyT and Thioflavin T [15][16][17][18][19], that can shift the equilibrium between the folded and unfolded state of RNA G-quadruplexes and which display preference for RNA over DNA G-quadruplexes. Identification of R-loops has been enabled by usage of specific antibodies [20][21][22][23] and other nuclease-based methods [24,25].
Fig. 2. A. In the absence of secondary structures, RNAPII elongation rate is higher, which disfavors the recruitment of splicing factors that promote assembly of the spliceosome and exon definition. In this situation exons flanked by weak splice sites may not be recognised, and they are consequently skipped. Exons flanked by strong splice sites can be efficiently recognized by small ribonucleoproteins (snRNPs) U1 and U2, leading to the formation of the pre-spliceosome (complex A) and promoting exon definition and inclusion in the mature mRNA transcripts. B. Formation of secondary structures at DNA and RNA can decrease RNAPII elongation speed. For example, during transcription R-loops formed at the 3′ of genes can be stabilized by non-template DNA G-quadruplex formation. Low transcription rates promote exon inclusion by allowing the formation of secondary structures and binding of proteins that can favor the recognition of weak splice sites that would not be recognized otherwise. An RBP that recognizes and binds to the secondary structure is shown in green whereas an RBP whose binding is inhibited by secondary structure formation is shown in red. C. RNA secondary structures can modulate mRNA interactions with RBPs either promoting or inhibiting their binding at the mRNA molecule. For example, G-quadruplexes formed at the DNA or RNA level can selectively recruit RBPs to influence splicing outcome. In schematics A, B and C, thicker lining of the mRNA indicates exonic regions whereas thinner lining is indicating intronic regions. The dashed line of mRNA molecules indicates that the length of the transcript can be longer than displayed. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
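The description of G-tracts separated by intervening loops suggests a simple sequence-level scan for putative G-quadruplex-forming motifs. The sketch below is only illustrative: the pattern (four tracts of at least three guanines separated by loops of 1 to 7 nucleotides) is a commonly used heuristic that we assume here, not a definition taken from this Review, and the function name `find_putative_g4` is hypothetical. A sequence match does not show that a structure actually folds in vitro or in vivo.

```python
import re

# Assumption: four G-tracts (>= 3 G each) separated by loops of 1-7 nt.
# This is a widely used heuristic, not a definition from the text above.
G4_PATTERN = re.compile(
    r"G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}"
)


def find_putative_g4(seq):
    """Return (start, motif) for each non-overlapping putative G4 motif.

    The input may be DNA or RNA; T is converted to U before scanning.
    """
    rna = seq.upper().replace("T", "U")
    return [(m.start(), m.group(0)) for m in G4_PATTERN.finditer(rna)]


if __name__ == "__main__":
    for start, motif in find_putative_g4("AAGGGAGGGAGGGAGGGAA"):
        print(start, motif)
```

Loop length bounds and the minimum tract length are exactly the kind of biophysical parameters mentioned above, so in practice they would be tuned rather than fixed.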

RNA polymerase speed and secondary structures
A variety of features are associated with RNAPII speed. For instance, the presence of introns and the length of the first intron are both positively correlated with RNAPII speed [26], while nucleosome formation can reduce RNAPII speed [27,28]. Regions with a high propensity to form DNA, RNA, or hybrid secondary structures are also associated with RNAPII pausing or slower RNAPII speed (Fig. 2a and b) [29][30][31]. RNAPII speed can in turn remodel structure: under slower elongation, hairpin formation can be inhibited by competition with alternative structures, resulting in reduced binding by stem-loop-binding proteins [30]. In S. cerevisiae and S. pombe, folding energy and GC content in the transcription bubble have been correlated with RNA polymerase distribution, and RNA structures within nascent transcripts promote forward translocation of the polymerase and limit backtracking [32]. Analyses of nascent RNAs have provided evidence that the formation of secondary structures within introns is associated with more efficient co-transcriptional splicing, which is favored under slower transcriptional rates [32,33]. Taken together, because promoter-proximal pausing, exon recognition, splicing and transcription termination are all influenced by RNAPII speed, secondary structures can affect each of these processes.

RNA splicing and secondary structures
Pre-mRNA splicing is a key biological process that enables the removal of introns and the joining of the flanking exons, eventually resulting in a mature mRNA molecule. Alternative splicing affects approximately 90-95% of mRNA transcripts in humans [34,35] and most often occurs co-transcriptionally [33], while for a minority of transcripts it occurs post-transcriptionally [36]. Splicing is a highly conserved mechanism [37] that is pivotal for a number of biological processes such as cell growth, differentiation, immune response and neuronal development [38][39][40], while aberrant splicing is implicated in multiple diseases [41] including neurological disorders [42] and cancer [43].
Splicing is mediated through the spliceosome complex, which recognizes splice signals, the key ones being the 5′ splice site (5′ss), the 3′ splice site (3′ss), and the branch point. The recognition of these consensus sequences is carried out by U1 and U2 small nuclear ribonucleoproteins (snRNPs) and other auxiliary protein factors that are involved in early spliceosomal assembly. Since higher-eukaryotic genes are often interrupted by long introns, early spliceosomal complex assembly over exons recognizes both splice sites during a process commonly known as exon definition [37]. Nevertheless, computational analyses of vertebrate splice sites have shown that the consensus splicing signals only account for approximately half of the information required to accurately define exon/intron boundaries [34], suggesting that other regulatory elements such as RBP sites and secondary structures are crucial for splice site definition. Splice sites with sequences that are substantially different from the consensus signals are recognized suboptimally (weak splice sites) and are often associated with alternative splicing events. Recent deep learning models can predict splicing events to a large extent from the primary DNA sequence and can integrate the effects of mutations [44,45].
Even though the RNA structural code has been less explored [46], it is known that the effects of cis-regulatory elements can be modulated by the presence of RNA structures in nascent transcripts and in mature mRNAs [47]. Co-transcriptional transient RNA structure formation can impact splicing through RNAPII pausing and backtracking, which can have a direct kinetic effect on co-transcriptional splicing events [48]. One example of structure-dependent splicing is the human ATE1 gene, where splicing of two mutually exclusive exons is regulated by competing long-range hairpin structures that span up to 30 kb [49]. Mutations that disrupt each of the secondary structures shift the equilibrium between the two exons, indicating direct control of splicing outcome. Reduction of transcription rates can further favor the formation of RNA secondary structures [30] and the binding of splicing regulatory factors that increase splicing efficiency, thereby allowing the recognition of exons that are flanked by weak splice sites and would otherwise be skipped [5,50] (Fig. 2a and b).

The interplay between RBPs and secondary structures
During the mRNA lifecycle, RBPs regulate to a significant extent diverse transcriptional and post-transcriptional stages including splicing, transport, translation, stability and degradation. They bind to pre-mRNA molecules in the nucleus and regulate their maturation and transport to the cytoplasm, where they regulate translation and degradation. The number of proteins that can bind to RNA in humans is estimated to be more than 1,500, adding complexity to all the aforementioned programs [51].
RBPs can facilitate or inhibit the recognition of splice sites, thereby acting as splicing enhancers or splicing silencers [46,52,53]. High-throughput experiments that identify the sites where RBPs bind endogenous RNAs, such as cross-linking immunoprecipitation followed by high-throughput sequencing (CLIP-seq), have demonstrated that the majority of RBP motifs are not bound in vivo. One possible explanation is that RNA structures provide additional contextual features beyond the primary motif sequences (Fig. 2b and c), and it has also been shown that RNA secondary structure is predictive of binding [54,55]. Several studies have shown that during pre-mRNA synthesis the formation of RNA structures influences alternative splicing by diverse mechanisms [56,57], and that local RNA structure formation can impact splicing by modulating the accessibility of core splicing signals [58][59][60] as well as RBP binding sites [58,61,62].
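The idea that structure modulates the accessibility of RBP binding sites can be illustrated with a small calculation. The following is a minimal sketch, assuming that a structure is already available in dot-bracket notation (where "." marks an unpaired nucleotide) and that accessibility can be approximated by the fraction of unpaired positions in each motif occurrence. The function name, the toy sequence and the motif are hypothetical, and real analyses would rely on probing data or ensemble-based predictions rather than a single structure.

```python
def motif_accessibility(seq, structure, motif):
    """Return (start, unpaired_fraction) for every occurrence of motif.

    seq and structure must have the same length; structure is dot-bracket,
    with '.' for unpaired positions and '(' / ')' for paired positions.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    results = []
    start = seq.find(motif)
    while start != -1:
        window = structure[start:start + len(motif)]
        results.append((start, window.count(".") / len(motif)))
        start = seq.find(motif, start + 1)  # allow overlapping occurrences
    return results


if __name__ == "__main__":
    # Toy hairpin: 7-bp stem, 7-nt loop. The motif occurs once in the
    # stem (fully paired) and once in the loop (fully unpaired).
    seq = "CCGCAUGAGCAUGACAUGCGG"
    structure = "(((((((.......)))))))"
    print(motif_accessibility(seq, structure, "GCAUG"))
```

Under this simplification, two identical motif instances in the same transcript can differ completely in accessibility, which is one way the same primary sequence could be bound at one site and not at another.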
An example of how RNA secondary structures can dictate the binding of specific RBPs is provided by MBNL1 and U2AF65, whose binding influences inclusion of the fifth exon of TNNT2 [63,64]. MBNL1 favors hairpins and, when bound, prevents U2AF65, which favors a linear structure, from binding the polypyrimidine tract, resulting in exon skipping. Additional evidence from mice shows that MBNL1 also binds the hairpin structure of exon F in TNNT3. Another example is eIF3, which recognizes and binds to hairpin structures at the 5′ UTR to exert translational activation or repression [65]. Other studies have shown preferential binding of RBPs at RNA G-quadruplex sites, e.g. CNBP, which prevents RNA G-quadruplex structure formation and promotes translation [66], and FMRP, which preferentially binds RNA G-quadruplex structures [66,67]. Secondary structures and RNA binding proteins have been systematically investigated, enabling the identification of preferences of structured RNA for particular proteins [68,69]. Interestingly, a recent genetic study showed that G-quadruplex sequences at 5′ UTRs are selectively constrained and are enriched for RBP sites and for eQTLs, which are loci containing genetic variants that change the expression level of a gene [70].

Helicases as key regulators of secondary structures
Structure formation is to a large extent modulated by enzymes that can unwind these structures, such as eIF4A and DHX29, whose importance is demonstrated by their pivotal role in translation initiation [71,72]. Similarly, the continuous activity of DNA/RNA helicases and ribonucleases H (RNase H1 and H2) resolves R-loop structures [3]. Interestingly, R-loops and G-quadruplexes were both found to be unwound by the helicase DHX9 in humans [73]. DHX9 activity protects single-stranded DNA against damage and preserves genomic stability [74]. RNA G-quadruplexes are known to interact with several proteins [70,75,76]. For example, the RNA helicase RHAU (also known as DHX36) resolves mRNA G-quadruplexes [77,78]. One of its targets is a G-quadruplex at the 5′ UTR of Nkx2-5 mRNA, and it has been shown that DHX36-mediated G-quadruplex structure unfolding is required for the gene to be expressed [79]. Another DHX36 target is Gnai2 mRNA, a key regulator of stem cell function and muscle regeneration [78]. DHX36 and DHX9 were also found to modulate translational efficiency by resolving 5′ UTR RNA G-quadruplexes [80], while several RBPs such as hnRNP H/F and helicases such as DDX21, DDX17, DDX3X, DDX5 and DDX1 have been found to unwind RNA G-quadruplexes and are also involved in transcription, splicing and translation regulation [81][82][83][84]. Similarly, multiple helicases have been shown to resolve hairpin structures. For instance, UPF1 can resolve RNA hairpins [85], while DDX5 can resolve DNA and RNA G-quadruplexes as well as hairpin structures [86,87] (Table 1).
The cellular mechanisms mediating the stabilization and resolution of RNA secondary structures remain incompletely understood, as do the interactions between secondary structures and protein complexes. In addition, the effect of perturbing these mechanisms and their relevance to disease progression are unclear. High-throughput screens coupled with short hairpin RNAs (shRNAs) or CRISPR-based technologies have enabled systematic interrogation of the roles of diverse proteins, such as RBPs, helicases, and topoisomerases [88][89][90][91]. Furthermore, mutational analysis with CRISPR-Cas9 could be used to study the effects of secondary structure disruption in vivo or in cellulo. CRISPR-induced mutations that destroy the secondary structure motifs, for example the G-runs of G-quadruplexes or the stem sequence of hairpins, but leave other regulatory sequences such as RBP motifs unchanged, could advance the understanding of how secondary structures determine gene expression.

G-quadruplexes as regulators of alternative splicing
G-quadruplex sequences are enriched at promoters and they have been extensively studied in this context [131]. Additionally, G-quadruplexes have been related to splicing, 3′ processing, transcription termination, RNA localization and translation regulation [76]. Interestingly, it has been shown that G-quadruplex sequences have a high enrichment in the proximity of both 3′ and 5′ splice sites across a wide range of species. The effect is more pronounced at the non-template strand, suggesting that the G-quadruplexes are formed primarily by the RNA and that they may favor or block the binding of RBPs [132].

Table 1 [92][93][94]. Alternative gene names are listed in parentheses and gene paralogs with homologous functions are separated by "/".
Gene name | Target | Molecular function | Associated phenotype upon loss of function experiments
PIF1 | DNA G4 | Prevents genome instability associated with DNA G4s and R-loops [95,96]. |
 | | | Knock down of XPD results in accumulation of G4s [99].
 | DNA G4, D-loops, Holliday junctions | Unwinds a variety of DNA structures that emerge during DNA replication, recombination and repair [100]. | Loss of function mutations lead to Bloom syndrome [101]. Absence of BLM is associated with genome instability and excess of sister chromatid exchange events at G4 loci [102].
 | | | WRN loss of function leads to accumulation of G4s and expression changes associated with G4-containing promoters [105].
 | | | Absence of DHX9 promotes back-splicing events and induces translational repression of transcripts containing inverted-repeat Alu elements [110].
 | | | Formation of stress granules and increased protein kinase R (PKR) phosphorylation [113]. Reduced telomerase efficiency and shorter telomeres [115]. Higher UV sensitivity due to lack of p53 expression [116].
 | | | DDX21 knock down results in increased expression of genes with G4 motifs in their 3′ UTR [83].
DDX1 | RNA G4 | Converts RNA G4 into R-loops [81]. | DDX1 deficiency impairs class switch recombination in B cells [81].
 | | Paralogues that encode the two subunits of the eukaryotic translation initiation factor 4A (eIF4A). These helicases resolve RNA hairpins and G4s located at the 5′ UTR, which has an impact on mRNA translation efficiency. | DDX2A plays an essential role in spermatogenesis, whereas DDX2B is essential for mouse viability [123].
 | R-loops | Resolves R-loops that emerge during transcription [124]. | R-loop accumulation and genomic instability due to knock down of DDX41 [124].
 | | | Genome instability and deficiency in co-transcriptional gene silencing pathways mediated by small RNAs [129,130].

One of the first exemplary cases of RNA G-quadruplex-mediated regulation of alternative splicing was found in the hTERT gene, which encodes the catalytic subunit of the telomerase enzyme: one of its exon skipping events is promoted by the stabilization of intronic G-quadruplexes [133]. Gomez and colleagues hypothesized that RNA G-quadruplex formation can prevent RBP binding to intronic enhancers, leading to exon skipping. However, based on different functional assays, RNA G-quadruplex formation has also been proposed to promote RBP binding to splicing regulatory elements [134][135][136]. Since G-quadruplex-dependent splicing events were often demonstrated by introducing mutations at G-quadruplex motifs, it was unclear from these results whether the G-quadruplex structure or the linear form of these G-rich sequences acts as a splicing enhancer. To disentangle these effects, Huang and colleagues showed that mutations that prevent intronic G-quadruplex formation but keep G-tracts intact led to exclusion of an alternative exon in the CD44 gene [137]. Since the CD44 intronic G-quadruplex motif sequence can be bound by two RBPs that have opposite effects on exon exclusion, RNA G-quadruplex formation may function as a switch to promote the binding of one RBP over the other [138]. In another recent study, in which the role of wild-type and mutated G-quadruplex sequences in alternative splicing was tested using a minigene, it was also shown that the presence of an RNA G-quadruplex favors exon inclusion [132], consistent with the aforementioned findings. There is also evidence of an interplay between RNA G-quadruplex stabilization and specific binding proteins such as HNRNP H/F [116,137] and HNRPU [139], and recent studies suggest that RNA G-quadruplex formation can modulate in vitro RBP binding to mRNA molecules [66].
The genome-wide effect of RNA G-quadruplex formation over splicing factor binding remains unclear. High-throughput screening of chemical compounds via dual-color splicing reporters has identified two small molecules, emetine and cephaeline, that disrupt RNA G-quadruplex formation [140]. Genome-wide evaluation of emetine effects on alternative splicing showed substantial alternative splicing changes after treatment, with nearly 60% being exon skipping events. It was also shown that multiple RBPs colocalize with G-quadruplex motifs flanking splice junctions, suggesting an interplay between RBP binding and RNA G-quadruplex structure formation, which was further corroborated by loss of function experiments followed by RNA-seq, identifying consistent associations for 36 RBPs [132,137].

Hairpins enable long range RNA interactions during splicing
Long range interactions are important for splicing modulation [141], and they are more enriched at weak alternative acceptor splice sites [142]. Some long range interactions can span several kilobases and can bring in proximity otherwise distant splice sites. One of the best-characterized examples of regulation of splicing through RNA structures can be found in D. melanogaster for the DSCAM gene, where RNA-RNA interactions, mediated through multiple structures, regulate the selection of exons within arrays of mutually exclusive exons [143,144]. In this case, RNA looping can bring splicing elements situated thousands of bases away from each other into close proximity.
Hairpins may also directly affect exon skipping events by a mechanism known as "looping-out", whereby inter-intronic base-pairing RNA interactions can loop out exons to promote their skipping [56]. This mechanism is supported by the enrichment of conserved complementary sequences present in intronic regions flanking exon skipping events [145]. Moreover, the artificial introduction of self-complementary regions across exons suppresses exon inclusion in yeast, suggesting a causal relationship between hairpins and exon skipping [146]. Interestingly, the expansion of self-complementary regions is related to the primate-specific Alu retrotransposon, which is enriched in regions flanking alternative exons, suggesting a role in splicing regulation [147]. During back-splicing, an unconventional splicing mechanism, the second nucleophilic attack is performed on an upstream 3′ splice site, leading to circular RNA (circRNA) products. circRNAs are particularly abundant in the brain, and RNA structures that favor back-splicing are often derived from complementary intronic sequences associated with Alu elements [148]. In zebrafish, hairpin formation between dinucleotide repeats that co-occur at opposite boundaries of an intron mediates splicing without U2AF2, which is a major component of the spliceosome [149].
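The search for complementary sequences in the introns flanking an exon, as invoked for looping-out and back-splicing, can be sketched as a k-mer lookup. The example below is a simplified illustration under stated assumptions: it considers only perfect Watson-Crick complementarity (no G-U wobble pairs, mismatches or bulges), uses an arbitrary seed length of 8 nt, and expects uppercase RNA sequences; the function names and toy sequences are hypothetical.

```python
def reverse_complement(rna):
    """Reverse complement of an uppercase RNA string (A, C, G, U only)."""
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(rna))


def complementary_kmers(upstream_intron, downstream_intron, k=8):
    """Find seeds that could base-pair across an exon.

    Returns (i, j, kmer) where kmer = upstream_intron[i:i+k] and its
    reverse complement occurs at downstream_intron[j:j+k].
    """
    # Index every k-mer of the downstream intron by position.
    index = {}
    for j in range(len(downstream_intron) - k + 1):
        index.setdefault(downstream_intron[j:j + k], []).append(j)
    hits = []
    for i in range(len(upstream_intron) - k + 1):
        kmer = upstream_intron[i:i + k]
        for j in index.get(reverse_complement(kmer), []):
            hits.append((i, j, kmer))
    return hits


if __name__ == "__main__":
    upstream = "AAGGCUACGAUU"
    downstream = "CCUCGUAGCCAA"
    print(complementary_kmers(upstream, downstream))
```

Seeds found this way are only candidates; whether they form a stable long-range duplex depends on the competing local structures and would need thermodynamic modelling or experimental support.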
The formation of RNA structures can also enhance RBP regulatory range by bringing distal regulatory elements in close proximity with their exon targets [150]. This can be particularly important for RBFOX2-regulated exons since more than half of RBFOX2-binding sites are found over 500 bp away from any annotated exons, and it has been shown that long-range RNA hairpin formation is necessary for the regulatory effect of distal binding sites [151]. It has also been shown that hairpin formation can influence splicing regulatory protein binding, with enhancers and silencers having a stronger effect when present in the loop relative to the stem [52,54], suggesting that RBP binding is inhibited at the stem [58,61]. In an elegant set of experiments, it was shown that in the case of FGFR2, the formation of a hairpin structure is required for efficient splicing from two mutually exclusive exons and its splicing effect is not dependent on its primary nucleotide composition as shown using minigene assays [152].
The fibronectin EDA exon is controlled by seven hairpins and a key exonic splicing enhancer is found in the loop of one of the hairpins, which is in turn bound by splicing regulatory proteins such as SRSF1 [153,154]. Other examples include a hairpin that modulates the inclusion of the alternative exon 6B of the β-tropomyosin transcript in chicken [155]. It was also shown that a mutation in PS2 that deletes or destabilizes a hairpin in exon 5 results in higher levels of exon inclusion [156]. Importantly, the formation of hairpin structures can be dynamic and respond to environmental changes, an example being temperature-dependent formation of a hairpin that controls splicing of the APE2 gene in yeast [157]. In addition, alternatively spliced exons display an enrichment for secondary structures, and the evolutionary conservation of many of these structures indicates important regulatory functions [57]. This is exemplified by conservation of secondary structures over the primary nucleotide sequence, such as a conserved hairpin structure in RB1CC1 [57]. Advances in long-read RNA sequencing technologies will enable improved detection of long-range interactions and their impact on the regulation of alternative splicing events.

The role of RNA structures on RNA stability and decay
The half-life and decay rates of mRNA transcripts in human cells influence protein expression levels. A number of features determine transcript stability including GC content, transcript length, polyA tail length, RBP sites, microRNA binding sites, and mRNA secondary structures [158][159][160][161][162][163]. Structural features of mRNAs dictate to a large extent mRNA half-life with transcripts that have a structured coding sequence showing higher expression levels [159]. Hairpins in mRNA transcripts can result in increased stability [163][164][165], such as when found at the 3′ UTR near mRNA cleavage sites. The accessibility of microRNA sites influences mRNA half-life and secondary structure formation can change the microRNA binding efficiency [166]. For example, the introduction of a hairpin in the 5′ UTR of a transcript results in substantial increases in gene expression [167,168]. Constitutive decay elements are RNA motifs that mediate the destabilization and degradation of mRNA molecules, and contain a hairpin sequence [169] at which Roquin proteins bind to induce the decay of the transcript [170].
Fig. 3. Mechanisms by which RNA structure formation influences translation. A. During cap-dependent translation, translation initiation factors (blue proteins) recognize the mRNA 5′ cap structure (purple circle) and bridge its interaction with the 3′ polyA tail, through polyA binding proteins (PABPs). During translation several helicases actively unwind the mRNA, which could remove secondary structures. This could lead to faster ribosome speeds, which may result in protein misfolding. B. Cap-dependent translation can be regulated by the dynamic formation of secondary structures in the 5′ UTR. Hairpin formation can limit the binding of the ribosome and translation initiation factors, thereby repressing protein translation. The presence of G-quadruplexes in the 5′ UTR may inhibit translation directly, activate upstream ORFs, or promote translation. C. Cap-independent translation can take place in the presence of IRESs, which require highly structured 5′ UTR domains that indirectly interact with PABPs to promote mRNA circularisation. Some IRES structures can be activated by RNA G-quadruplex formation. Further formation of RNA secondary structures across the ORF can limit the translation speed and favor a step-by-step modular folding. Additional details on cap-dependent and cap-independent mechanisms are comprehensively reviewed in [234]. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Massively parallel reporter assays are high-throughput technologies that enable rapid measurements of thousands of sequences for their regulatory activity and have received widespread adoption in recent years [171][172][173][174]. Multiple variants of this technology have been implemented to study a plethora of gene regulatory elements, including promoters, enhancers, 5′ UTRs, and 3′ UTRs, by placing synthetic sequences in the appropriate location relative to a reporter gene. For hairpins, massively parallel reporter assay experiments have shown that their destabilizing effects increase as a function of the hairpin length [165].

Secondary structures in translation
Translation can be divided into four phases: initiation, elongation, termination and ribosome recycling [175,176]. Initiation is the rate-limiting and most regulated step, consisting of several complex programs. The regulation of translation directly impacts protein levels, with most regulatory mechanisms affecting the rate-limiting initiation step [177][178][179]. The multifarious effects of translational control can be observed across biological processes including development, differentiation, functions of the nervous system and disease [177,180]. Initiation can be either cap-dependent or cap-independent [181,182]. Cap-dependent translation is the most frequently used in eukaryotes and starts with the binding of eIF4E to the mRNA cap. The most common cap-independent initiation mechanism, often utilized by viral RNAs, involves an internal ribosome entry site (IRES) of structured mRNA. IRES structures can recruit ribosomal subunits and eukaryotic initiation factors [183]. RNA molecules fold in complex configurations, with the presence of RNA secondary structures in the 5′ UTR being a major determinant of the rate of translation (Fig. 3a and b) [184][185][186]. Moreover, the ribosome itself is a major remodeler of RNA structure [187]. Lower translation rates can not only limit protein abundance, but can also enable correct co-translational protein folding [188,189]. In addition, secondary structures can influence the recognition of IRESs (Fig. 3c).
Although the vast majority of eukaryotic translation start sites have an AUG codon, often the first AUG codon is bypassed, resulting in usage of more distal AUG codons and alternative protein isoforms. This process is referred to as leaky scanning, with a proportion of ribosomes initiating translation from downstream start codons. Leaky scanning and translational efficiency are influenced by the presence of secondary structures [8][190][191][192]. Moreover, there is a large proportion of suboptimal start sites that do not contain the canonical start codon. Microsatellite expansions can cause non-AUG initiation [193]. These non-AUG start sites are often associated with alternative translation start [194,195]. Ribosome profiling is one of the primary methods of identifying the occupancy of elongating ribosomes on mRNAs, therefore providing a direct readout of ribosome decoding rates [176].
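To make the notion of alternative start sites concrete, the sketch below enumerates every AUG in a transcript together with its first in-frame stop codon. It is a deliberately simplified illustration: it ignores sequence context around the start codon, non-AUG initiation and the influence of secondary structure, all of which the text identifies as relevant, and the function name and toy transcript are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def candidate_orfs(mrna):
    """Return (start, end) for each AUG and its first in-frame stop codon.

    start is the index of the A in AUG; end is the index just past the
    stop codon. AUGs without an in-frame stop are not reported.
    """
    orfs = []
    for start in range(len(mrna) - 2):
        if mrna[start:start + 3] != "AUG":
            continue
        # Walk codon by codon from the start site until a stop is found.
        for pos in range(start + 3, len(mrna) - 2, 3):
            if mrna[pos:pos + 3] in STOP_CODONS:
                orfs.append((start, pos + 3))
                break
    return orfs


if __name__ == "__main__":
    # Two in-frame AUGs share one stop codon, so initiation at the
    # downstream AUG would yield an N-terminally truncated isoform.
    print(candidate_orfs("GGAUGCCCAUGGCUUAAGG"))
```

In this toy transcript both start codons are in the same reading frame; out-of-frame downstream AUGs would instead give rise to unrelated open reading frames.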
Secondary structures can conceal or expose binding sites for translation regulators, and it has been shown that certain RBPs bind preferentially at structured RNA while others have a preference for linear forms [196]. Moreover, formation of secondary structures can change the distance between translation-associated motifs, an example being the distance between the stem-loop and the cap [197]. Secondary structure formation can also promote cap-independent translation, and the disruption of an IRES hairpin can in turn reduce translation efficiency in viral [198,199] and eukaryotic [200] mRNAs.
Riboswitches are components of mRNA molecules that can bind a small molecule and directly control gene expression through RNA conformational changes, without proteins being involved. They are found in both prokaryotes and eukaryotes, with most discovered riboswitches being present in bacteria and archaea [201]. A riboswitch comprises two domains, an aptamer and an expression platform. The aptamer is a receptor for a small molecule, and it is usually located in the 5′ UTR of an mRNA, where it forms a secondary structure that binds to the small molecule. The expression platform is the regulatory domain of the riboswitch, and it modulates gene expression upon binding of the small molecule. Riboswitches have been found to regulate a number of processes including initiation of translation [202], mRNA decay [203], transcription termination [204] and splicing [205,206]. For instance, when lysine is present, the E. coli lysine riboswitch restricts translation initiation and also exposes RNase E cleavage sites [203].
RNA structures can directly interact with the translational machinery and influence the recognition of the translation start site [207]. Note that the interaction is complicated by the fact that the translational machinery can unwind and remodel RNA structures [187]. There is also decreased translational efficiency at highly structured 5′ UTRs [80,208]. For example, in the case of BRCA1, a tumor suppressor gene, a longer 5′ UTR isoform is expressed only in breast cancer cells, resulting in a 10-fold decrease in translational efficiency due to the formation of a stable complex secondary structure [208]. Finally, the interplay between RNA structure formation and unwinding influences ribosome initiation, scanning and elongation. Therefore, secondary structures can account for differences between mRNA and protein levels [209].

Hairpins enable long range RNA interactions in translation initiation
Early studies indicated that hairpin formation can influence translation efficiency [210]. Hairpins with high thermal stability upstream of the translation start site reduced translation by 85-95%, whereas hairpin formation at specific positions downstream of an AUG increased the translation rate by facilitating recognition of initiator codons by ribosomes [211,212]. Greater stem length and GC content, both of which increase thermal stability, inhibit translation, while more distant hairpins have a smaller inhibitory effect [213]. Other studies have also indicated that both the GC content of the stem and the positioning of the hairpin relative to the translation start site dramatically influence translation efficiency [207].
Hairpins at the 5′ UTR of ferritin-H and ferritin-L mRNAs act as an iron-responsive element controlling iron levels and are highly dynamic response elements to environmental changes [214]. Another example is a hairpin structure in the c-JUN 5′ UTR which is recognized by eIF3 and is required for initiation of translation [215]. Another study generated a library of half a million 50 bp long 5′ UTRs and found that hairpin structures negatively impact protein levels, especially those with longer stems and shorter loops [216].
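The stem length, loop length and stem GC content discussed above can be tabulated for a given 5′ UTR with a very simple scan. The following Python function is a minimal sketch under stated assumptions: it only reports hairpins with perfectly paired stems (Watson-Crick and G-U wobble pairs, no bulges or mismatches), it does not compute folding energies, and the function name `find_hairpins` and its thresholds are hypothetical choices made for illustration rather than values taken from the cited studies. A thermodynamic folding tool would be needed for realistic predictions.

```python
# Base pairs allowed in an RNA stem: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_hairpins(rna, min_stem=4, min_loop=3, max_loop=8):
    """Naive scan for perfect-stem hairpin candidates.

    Returns a list of dicts with 0-based start, exclusive end,
    stem length, loop length and GC fraction of the stem.
    Thresholds are illustrative assumptions, not published values.
    """
    rna = rna.upper().replace("T", "U")
    n = len(rna)
    hits = []
    for i in range(n):  # i: last base of the 5' arm
        for loop in range(min_loop, max_loop + 1):
            j = i + loop + 1  # j: first base of the 3' arm
            if j >= n:
                break
            k = 0  # number of stacked base pairs found so far
            while i - k >= 0 and j + k < n and (rna[i - k], rna[j + k]) in PAIRS:
                k += 1
            if k >= min_stem:
                stem = rna[i - k + 1:i + 1] + rna[j:j + k]
                hits.append({
                    "start": i - k + 1,
                    "end": j + k,
                    "stem_len": k,
                    "loop_len": loop,
                    "stem_gc": sum(b in "GC" for b in stem) / len(stem),
                })
    return hits


if __name__ == "__main__":
    # Toy sequence: a 6 bp GC-only stem closed by a 4 nt loop.
    candidates = find_hairpins("GGGCGCAAAAGCGCCC")
    print(max(candidates, key=lambda h: h["stem_len"]))
```

Because alternative pairing registers of the same sequence are reported separately, the scan can return several overlapping candidates; selecting the one with the longest stem is one simple, assumed way to summarize them.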

G-quadruplexes in translation initiation
RNA G-quadruplexes are enriched at 5′ UTRs (Huppert et al. 2005) where they show a higher frequency at the template strand, suggesting a relative depletion of G-quadruplexes at the RNA level [217]. There is also a difference in the density of G-quadruplexes, with the highest density being found within 50 bp of the start of the 5′ UTR and a declining frequency moving away from it [217]. It has been shown that G-quadruplexes in the 5′ UTR of mRNAs are inhibitory elements [218], and several studies have since shown that G-quadruplexes at the 5′ UTR interfere with the recognition by ribosomes [17,219–223]. Specifically, experiments involving luciferase plasmid vectors indicate that G-quadruplexes inhibit expression across 5′ UTR regions, perhaps by interfering with ribosome scanning. However, in many of these experiments the researchers used controls where guanines had been substituted for uracils, potentially also interfering with RBP binding sites and the GC content [218,219].
It has also been shown that G-quadruplexes at 5′ UTRs of eukaryotic genes can promote translation by favoring recognition of the IRES [224–227]. In FGF-2, a gene that is associated with tissue development and repair, a G-quadruplex motif together with two hairpin sequences are found within the IRES, and they promote translation in a cap-independent translational program [225]. A G-quadruplex site in the mRNA encoding the RBP FMRP is a binding site for the protein itself, and it has been suggested that FMRP could in this way control both its own expression levels [228] and its mRNA splicing [134]. In VEGF, an RNA G-quadruplex was shown to be essential for IRES-mediated translation initiation [227,229,230]; however, other studies have contested its role and provided evidence for inhibitory functions [231,232].
A study that used massively parallel reporter assays to investigate mRNA translation found that G-quadruplexes in the 5′ UTR act as translational inhibitors, and that knockdown of G-quadruplex resolving helicases aggravated these phenotypes [233]. It was also found that RNA G-quadruplex formation could promote the usage of an upstream translation start site by slowing down the preinitiation complex scanning [80]. The role of secondary structures was systematically explored in a high-throughput experiment where half a million 50 bp randomly generated 5′ UTRs were synthesized and tested in yeast. The results showed that several secondary structures, including RNA G-quadruplexes and hairpins, are important contributors to expression levels [216]. RNA G-quadruplexes can either restrict or promote the recognition by ribosomes, and even though there are more studies indicating inhibitory functions, it is not clear which effect is more widespread and what features determine whether a G-quadruplex will restrict or promote ribosomal recognition.
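Putative G-quadruplex sites in a 5′ UTR are often located with a sequence motif search before any experimental validation. The Python sketch below illustrates this under explicit assumptions: it uses the commonly applied consensus of four G-tracts of at least three guanines separated by loops of 1 to 7 nucleotides, which is not a definition given in this Review, and the function name `find_g4_motifs` is hypothetical. A motif match only indicates the potential to fold; it says nothing about whether the structure forms in cells.

```python
import re

# Assumed consensus: four runs of >=3 guanines separated by 1-7 nt loops.
G4_MOTIF = re.compile(r"G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}")


def find_g4_motifs(utr):
    """Return (start, end, sequence) for non-overlapping motif matches.

    Coordinates are 0-based with an exclusive end, measured from the
    5' end of the supplied sequence. DNA input is converted to RNA.
    """
    seq = utr.upper().replace("T", "U")
    return [(m.start(), m.end(), m.group()) for m in G4_MOTIF.finditer(seq)]


if __name__ == "__main__":
    wild_type = "AUCGGGAGGGCGGGUGGGACUAUG"
    # Control in which the central guanine of each tract is replaced by
    # uracil, mirroring the G-to-U substitution controls described above.
    g_to_u_control = "AUCGUGAGUGCGUGUGUGACUAUG"
    print(find_g4_motifs(wild_type))
    print(find_g4_motifs(g_to_u_control))
```

The start coordinate of each match gives the distance from the 5′ end of the UTR, which could be used to examine positional trends such as the higher motif density near the start of the 5′ UTR.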

Splicing and translation associated secondary structures in disease
Regions that are predisposed to secondary structure formation, such as G-quadruplexes, have an excess of germline and somatic mutations [235,236]. The functional role of these structures is supported by the observation that eQTLs are enriched at G-quadruplexes within 5′ UTRs and splicing quantitative trait loci (sQTLs) are enriched at G-quadruplex motifs flanking splice sites [70,132]. The accumulation of R-loops is also associated with genomic instability [237–240]. As secondary structure formation modulates diverse processes including splicing and translation initiation, changes in the mRNA structure have been associated with and can result in human disease.
Mutations of alternative splicing factors can lead to R-loop accumulation, which may compromise genomic stability and be relevant in the context of cancer pathogenesis [241,242]. RNA splicing perturbation by expression of U2AF1 or SRSF2 mutants, which carry mutations that are commonly observed in myelodysplastic syndrome, results in the accumulation of R-loops [243]. In MAPT, the gene encoding tau, a hairpin structure at the boundary between exon 10 and intron 10 can mask the splice site [244,245], and DDX5 was found to be involved in resolving this hairpin, thereby controlling splicing of MAPT exon 10 [86]. Mutations in the hairpin destabilize it, causing inclusion of exon 10 through increased association with U1 snRNP [244], and are associated with a higher prevalence of neurodegeneration. Hairpin sequences were also identified in the 5′ UTR of other transcripts including the amyloid precursor protein [246] and α-synuclein [247], indicating the importance of structure-mediated control of expression levels. In spinal muscular atrophy, a stem-loop RNA structure overlaps with the 5′ splice site of exon 7 of SMN2, and interference with the formation of this structure is a therapeutic strategy against the spinal muscular atrophy molecular phenotype [248]. Sulovari et al. showed that variable number tandem repeats were particularly enriched at Alu elements and found that they were associated with genes differentially spliced or expressed between human and chimpanzee brains [249].
RNA G-quadruplex structures have been identified in several cancer genes, including TP53 and TERT, where they can modulate splicing and protein isoforms [133,135]. In CD44, an RNA G-quadruplex in intron 8 functions as a splicing enhancer with roles in the control of the epithelial-mesenchymal transition [137], a process that is important for cancer metastasis [250]. One of the canonical translation initiation factors, eIF4A, is a DEAD-box RNA helicase that can unwind secondary structures, including RNA G-quadruplexes, and its activity is correlated with the number of secondary structures in the 5′ UTR [251]. Perturbation of eIF4A can contribute to oncogenesis as it results in formation of RNA G-quadruplexes in the 5′ UTRs of mRNAs targeted by eIF4A, including many oncogenes, transcription factors, and epigenetic regulators [252].
The expansion of microsatellite repeats at 5′ UTRs has been associated with aberrant translation and has been implicated in multiple disorders [193,253]. The mechanisms involve the formation of secondary structures that interfere with translation and repeat-associated non-AUG translation. One of the most well-studied examples is the expansion of the hexanucleotide GGGGCC in the first intron of the C9orf72 gene, which results in frontotemporal dementia (FTD) and amyotrophic lateral sclerosis (ALS). These repeats form different secondary structures including G-quadruplexes, R-loops and hairpins [254–256], which lead to aborted transcription at the repeat site [254]. Expansion of these repeats results in repeat-associated non-AUG translation and the generation of toxic dipeptide repeat proteins [257], while reducing DHX36 levels in cells derived from C9orf72-linked ALS patients results in reduced dipeptide repeat protein burden due to the formation of RNA G-quadruplexes [258]. In ALS and FTD, nucleolin binds to the G-quadruplex forming hexanucleotide repeat, resulting in its mislocalization in the cell [254]. In addition, a number of other proteins associated with the ALS pathology such as TDP-43, FUS/TLS, hnRNPA1, hnRNPA2B1, hnRNPA3 and EWSR1 interact with the RNA G-quadruplex [259–264]. Encouragingly, G-quadruplex binding small molecules ameliorate the pathologies associated with ALS and FTD in model systems, indicating that RNA G-quadruplexes can serve as a therapeutic target [265]. Beta-site amyloid precursor protein cleaving enzyme 1 (BACE1) encodes a protein that cleaves amyloid precursor protein, resulting in the generation of amyloid-beta peptide, the accumulation of which is a hallmark of Alzheimer's disease [266]. An RNA G-quadruplex in exon 3 of BACE1 modulates splicing by inhibiting the binding of hnRNP H, thereby promoting a shorter isoform without the proteolytic activity that creates the neurotoxic peptide [267]. ADAM-10 is also associated with Alzheimer's disease due to its anti-amyloidogenic activity, and an RNA G-quadruplex in its 5′ UTR represses its expression [268].
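Repeat expansions such as the C9orf72 hexanucleotide (GGGGCC) are characterized by the number of uninterrupted tandem copies of the repeat unit. As a minimal sketch, the Python function below reports the longest uninterrupted tract of a given unit in a sequence. The function name `longest_repeat_tract` is hypothetical, only the exact register and orientation supplied in `unit` is searched, and no disease-associated thresholds are implied.

```python
import re


def longest_repeat_tract(seq, unit="GGGGCC"):
    """Return the largest number of uninterrupted tandem copies of `unit`.

    Only the register and strand given by `unit` are considered, which
    is a simplifying assumption; interrupted repeats are not merged.
    """
    seq = seq.upper()
    unit = unit.upper()
    pattern = re.compile("(?:" + re.escape(unit) + ")+")
    best = 0
    for match in pattern.finditer(seq):
        best = max(best, len(match.group()) // len(unit))
    return best


if __name__ == "__main__":
    # Toy sequence with a 5-copy tract followed by a single isolated copy.
    toy = "AT" + "GGGGCC" * 5 + "TTGGGGCCA"
    print(longest_repeat_tract(toy))
```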

Concluding remarks
RNA secondary structures are pervasive, interact with RNA binding proteins and are linked to a large number of important functions, including transcription, splicing and translation. Even though the functional importance of secondary structures has been repeatedly demonstrated, the contribution of RNA structures to these processes remains incompletely understood due to the difficulties in identifying dynamic RNA structures and their mechanisms of action. High-throughput technologies enable the systematic investigation of RNA secondary structures, and experiments designed to quantify their contribution to transcription, splicing and translation allow their mechanisms of action to be tested directly. New methods to dynamically identify RNA secondary structures are gradually revealing their widespread and diverse contributions to gene regulation. However, it remains difficult to capture their dynamic changes across cellular conditions and their interplay with proteins. The degree to which RNA secondary structure formation is influenced by the tissue and cell type remains largely unstudied. The availability of large-scale single cell assays will enable the investigation of associations between secondary structures, the presence of various sequence motifs, and expression levels of RBPs across different cell types. Even more interesting could be the combination of single cell technologies with different small molecules that stabilize specific structures.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.