The State of Long Non-Coding RNA Biology

Transcriptomic studies have demonstrated that the vast majority of the genomes of mammals and other complex organisms is expressed in highly dynamic and cell-specific patterns to produce large numbers of intergenic, antisense and intronic long non-protein-coding RNAs (lncRNAs). Despite well-characterized examples, their scaling with developmental complexity, and many demonstrations of their association with cellular processes, development and diseases, lncRNAs have yet to be widely accepted as major players in gene regulation. This may reflect an underappreciation of the extent and precision of the epigenetic control of differentiation and development, in which lncRNAs appear to have a central role, likely as organizational and guide molecules: most lncRNAs are nuclear-localized and chromatin-associated, and some are involved in the formation of specialized subcellular domains. I suggest that a reassessment of the conceptual framework of genetic information and gene expression in the 4-dimensional ontogeny of spatially organized multicellular organisms is required. Alongside this reassessment and further studies of their biology, the key challenges now are to determine the structure–function relationships of lncRNAs, which may be aided by emerging evidence of their modular structure; to establish the role of RNA editing and modification in enabling epigenetic plasticity; and to elucidate the role of RNA signaling in the transgenerational inheritance of experience.


A Surprising New Entrant into the Repertoire of Expressed Genes
A role for RNAs as regulatory molecules, rather than just as templates (mRNAs) and components of the machinery (ribosomal RNAs, transfer RNAs, spliceosomal RNAs) for protein production, was established in the 1990s with the surprising discovery of RNA interference (RNAi) and related pathways, which use small RNA guides to regulate mRNA stability and translation and to control transposons [1–7].
It now appears that the vast majority of the genomes of all organisms, irrespective of the proportion that is protein-coding, is transcribed, mainly to produce non-coding RNAs [28].

Evidence of Long Non-Coding RNA Functionality
There are many different rate classes of sequence evolution in mammals, indicating that at least 45% of the alignable regions of mammalian genomes are not evolving neutrally [53], and at least 18% of the mammalian genome is conserved at the level of predicted RNA structure [54].
The case for lncRNA functionality is also supported by their dynamic expression patterns in differentiating cells and their highly specific spatial (including subcellular) localization [57,60–62], especially in the brain [63,64], which also explains their low apparent abundance in RNA-seq analyses of whole tissues [26]. Indeed, high-resolution analyses using RNA capture technologies have revealed an extraordinary diversity of lncRNAs, most of which are likely to be cell-specific and have yet to be catalogued or characterized [27,65]. Perhaps the most intriguing are the 3'UTR-derived lncRNAs, which are expressed separately from, and appear to convey differentiation signals independently of, their normally associated mRNAs [66,67].
Knockdown of lncRNAs by small interfering RNA (siRNA)-related methods frequently results in observable changes in cellular behavior or characteristics in culture [55]. On the other hand, chromosomal deletion of lncRNA sequences often does not produce overt phenotypic consequences. For example, only 5 of 18 lncRNA mouse knockouts resulted in lethality or growth defects [99,100]. However, most phenotypic screens do not examine behavioral or cognitive effects. Deletion of BC1, a non-coding RNA widely expressed in the brain, for instance, had no apparent developmental consequences [101], but later tests showed that the mutant mice, although having normal brain morphology and no obvious neurological deficits, exhibited reduced exploratory behavior and increased anxiety [102].
In this context, it is worth noting that deletion of a subset of the most highly conserved sequences in the mammalian genome, the ultraconserved elements (UCEs) [103,104], which on evolutionary grounds are almost certainly functional, also produced no obvious abnormalities [105], although a later study revealed subtle neurological alterations [106].
While skeptics remain, the most likely interpretation is that the documented functional examples are emblematic of an army of regulatory RNAs that guide epigenetic trajectories and specify cell state during a very complex and precise developmental ontogeny (from a single fertilized cell to a mobile, cognizant adult), and that most of the human genome is devoted to this purpose [37–39,107–112]. Indeed, given the preponderance of lncRNA expression in the brain [63], the proportion of the mammalian genome devoted to cognitive function, rather than to body plan development, may be considerably underestimated. Not surprisingly, then, many lncRNAs are primate-specific [57,113,114].
The growing body of evidence is now leading to general acceptance of the relevance of (many or most) lncRNAs to cell and developmental biology [92], and increasingly to neurobiology [63,64,96,115–117], with the debate, such as it remains, shifting to the proportion of lncRNAs that may be biologically relevant. For me, the best indicator of function, although by no means proof, is their precise expression patterns [26], on the basis of which one can project that most are likely to be functional.
If so, the current protein-centric framework for understanding the genetic programming of differentiation and development is incomplete, a legacy of the mechanical worldview that held sway at the birth of molecular biology. Reconsideration of this framework to incorporate not only proteins but also structural and regulatory RNAs [109,111,118,119] is overdue.

Long Non-Coding RNA Structure-Function Relationships
The most pressing challenges now are to determine the structure-function relationships in lncRNAs and to parse their functional repertoire. This should resolve lingering questions and place lncRNAs into an integrated conceptual framework, together with small regulatory RNAs, transcription factors and signaling pathways, among others, for understanding the decisional hierarchies that control the 4-dimensional ontogeny of complex multicellular organisms [119].
There are both logical arguments and experimental evidence suggesting that lncRNAs have a modular architecture, given their likely role as scaffolds and epigenetic guide molecules [26,82,120]. This view is strengthened by a recent high-depth sequencing study that found, unexpectedly and in contrast to the limited information previously available [57], that the internal exons of lncRNAs are almost universally alternatively spliced [27], which clearly implies modularity.
If this is correct, establishing the exon as the primary unit of lncRNA structure-function, combined with the observed conservation of lncRNA structure [54] and the presence of structural orthologs across the genome [121], should provide a framework for determining which structural RNA modules associate with which effector proteins [121]. It is envisaged that such studies will lead to expanded structure-function databases [122–124] in which specific protein-binding domains (e.g., for polycomb proteins) in regulatory RNAs can be identified genome- and transcriptome-wide, thereby revealing the roles and effector pathways of different lncRNAs and their alternatively spliced isoforms. It may be much harder, as the example of snoRNAs illustrates, to determine the RNA and DNA targets of lncRNAs, and which modules confer this targeting function.
This framework should also allow the different types and roles of lncRNAs to be parsed, including in establishing chromatin territories, mediating enhancer looping, guiding the epigenetic modifier proteins that impose DNA and histone modifications, and forming subcellular domains, among others.
In addition, while most lncRNAs are nuclear and associated with chromatin, some are localized to the cytoplasm [57,63], with functions yet to be discovered. There is increasing evidence that RNAs are involved in the nucleation of liquid crystal domains in conjunction with disordered RNA-binding proteins [125], potentially an entirely new dimension of cell biology beyond that of the well-characterized membrane-bound organelles. Dissecting the components of the structures in which lncRNAs are localized will require high-resolution imaging, together with high-resolution RNA sequencing and oligonucleotide- or antibody-based capture.

From Hard- to Soft-Wiring
A new and rapidly emerging frontier is the role of RNA editing [126,127] and RNA modification [128,129] in modulating RNA signaling pathways in response to developmental cues and environmental signals, which may lie at the heart of the epigenetic plasticity seen in physiological adaptation, complex diseases such as cancer and diabetes, and brain function [130].
There is still much to do to understand the roles of small RNAs, especially the tiRNAs and spliRNAs derived from transcription start sites and exon boundaries [131,132], and fragments of tRNAs [133–135] and snoRNAs [136], some of which may function as miRNAs [137–139], as well as to decipher their evolutionary links and the regulatory networks in which they participate.
Finally, and most intriguingly, there is the role of RNA in intercellular and transgenerational inheritance (soft-wired inheritance), for which there is not only evolutionary logic [140–142] but also increasing evidence [143–149]. The emerging picture is not (simply) of RNA as a transient intermediate between 'gene' and protein, but rather of RNA as the central computational engine of cell biology, differentiation and development, brain function and perhaps even evolution itself. Many textbooks may have to be rewritten once the full dimensions of regulatory RNA biology are revealed.
Funding: This research received no external funding.