Long noncoding RNAs and their link to cancer

The central dogma of molecular biology, developed from the study of simple organisms such as Escherichia coli, has until recently held that RNA functions mainly as an information intermediate: a DNA sequence (gene), localised in the cell nucleus, serves as a template for the transcription of messenger RNAs, which in turn translocate into the cytoplasm and act as blueprints for the translation of their encoded proteins. There are, however, a number of classes of non-protein-coding RNAs (ncRNAs) which are essential for gene expression to function. The specific number of ncRNAs within the human genome is unknown. ncRNAs are classified on the basis of their size. Transcripts shorter than 200 nucleotides (nt), referred to as small ncRNAs, include miRNAs, siRNAs and piRNAs, and have been extensively studied. Transcripts ranging in length from 200 nt up to 100 kilobases, referred to as long noncoding RNAs (lncRNAs), make up the second group and have recently been receiving growing attention. LncRNAs play important roles in a variety of biological processes and physiological functions, including epigenetic, transcriptional and post-transcriptional control of gene expression, and affect various aspects of cellular homeostasis, including proliferation, survival, migration and genomic stability. LncRNAs are also capable of tuning gene expression and affecting cellular signalling cascades, and play crucial roles in promoter-specific gene regulation and X-chromosome inactivation. Furthermore, it has been reported that lncRNAs interact with DNA, RNA and/or protein molecules, and regulate chromatin organisation as well as transcriptional and post-transcriptional processes. Consistent with these roles, they are found to be differentially expressed in tumours, and they are directly linked to the transformation of healthy cells into tumour cells.
As a result of their key functions in a wide range of biological processes, lncRNAs are becoming rising stars in biology and medicine: they have potential active roles in various oncologic diseases and represent a gold mine of potential new biomarkers and drug targets.


Introduction to lncRNAs
The central dogma of molecular biology, developed from the study of simple organisms such as Escherichia coli, has until recently held that RNA functions mainly as an information intermediate: a DNA sequence (gene), localised in the cell nucleus, serves as a template for the transcription of messenger RNAs, which in turn translocate into the cytoplasm and act as blueprints for the translation of their encoded proteins [1] (Fig. 1).
Gene expression is required for all aspects of life, and its regulation defines the development and homeostasis of all cells, tissues, and organisms. There are a number of classes of non-protein-coding RNAs (ncRNAs) which are essential for gene expression to function. These include small nuclear RNAs (snRNAs), mainly involved in mRNA splicing, and transfer RNAs (tRNAs), which specifically recognise three-nucleotide sequences of mRNAs, decode the mRNA sequence into a peptide or protein, and deliver amino acids to ribosomes in the correct order. The most abundant cellular RNA molecules are the ribosomal RNAs (rRNAs), which form the framework of ribosomes. snRNAs, tRNAs and rRNAs are referred to as housekeeping RNAs; they are constitutively expressed and are essential for normal cellular function [1].
Next-generation sequencing methods and progress in transcriptome analysis have led to the discovery that up to 70% of the human genome is transcribed into RNA; however, only up to 2% of this serves as blueprints for proteins [2][3][4][5][6]. Moreover, the number of protein-coding genes has remained quite steady during evolution in metazoans (the G-value paradox), whereas the size of genomes tends to increase. The advent of tiling-resolution genomic microarrays and of whole-genome and transcriptome sequencing technologies showed that the human transcriptome is more complex than a collection of protein-coding genes and their splice variants, with extensive antisense, overlapping and non-coding RNA (ncRNA) expression [3,7]. In fact, repetitive elements (such as transposons) make up 46% of the human genome, its largest part, and have probably been driving forces of evolution [7,8]. It is also worth mentioning that in most cases transposons do not code for proteins, and that they have recently been found to be related to cancer processes [9,10].
https://doi.org/10.1016/j.ncrna.2020.04.003 Received 19 February 2020; Received in revised form 22 April 2020; Accepted 22 April 2020
The term ncRNA is commonly used to refer to RNA which does not encode a protein. However, this does not imply that such RNA molecules contain no information or serve no function. Traditionally, it has been assumed that most genetic information is transacted by proteins. Recent evidence, obtained through the development of new techniques, has revolutionised the molecular world and has shown that the majority of the genomes of mammals and other complex organisms is in fact transcribed into ncRNAs. These appear to comprise a hidden layer of internal signals controlling various levels of gene expression in physiology and development, including chromatin structure, epigenetic memory, transcription, RNA splicing, editing, translation and turnover [11].

lncRNAs
The specific number of ncRNAs within the human genome is unknown. ncRNAs are classified on the basis of their size. Transcripts shorter than 200 nucleotides (nt), referred to as small ncRNAs, include miRNAs, siRNAs and piRNAs, and have been extensively studied. Transcripts ranging in length from 200 nt up to 100 kilobases, referred to as lncRNAs, make up the second group and have recently been receiving growing attention. The latter transcripts lack a significant open reading frame [6,7,12].
Long non-coding RNAs (lncRNAs), encompassing nearly 30,000 different transcripts in humans, represent the most prevalent and functionally diverse class of ncRNAs [11]. There is no universal definition grounded in biological arguments. Certain groups argue that lncRNAs may be classified into antisense, intergenic, overlapping, intronic, bidirectional, and processed subtypes, depending on the position and direction of transcription in relation to other genes [13,14]. However, the most commonly used definition, admittedly an arbitrary one, is based on a length threshold of 200 nucleotides (nt) [11] and a lack of protein-coding potential; lncRNAs often harbour a poly-A tail and can be spliced, similar to mRNAs [1]. Conventionally, this divides ncRNAs into two groups: the lncRNAs, which are > 200 nt in length, and the remaining ones, referred to as "small" RNAs, which are < 200 nt in length. The latter group includes many different RNAs, such as microRNAs (miRNAs), small nucleolar RNAs (snoRNAs) and PIWI-interacting RNAs (piRNAs) [11]. The authors of [15] attempted to distinguish between lncRNAs and small ncRNAs by defining the former as those ncRNAs which function either as primary or spliced transcripts, independent of extant known classes of small ncRNAs [3,15]. Such a definition places ncRNAs such as BC1 and snaR in the lncRNA database, even though these are less than or close to 200 nt in length [15].
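The conventional size-based rule described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited works: the names Transcript, classify_by_length and LNCRNA_THRESHOLD_NT are hypothetical, the example transcripts are toy records rather than real genes, and treating a transcript of exactly 200 nt as a lncRNA is an assumption, since the convention only specifies > 200 nt and < 200 nt.

```python
from dataclasses import dataclass

# Conventional (and, as noted in the text, arbitrary) length threshold in nucleotides.
LNCRNA_THRESHOLD_NT = 200


@dataclass(frozen=True)
class Transcript:
    name: str                     # hypothetical label, not a real gene record
    length_nt: int                # transcript length in nucleotides
    protein_coding: bool = False  # True if the transcript encodes a protein


def classify_by_length(transcript: Transcript) -> str:
    """Return 'mRNA', 'lncRNA' or 'small ncRNA' from coding status and length only."""
    if transcript.protein_coding:
        return "mRNA"
    # Assumption: a transcript of exactly 200 nt is counted as a lncRNA.
    if transcript.length_nt >= LNCRNA_THRESHOLD_NT:
        return "lncRNA"
    return "small ncRNA"


# Toy records with illustrative lengths (not measured values).
examples = [
    Transcript("toy_short_ncRNA", 22),
    Transcript("toy_long_ncRNA", 2300),
    Transcript("toy_mRNA", 1500, protein_coding=True),
]
labels = {t.name: classify_by_length(t) for t in examples}
```

A purely length-based rule of this kind would assign any non-coding transcript shorter than 200 nt to the small ncRNAs, which is why the functional definition of [15] can class transcripts such as BC1 and snaR differently.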
LncRNAs are observed in a large diversity of species, including animals [16], plants [17], yeast [18], prokaryotes [19], and even viruses [3,20]. However, lncRNAs are poorly conserved among different species when compared with the well-studied RNAs (such as mRNAs, miRNAs, snoRNAs). This in turn has raised uncertainty as to whether a given lncRNA is functional at all. Alternatively, the poor interspecies conservation may reflect species-specific functional characteristics. In addition, lncRNAs are usually expressed at low levels [21,22], making them appear more like transcriptional noise [11].
LncRNAs play important roles in a variety of biological processes and physiological functions, including epigenetic, transcriptional and post-transcriptional control of gene expression [5,6], and affect various aspects of cellular homeostasis, including proliferation, survival, migration and genomic stability. LncRNAs are also capable of tuning gene expression and affecting cellular signalling cascades [23], and play crucial roles in promoter-specific gene regulation and X-chromosome inactivation [1]. Furthermore, it has been reported that lncRNAs interact with DNA, RNA, and/or protein molecules, and regulate chromatin organisation as well as transcriptional and post-transcriptional processes [23]. Consistent with these roles, they are found to be differentially expressed in tumours, and they are directly linked to the transformation of healthy cells into tumour cells. As a result of their key functions in a wide range of biological processes, lncRNAs are becoming rising stars in biology and medicine: they have potential active roles in various oncologic diseases and represent a gold mine of potential new biomarkers and drug targets [3,23,24].
Research is also showing that lncRNAs are deregulated in a number of human cancers, and that their aberrant expression leads to cell proliferation, tumour initiation, growth and metastasis of cancer cells [25][26][27][28]. More specifically, the authors of [24] reported identifying 707 potential cancer-related lncRNAs, which act as scaffolds, interacting physically with other RNA species and thereby directly affecting cell signalling cascades. In this chapter, we seek to understand the link between the cellular processes influenced by lncRNAs and the hallmarks of cancer [3][4][5]. This should serve to stimulate new research directions and therapeutic options in which lncRNAs serve as novel prognostic markers and therapeutic agents. However, even though the functional classification of lncRNAs and their link to cancer are well established, further studies are required to obtain a clearer characterisation with respect to phenotypic outputs and to identify suitable candidates for the development of new therapeutic strategies and the design of novel diagnostic approaches [13].

lncRNAs and their link to cancer
Cancer, also referred to as malignancy, is broadly defined as an abnormal and uncontrolled growth of cells, with the potential for the affected cells to invade or spread to other parts of the body [29]. Cancer is primarily caused by genetic alterations which deregulate the gene networks responsible for the maintenance of cellular homeostasis, and which arise from interactions of somatic and germline mutations with various environmental factors [5]. There are more than 100 types of cancer, categorised according to the tissue of origin [30]. As a result, symptoms of cancer vary considerably with the type of tissue involved, the location of origin, and the type of genetic alteration causing the disease [29]. Research has pinpointed genetic alterations as the main culprit behind this deadly disease. Several lifestyle and environmental factors, including smoking, physical inactivity, high body fat, alcohol and caffeine intake, exposure to ultraviolet radiation, poor nutrition, a high-cholesterol diet, and use of aspirin [29,31], may also increase the risk of transforming normal cells into cancerous cells by altering, at least in part, the expression of various genes related to cellular proliferation and differentiation.
Several studies comparing malignant cells with their corresponding normal cells, particularly with the recent application of next-generation sequencing to a growing number of cancer transcriptomes, have revealed that many transcription factors and post-transcriptional regulators, such as RNA-binding proteins, microRNAs, and lncRNAs, are crucial regulators that promote or inhibit tumour development [6,12,29]. LncRNAs have been gaining significant attention as regulators of neoplastic transformation and progression [29], as well as of various cellular functions, including proliferation, migration, and DNA stability [29,32], even though only a few of them have been functionally characterised [12]. Fig. 2 below depicts the various ways lncRNAs are linked to the hallmarks of cancer. Prostate cancer associated 3 (PCA3, also referred to as DD3) and prostate-specific transcript 1 (PCGEM1) were the first lncRNAs to be associated with cancer because of their aberrant expression, identified by differential display analysis of prostate tumours and normal tissue [5]. PCA3 is currently used as a prostate cancer biomarker [33]. The authors of [6] have reported that a number of lncRNAs are involved in important processes of breast cancer (BC), including: (1) promotion of BC proliferation or inhibition of apoptosis (lncRNAs include H19, SRA, LINC01296); (2) promotion of drug resistance in BC cells (lncRNAs include UCA1, CRALA, lnc-ATB); and (3) promotion of invasion and metastasis of BC cells (lncRNAs include HOTAIR, MALAT1, CCAT2) [5,6]. Some other lncRNAs have been shown to inhibit these processes [6]. LncRNAs therefore have the potential to serve as biomarkers in certain cancer types [29], more specifically in those malignancies where the alteration of these ncRNAs is associated with cancer development, progression, and metastasis [34].
Moreover, a unique pattern of expression of some lncRNAs in specific types of cancer has made them attractive targets for drug development [35]. The following table (Table 1) highlights the expression of lncRNAs in different types of cancer.
Moreover, the authors of [12] have reported that the upregulation of certain lncRNAs, including HOTAIR, MALAT1 and CCAT2, and the downregulation of LOC285194, UC.388, and LET have been implicated in promoting the metastasis of colorectal cancer (CRC). However, their biological and pathological functions and mechanisms remain to be studied, since most lncRNAs which were expected to be prognostic and predictive in cancer patients have unfortunately failed to perform these functions when tested in vivo [12].
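The groupings reported above for breast cancer [6] and colorectal cancer [12] can be held in a small lookup structure. The following is a minimal Python sketch under stated assumptions: the dictionary and function names are hypothetical, and the entries are limited to the lncRNAs named in the text; it is not a curated database.

```python
from __future__ import annotations

# Breast cancer (BC) processes and the lncRNAs named for each in the text [6].
BC_PROCESS_TO_LNCRNAS = {
    "proliferation or apoptosis inhibition": ["H19", "SRA", "LINC01296"],
    "drug resistance": ["UCA1", "CRALA", "lnc-ATB"],
    "invasion and metastasis": ["HOTAIR", "MALAT1", "CCAT2"],
}

# Expression changes implicated in colorectal cancer (CRC) metastasis [12].
CRC_METASTASIS_EXPRESSION = {
    "HOTAIR": "up",
    "MALAT1": "up",
    "CCAT2": "up",
    "LOC285194": "down",
    "UC.388": "down",
    "LET": "down",
}


def bc_processes_for(lncrna: str) -> list[str]:
    """List the BC processes for which the given lncRNA is named in the text."""
    return [
        process
        for process, names in BC_PROCESS_TO_LNCRNAS.items()
        if lncrna in names
    ]


def shared_bc_crc() -> list[str]:
    """Return, sorted by name, the lncRNAs that appear in both the BC and CRC lists."""
    bc_names = {name for names in BC_PROCESS_TO_LNCRNAS.values() for name in names}
    return sorted(bc_names & set(CRC_METASTASIS_EXPRESSION))
```

In this sketch, shared_bc_crc() returns the three lncRNAs (CCAT2, HOTAIR, MALAT1) that the text lists both for invasion and metastasis in BC and as upregulated in CRC metastasis.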

Mechanism(s) of lncRNA action
A number of studies using next-generation sequencing have revealed [52][53][54] that a significant portion of the mutations associated with cancer development lies within the non-coding regions of the human genome. These mutations particularly affect the expression of lncRNAs, which in turn may regulate various cancer phenotypes by interacting with DNA, RNA, and proteins [35]. A number of lncRNAs have been reported to be aberrantly expressed in tumours, showing crosstalk with key cancer-related signalling pathways [55]. The main mechanisms by which lncRNAs act on cancer cells regulate the expression of target genes in the following ways [29]:
i. Facilitating combinatorial actions of different transcription factors
ii. Removing transcription factors and other regulatory proteins from chromatin
iii. Recruiting chromatin modifiers to genes in cis and in trans
iv. Acting as scaffolds that bring multiple proteins together, forming ribonucleoprotein complexes that induce histone modification
v. Interacting with DNA methyltransferase enzymes through other protein mediators, regulating DNA methylation of genes in cis and in trans
Moreover, lncRNAs are significantly associated with the growth, survival, migration, and angiogenesis of a number of cancer cell types by transcriptionally or post-transcriptionally regulating epigenetic regulators/modifiers [29,55]. Fig. 3 depicts the role of the lncRNA BCAR4 in the metastasis of breast cancer via chemokine-induced binding of BCAR4 to two transcription factors, which has extended regulatory consequences.

Role of lncRNAs as tumour suppressors
LncRNAs can also act as tumour suppressors. Genome-wide studies have revealed that transcription factors, such as p53 [32,56], MYC [57,58] or the oestrogen receptor [59], specifically regulate the expression of a number of lncRNAs. One of the major tumour suppressor proteins and preservers of cellular homeostasis identified so far is p53, which plays a vital role in genomic stability and regulates its downstream target genes by binding specifically to the p53 response element (p53RE). Research has shown that p53REs lie within genomic regions that encode lncRNAs, suggesting a possible role of lncRNAs as tumour suppressors. For example, following DNA damage or oncogenic stress, the transcription factor p53 initiates a tumour suppressor program which involves the induction of many genes, including lncRNAs, and, as shown in Fig. 4, some of these lncRNAs are direct transcriptional targets of p53. The authors of [6] have reported a number of lncRNAs, shown in Fig. 5 below, that are associated with inhibiting the development of BC [6].

Conclusion
Given the importance of lncRNAs in controlling key cellular processes, it is reasonable to say that, like the protein-coding regions of the human genome, the genomic regions encoding lncRNAs play equally important roles in regulating malignant transformation and progression. With improvements in research methods, such as the development of gene array and high-throughput sequencing technologies, more categories of lncRNAs are expected to be discovered. These technologies will allow a more effective understanding of their complex mechanism(s) of action, and recent CRISPR/Cas gene editing technologies may eventually play a role in developing lncRNAs as tumour suppressor therapies, delivering novel and alternative treatment strategies for targeting cancer-associated lncRNAs.