A brief review of noncoding RNA

Background
The genetic code of every organism is stored in two biomolecules, deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). In higher organisms, DNA is found inside the nucleus while RNA is found largely outside the nucleus. While genes, which are directly responsible for coding the proteins needed by the organism, constitute only around one per cent of DNA, the remaining 99 per cent is noncoding. Coding RNA generally refers to mRNA that encodes protein, whereas noncoding RNAs act as cellular regulators without encoding proteins.
Main text
Although two-thirds of the human genome is transcribed, only 2% of the transcribed genome encodes proteins. The remainder has been found to be transcribed into long ncRNA and other ncRNAs. Noncoding RNA molecules known from the early days of molecular biology include tRNA and rRNA. Long ncRNAs (lncRNAs) were thought of as transcriptional noise even in the genomic era, but they have been found to act as regulators at different levels of gene expression, including chromatin organisation, transcriptional regulation and post-transcriptional control. This means that long ncRNAs influence all stages of cell biogenesis and have critical roles in cell development and disease. Vital as they are to development, research shows that mutations and dysregulation of these long ncRNA molecules are linked to diverse human diseases ranging from neurodegeneration to cancers.
Conclusion
The noncoding genome, which was largely ignored in the early days of molecular biology, has taken centre stage now that the prime role it occupies in the various stages of biogenesis of organisms has come to light. The study of such molecules is central to molecular biology today, and they are also intensively researched in drug discovery.


Background
The intention of this paper is to present to the reader the basic biological background needed for computational/digital signal processing analysis of noncoding RNA. While preparing this paper, care has been taken to cite only the literature which gives clear ideas on the structure, location in the genome (wherever applicable) and function of noncoding RNA. The references included in this paper are also intended for further extensive study on the topic, if the reader so desires. The helical structure of DNA, as we know it today, was revealed by the work of Watson and Crick in 1953 [1]. Following this finding, for which they were awarded the Nobel Prize, there has been phenomenal progress in genomics over the last seven decades. It is known that all characteristics of living beings are determined by the gene sequences. At present, quite a few public databases provide genomic and proteomic data which can be put to use for the benefit of humanity.
Genomic/proteomic information available in public data resources is in the form of character strings rather than numerical sequences. However, if we properly map a character string into a numerical sequence, then it can be processed with digital signal processing techniques.
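As an illustration of such a mapping, the following minimal sketch converts a DNA string into four binary indicator sequences, one per base. This is only one common choice and is not necessarily the mapping used in the cited works; the function name `indicator_sequences` and the example string are hypothetical.

```python
# Illustrative sketch: map a DNA character string to four 0/1 sequences.
# The indicator mapping is an assumed choice, not the paper's stated method.
BASES = "ACGT"

def indicator_sequences(dna):
    """Return a dict with one 0/1 list per base: 1 where that base occurs."""
    dna = dna.upper()
    return {base: [1 if ch == base else 0 for ch in dna] for base in BASES}

if __name__ == "__main__":
    sequences = indicator_sequences("ACTGGCAATG")  # toy example string
    for base in BASES:
        print(base, sequences[base])
```

Each of the four numerical sequences can then be passed to standard signal processing routines such as a discrete Fourier transform.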
Digital signal processing has the potential to provide a set of novel and useful tools for solving highly relevant problems [2,3]. For example, colour spectrograms provide significant visual information about biomolecular sequences which facilitates understanding of local nature, structure and function. Also, the magnitude and the phase of properly defined Fourier transforms [3,4] can be used to predict important features like the location and properties of protein-coding regions in DNA, which are indicative of their functions.
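A minimal sketch of a Fourier-based measure is given below. It assumes the widely used period-3 criterion (spectral power at frequency index N/3, summed over the four base-indicator sequences), which the text above does not spell out; the function names and the toy sequences are hypothetical, and a real analysis would apply the measure over a sliding window.

```python
import cmath

def dft_power_at(x, k):
    """Squared magnitude of the k-th DFT coefficient of the sequence x."""
    n = len(x)
    coeff = sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(x))
    return abs(coeff) ** 2

def period3_power(dna):
    """Sum, over the four base-indicator sequences, of the DFT power at k = N/3."""
    dna = dna.upper()
    n = len(dna) - len(dna) % 3   # trim so that N is a multiple of 3
    dna = dna[:n]
    k = n // 3
    total = 0.0
    for base in "ACGT":
        x = [1 if ch == base else 0 for ch in dna]
        total += dft_power_at(x, k)
    return total

if __name__ == "__main__":
    periodic = "ATG" * 20    # toy sequence with an exact period of 3
    other = "ACGT" * 15      # toy sequence of the same length with period 4
    print(period3_power(periodic), period3_power(other))
```

The toy sequence with period 3 gives a large value, while the period-4 sequence gives a value close to zero, which is the contrast that such measures exploit.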
Genomic information is digital in a very real sense. It is represented in the form of sequences in which each element can be one out of a finite number of entities. Such sequences, namely DNA and proteins, have been mathematically represented by numerical sequences, in which each character is mapped to a number [3][4][5].

DNA, genes, formation of protein
Deoxyribonucleic acid (DNA) contains the genetic instructions used in the development and functioning of all known living organisms including some viruses. The main role of DNA molecules is the long-term storage of information. DNA is often compared to a set of blueprints or a recipe, since it contains the instructions needed to construct other components of cells, such as proteins and RNA molecules [6,7].
A schematic diagram for the DNA molecule is given in Fig. 1. The DNA molecule has the structure of a double helix.
Between the two strands of the backbone, which lies on the outside, there are pairs of bases like the rungs of a ladder. The backbone is a very regular structure made from sugar-phosphate. There are four types of bases (or nucleotides), denoted by the letters A, C, G and T (respectively, adenine, cytosine, guanine and thymine). A and G are called the purines, while T and C are called the pyrimidines. In double-stranded DNA, the base A always pairs with T, and C pairs with G. Thus, a bottom strand reading TGA CCG TTAC is the complement of a top strand reading ACT GGC AATG. This is called Watson-Crick base pairing or canonical base pairing; it occurs through weak hydrogen bonds [6,7]. Nevertheless, as there are several million base pairs, the two strands are held together strongly. Typically, in any given region of the DNA molecule, only one of the two strands is active in gene expression [6,7].
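The pairing rule can be sketched in a few lines of code. The helper names below are hypothetical; the function returns the position-by-position complement (not the reverse complement), matching the way the two strands are written one above the other.

```python
# Watson-Crick pairing: A with T, C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
PURINES = set("AG")
PYRIMIDINES = set("TC")

def complement(strand):
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())

if __name__ == "__main__":
    bottom = "TGACCGTTAC"
    print(complement(bottom))  # ACTGGCAATG
```

Note that every pair joins one purine to one pyrimidine, which is why the rungs of the ladder have a uniform width.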
A DNA sequence has two regions as shown in Fig. 2: genes (marked violet) and intergenic spaces (marked yellow). Genes contain the information for the generation of proteins. Each gene is responsible for the production of a different protein. A gene has two sub-regions called exons (marked red in Fig. 2) and introns (marked green in Fig. 2). Exons are the regions which directly take part in the formation of protein. Introns do not take part directly in the coding of proteins [6,7]. Prokaryotes like bacteria generally do not have introns [7].
The central dogma of molecular biology states that genetic information flows from DNA to RNA to protein. The central dogma is represented in Fig. 3.
The gene is first copied into a single-stranded chain called the messenger RNA or the mRNA molecule. This process is called transcription. The introns are then removed from the mRNA by a process called splicing. The spliced mRNA is then used by a large molecule called the ribosome to produce the appropriate protein.
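The two steps of transcription and splicing can be illustrated with a toy sketch. It makes simplifying assumptions that are not part of the text: the gene is given as its coding strand, so transcription reduces to writing uracil (U) in place of thymine (T), and the exon coordinates are a hypothetical annotation supplied by hand.

```python
def transcribe(coding_strand):
    """Toy transcription: copy the coding strand, writing U in place of T."""
    return coding_strand.upper().replace("T", "U")

def splice(pre_mrna, exons):
    """Keep only the exon intervals, given as half-open (start, end) index pairs."""
    return "".join(pre_mrna[start:end] for start, end in exons)

if __name__ == "__main__":
    gene = "ATGGCCGTAAGTTTTCAGGGATAA"   # hypothetical gene: exon, intron, exon
    exons = [(0, 6), (18, 24)]          # hypothetical exon coordinates
    pre_mrna = transcribe(gene)
    print(pre_mrna)                     # AUGGCCGUAAGUUUUCAGGGAUAA
    print(splice(pre_mrna, exons))      # AUGGCCGGAUAA
```

In the cell the polymerase reads the template strand and the splice sites are recognised by the splicing machinery; the sketch only mirrors the net effect on the sequence.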
The translation from mRNA to protein is aided by adaptor molecules called the transfer RNA or tRNA. It could be said that the tRNA molecules also store the genetic code [7].

RNA
The RNA (ribonucleic acid) molecule is closely related to DNA. It is also made of four bases, but instead of thymine, a base called uracil (denoted as U) is found in RNA. Figure 4 shows a comparative representation of DNA and RNA. Figure 5 shows the primary chemical structure of an RNA sequence. Unlike DNA, RNA is single stranded. The single-stranded RNA molecule folds onto itself to form what is called the secondary structure of the RNA. While doing so, hydrogen bonds are formed between the bases. Base pairing occurs as in the case of DNA: U pairs with A by hydrogen bonding, just as T pairs with A in DNA. The sugar in the sugar-phosphate backbone is also slightly different from that of the DNA molecule. DNA contains the sugar deoxyribose, while RNA contains the sugar ribose. The only difference between ribose and deoxyribose is that ribose has one more -OH group than deoxyribose, which has -H attached to the second (2′) carbon in the ring, as shown in Fig. 5. DNA is stable under alkaline conditions, while RNA is not [6,7].

mRNA, rRNA, tRNA and the formation of protein
The classical knowledge of RNA molecules was that they are short in length, typically short-lived and are used by the cell as temporary copies of portions of DNA [6,7]. A typical example is the messenger RNA (mRNA). Messenger RNA is a large family of RNA molecules that convey genetic information from DNA to the ribosome, where they specify the amino acid sequence of the protein products of gene expression. Messenger RNA molecules have a short life span, beginning with transcription. A very simple description of transcription and translation as they happen in eukaryotes is given below [6][7][8].
Fig. 3 The central dogma of molecular biology. Formation of protein from DNA
Fig. 4 Comparison of DNA and RNA
A copy of the gene from the DNA is written onto the mRNA by an enzyme called RNA polymerase, which associates with mRNA-processing enzymes during transcription so that processing can start immediately after transcription. The short-lived, unprocessed or partially processed molecule is termed precursor mRNA or pre-mRNA, and when completely processed it is termed mature mRNA. The pre-mRNA contains both the introns and the exons of the DNA template strand.
After the DNA has been copied into the mRNA, the introns are removed by splicing and only the exons of the DNA strand are retained on the mRNA. This mRNA is termed reduced mRNA. A process called 5′ capping occurs immediately after transcription commences. A modified guanine nucleotide is added to the front end, or 5′ end, of the eukaryotic messenger RNA shortly after the start of transcription. The 5′ cap consists of a terminal 7-methylguanosine residue that is linked through a 5′-5′ triphosphate bond to the first transcribed nucleotide. The 5′ cap ensures recognition by the ribosome and protection of the RNA molecule from ribonucleases (RNases). An RNase is a type of nuclease which catalyses the degradation of RNA into smaller components. Transcription is represented in Fig. 6.
After transcription, polyadenylation occurs. Polyadenylation involves linking of the polyadenylyl moiety to the messenger RNA molecule. The polyadenylyl moiety is attached to the 3′ end of the mRNA molecule. As nucleic acid molecules are read from the 5′ to the 3′ end, polyadenylation is also termed tailing. Polyadenylation is important for the termination of transcription, export of the mRNA from the nucleus and its translation to protein. Once transcription is terminated, the mRNA chain is cleaved through the action of an endonuclease complex associated with the RNA polymerase.
The next step is transportation of the mature mRNA from the nucleus to the cytoplasm. Transportation is controlled by different signalling pathways. Once in the cytoplasm, the mature mRNA is translated into protein by the ribosome. Translation is the process by which a protein is synthesised from the information contained in a molecule of messenger RNA (mRNA). During translation, an mRNA sequence is read using the genetic code, which is a set of rules that defines how an mRNA sequence is to be translated into the 20-letter code of amino acids, which are the building blocks of proteins. Translation, represented in Fig. 7, takes place in specialised cellular structures called ribosomes.
Ribosomes are the sites at which the genetic code is actually read by a cell. The ribosome is a complex made of ribosomal RNA (rRNA) molecules and proteins that forms a factory for protein synthesis in cells. The ribosome translates each codon, a set of three nucleotides, of the mRNA template and matches it with the appropriate amino acid. The amino acid is provided by a transfer RNA (tRNA) molecule. Transfer ribonucleic acid (tRNA) is a type of RNA molecule that helps decode a messenger RNA (mRNA) sequence into a protein. tRNAs function at specific sites in the ribosome during translation. Proteins are built from smaller units called amino acids, which are specified by three-nucleotide mRNA sequences called codons. Each codon represents a particular amino acid, and each codon is recognised by a specific tRNA. The tRNA molecule has a distinctive folded structure with three hairpin loops that form the shape of a three-leafed clover. One of these hairpin loops contains a sequence called the anticodon, which can recognise and decode an mRNA codon. Each tRNA has its corresponding amino acid attached to its end. When a tRNA recognises and binds to its corresponding codon in the ribosome, it transfers the appropriate amino acid to the end of the growing amino acid chain. Translation is initiated when a tRNA with the START anticodon binds onto the mRNA (Fig. 7). Next, the large ribosomal subunit binds to form the complete initiation complex. During the elongation stage, the ribosome continues to translate each codon in turn. Each corresponding amino acid is added to the growing chain and linked via a peptide bond. Elongation continues until all of the codons are read. Finally, termination occurs when the ribosome reaches a stop codon (UAA, UAG or UGA). Since there are no tRNA molecules that can recognise these codons, the ribosome recognises that translation is complete. The new protein is then released, and the translation complex comes apart.
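The codon-by-codon reading described above can be sketched as follows. The sketch assumes the standard genetic code, written compactly as a 64-letter string of one-letter amino acid symbols in UCAG order with `*` marking stop codons, and it assumes that the reading frame starts at the first nucleotide of the input; the names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code in UCAG order; '*' marks the stop codons UAA, UAG, UGA.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Read codons from position 0 and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":      # stop codon: no tRNA recognises it
            break
        protein.append(amino_acid)
    return "".join(protein)

if __name__ == "__main__":
    print(translate("AUGGCCGGAUAA"))  # MAG (Met-Ala-Gly, then stop)
```

The dictionary lookup plays the role of the tRNA anticodon match, and the early exit on `*` corresponds to termination at a stop codon.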

Noncoding RNA
The central dogma of molecular biology has exerted a substantial influence on our understanding of the genetic activities in cells. Based on the central dogma, the prevailing assumption in the past was that genes are basically repositories for protein-coding information and that proteins are responsible for most of the important biological functions in all cells. Thus, RNA was seen as a passive intermediary that bridges the gap between DNA and protein. Examples of RNAs that are not themselves translated into protein are tRNA and rRNA; they have other functions in the formation of protein. These two molecules could be called the classical functional ncRNA molecules [9]. They are the most ubiquitous noncoding RNA species in the genome; these structural RNAs are highly evolutionarily conserved and occur in all known forms of life [9,10].
Little was known about other functional RNAs. Besides tRNA and rRNA, functional RNAs were considered to be very rare. This view underwent a drastic change in the last decade of the twentieth century, when screening of various genomes identified a wide variety of noncoding RNAs (ncRNAs) [11]. There are many functional RNA molecules that do not directly take part in protein coding but have other regulatory functions [12]. These facts were not known, and a majority of the genome was regarded as junk mainly because it was not well understood. It has come to light that these junk portions of the genome hold the keys to functions that are vital to life, including alternative splicing, control of epigenetic variations, etc. [11].
Francis Crick proposed the existence of adaptor RNA molecules that were able to bind to the nucleotide code of mRNA, thereby facilitating the transfer of amino acids to growing polypeptide chains [13]. The work of Hoagland et al. (1958) confirmed that a specific fraction of cellular RNA was covalently bound to amino acids. Later, the fact that rRNA was found to be a structural component of ribosomes suggested that, like tRNA, rRNA was also noncoding. In addition to rRNA and tRNA, a number of other noncoding RNAs exist in eukaryotic cells. These molecules assist in many essential functions, which are still being enumerated and defined. As a group, these RNAs are frequently referred to as small regulatory RNAs (sRNAs). These regulatory RNAs exert their effects through a combination of complementary base pairing, complexing with proteins and their own enzymatic activities. RNAs can interact with other RNAs and DNAs in a sequence-specific manner, and they are very relevant in tasks that require highly specific nucleotide recognition [12,14]. Micro RNAs (miRNAs), which regulate gene expression, and small interfering RNAs (siRNAs), which take part in RNA interference (RNAi) pathways for gene silencing, are just two examples [15][16][17][18]. Small nuclear RNA (snRNA) and small nucleolar RNA (snoRNA) are other examples of this category of small ncRNA. Functions of ncRNA include transcription, control of translation, translocation, RNA processing and modification, and chromosome replication, to name a few [19,20].
Fig. 7 Translation: a tRNA with the START anticodon binding onto the mRNA to initiate translation
The noncoding RNA sequences which have been called long noncoding RNAs are also described in this article. These are functional molecules like the small noncoding RNAs [21]. Long ncRNA sequences are transcribed from the non-genic regions of the DNA, and they have many implications in cellular development and in diseases. The regulatory role of many classes of ncRNAs is broadly recognised, although long intronic ncRNAs have received little attention. In the past decade, however, it has come to light that intronic regions are key sources of regulatory ncRNAs [21][22][23]. Most of the eukaryotic genome is transcribed, yielding a complex network of transcripts that includes tens of thousands of long noncoding RNAs with little or no protein-coding capacity. Initially, these were thought to be transcriptional noise, but it is now clear that a significant number of these long noncoding transcripts show cell type-specific expression, localise to sub-cellular compartments and are associated with human diseases [24,25].
A large number of noncoding RNA molecules have been identified in organisms, and the list is growing constantly. Many of the newly discovered ncRNAs could not be assigned a function. In the rare cases when the function is known, the underlying molecular mechanisms are often poorly understood. In this article, an overview of the current knowledge on ncRNAs is presented. It is also to be noted that studies have shown that it is difficult to unequivocally classify RNAs as protein coding or noncoding [26,27]. Protein-coding and noncoding transcripts may overlap, and certain transcripts can function intrinsically at the RNA level and also code for proteins. Such facts lead us to conclude that the functionality of any transcript should not be discounted at the RNA level.

Some commonly studied RNA molecules
Figure 8 gives a brief look at the various types of RNA molecules that are involved in transcription and translation. During transcription, DNA is used as a template to produce an RNA transcript as shown.

rRNA
Ribosomal RNA (rRNA) is a noncoding RNA molecule which could be called the classical ncRNA [23]. It was discovered during 1951-1965, right at the inception of molecular biology, and hence is well studied. Though the length of rRNA molecules is in the region of 600-900 nucleotides, they are still studied along with small noncoding RNA [12]. The functions of rRNA are largely dependent on the tertiary (3D) structure, which in turn is derived from the secondary structure [19,28].

miRNA
Micro RNA (miRNA) is a small noncoding RNA molecule of about twenty-two nucleotides, found in animals, plants and certain viruses [16,29]. These molecules negatively regulate gene expression post-transcriptionally [30]. Many genes in eukaryotic cells are silenced by not being transcribed into mRNA. But in some other cases, even transcribed genes are silenced by post-transcriptional mechanisms which prevent them from being translated. Micro RNAs (miRNAs) are one set of molecules that control translation. In simple words, miRNAs are produced by cleavage of double-stranded RNA arising from small hairpins within RNA which is mostly single stranded. miRNAs combine with proteins to form a complex that binds rather imperfectly to mRNA molecules and inhibits translation. miRNAs function by base-pairing with complementary sequences within mRNA molecules. Once this happens, the mRNA strand splits into two pieces or gets destabilised because its polyadenylyl tail (at the 3′ end) gets shortened. In some other situations, silencing of the mRNA occurs through its less efficient translation into proteins by the ribosomes [15,16,29]. Besides post-transcriptional control of gene expression, micro RNAs have been found to play crucial roles in cancer, metabolic diseases, viral infections and so on [31,32]. This means that miRNAs represent a class of molecules which have the potential to be used as drug targets for these diseases by therapeutic modulation of their activity. Different miRNAs have been found to be deregulated in kidney and bladder cancers [33].

snRNA
Small nuclear RNAs (snRNAs) are a class of small RNA molecules which are found within the splicing speckles and Cajal bodies of the cell nucleus in eukaryotic cells. snRNA is also referred to as U-RNA (U for uridine rich). Uridine is a glycosylated pyrimidine analogue containing uracil attached to a ribose ring. The length of snRNA molecules is around 80 to 350 nucleotides [34] in higher eukaryotes (Fig. 9). They are transcribed by either RNA polymerase II or RNA polymerase III, and studies have shown that their primary function is in the processing of pre-messenger RNA in the nucleus. They have also been shown to aid in the regulation of transcription factors (7SK RNA) or RNA polymerase II (B2 RNA), and in maintaining the telomeres [35].
Biochemical, genetic and structural evidence suggests that snRNAs are the key components of the catalytic centre of the spliceosomes [34]. snRNAs form complexes with proteins to form snRNPs (small nuclear ribonucleoproteins), which assemble on the unspliced primary RNA transcript (the pre-mRNA) to form spliceosomes. Thus, small nuclear RNAs are essential parts of spliceosomes.

snoRNA
Small nucleolar RNAs (snoRNAs) are a class of small ncRNAs that carry out a fundamental role in the modification and processing of ribosomal RNA. They guide site-specific rRNA modification [36], and their function is similar in all types of organisms in which they are found. These molecules have been found to have extensive similarities with other types of small noncoding RNA, in particular miRNA. snoRNAs can be roughly divided into two classes depending on which of the two conserved structural elements, known as the C/D and H/ACA boxes, they contain; the two classes are the box C/D and box H/ACA snoRNAs, which function differently in rRNA maturation. Generally, C/D box snoRNAs are ∼70-120 nucleotides (nt) long and guide the methylation of target RNAs, while H/ACA box snoRNAs are ∼100-200 nt long and guide pseudouridylation [35]. Besides site-specific rRNA modification, snoRNAs also target spliceosomal RNA [37].
As already mentioned, it is now accepted that the noncoding genome is a region which contains many functional surprises. But the intronic area within genes has its own surprises too. snoRNAs have been found to be coded from intronic regions of eukaryotes [38]. The sequences encoding H/ACA and C/D box snoRNAs are generally located in introns of their host gene, in the same orientation. One intron usually carries one snoRNA gene, but a host gene can carry several snoRNA genes in different introns. Intronic snoRNAs are produced by exonucleolytic degradation of the de-branched lariat after splicing (Fig. 10). The stable part (which goes on to form protein) is protected by the binding of snoRNP core proteins, and/or of ancillary proteins, to the pre-mRNA [38]. snoRNAs are further processed into smaller molecules similar to miRNA, some of which display functionality. Small nucleolar RNAs (snoRNAs) guide RNA modification and are localised in nucleoli and Cajal bodies in eukaryotic cells. Components of the RNA silencing pathway associate with these structures, and studies reveal that a human and a protozoan snoRNA can be processed into miRNA-like RNAs [39].

Long noncoding RNA
The possibility of the genome having a noncoding component was not even thought of in the early days of genome studies. But with the completion of the human genome project in 2003, the estimated number of protein-coding genes came down to around 20,000 from the 60,000 estimated in the mid 1990s. With the screening of genomes of different organisms, it is becoming clearer that noncoding transcripts are vital to almost all stages of biogenesis of the cell.
Long noncoding RNA molecules (long ncRNAs, lncRNAs) are, in simple terms, non-protein-coding transcripts longer than 200 nucleotides [23,40]. This is a more or less arbitrary criterion to distinguish long ncRNAs from small regulatory RNAs such as microRNAs (miRNAs), short interfering RNAs (siRNAs), small nucleolar RNAs (snoRNAs) and other short RNAs. They are an abundant component of the human transcriptome and have been implicated in several cellular functions, including the regulation of gene transcription [39] and the development of tissues [41][42][43][44]. They have also been implicated in many diseases [45][46][47][48].
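The length criterion can be written down directly. The sketch below is only an illustration of the 200-nucleotide convention quoted above; the function name and labels are hypothetical, and it assumes the transcript is already known to be non-protein-coding.

```python
LNCRNA_MIN_LENGTH = 200  # nucleotides; the conventional threshold quoted in the text

def classify_noncoding(transcript):
    """Label a noncoding transcript by length alone (an arbitrary convention)."""
    if len(transcript) > LNCRNA_MIN_LENGTH:
        return "long ncRNA"
    return "small ncRNA"

if __name__ == "__main__":
    print(classify_noncoding("U" * 22))    # a miRNA-sized transcript
    print(classify_noncoding("U" * 1500))  # a transcript well above the threshold
```

Because the threshold is a convention rather than a biological boundary, transcripts close to 200 nucleotides should not be assigned to either class on length alone.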
These molecules have gained much attention in recent years as a possible new layer of biological regulation, and their possible roles in drug discovery are also being explored [49][50][51]. It could be said that a new lncRNA is found to be up- or down-regulated in a particular disease almost on a weekly basis. Though a wide range of lncRNAs have been implicated in a range of developmental processes and diseases, much is yet to be learnt about their mechanism of action. They have been found to be a diverse class of RNAs that engage in numerous biological processes across every branch of life [52,53]. In this section, we present an overview of the known salient features and functions of lncRNA.

Discovery of long ncRNA
Even in the genomics era, lncRNAs continue to remain more or less in the dark due to their low expression levels, their presence in specific cell types only or their existence during narrow time frames [54,55]. These molecules were identified as a class of RNA molecules in 2002 [56], even though some lncRNAs such as H19 and Xist have been known since the early 1990s [57,58]. Long ncRNAs are usually transcribed by RNA polymerase II (Pol II), spliced and mostly polyadenylated [59,60]. They are thought to be comparable to protein-coding genes but with seemingly low coding potential [23,61]. The development of novel DNA sequencing technologies [62,63] has revolutionised our understanding of the human genome and its transcriptome complexity. Only 1-2% of the whole genome codes for proteins, and it is understood that 80% of the remainder is actively transcribed [59,64].
These noncoding portions of the genome produce a variety of regulatory RNAs which have been found to range from miRNA to the heterogeneous category of long noncoding RNA. Long noncoding RNAs have been found to differ in their biogenesis, properties and functions [65,66].

The human genome and lncRNA
In the study of vertebrate genomes, thousands of genes that code for lncRNAs have been identified. Eukaryotic genomes transcribe [67] a wide spectrum of RNA molecules, ranging from long protein-coding mRNAs to short noncoding transcripts.
The January 2024 version of the GENCODE project (release 45) has annotated a total of 63,187 genes, out of which 19,395 are protein-coding genes (30.694%), 20,424 are long noncoding RNA genes (32.3231%), 7,565 are small noncoding RNA genes (11.9723%), 14,719 are pseudogenes (23.2943%) and 648 are immunoglobulin and T cell receptor genes (1.0255%). A pie chart representing the coding and noncoding components of the human genome based on statistics available in GENCODE version 45 is shown in Fig. 11. It is evident that long noncoding RNA genes are the single largest category of annotated genes in the human genome as we know it to date, and thereby warrant deeper study.
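The percentages quoted above follow directly from the counts, as the short sketch below shows. The variable and function names are hypothetical; the only data used are the numbers given in the text. Note that the five listed categories add up to 62,751, so 436 of the 63,187 annotated genes are not itemised in the text.

```python
TOTAL_GENES = 63187  # total quoted for GENCODE release 45

GENE_COUNTS = {
    "protein-coding": 19395,
    "long noncoding RNA": 20424,
    "small noncoding RNA": 7565,
    "pseudogenes": 14719,
    "immunoglobulin and T cell receptor": 648,
}

def percentages(counts, total):
    """Express each category count as a percentage of the total."""
    return {name: 100.0 * count / total for name, count in counts.items()}

if __name__ == "__main__":
    for name, share in percentages(GENE_COUNTS, TOTAL_GENES).items():
        print(f"{name}: {share:.4f}%")
    not_itemised = TOTAL_GENES - sum(GENE_COUNTS.values())
    print("genes not itemised in the text:", not_itemised)
```

The same dictionary can be passed to a plotting library to draw a pie chart of the kind shown in Fig. 11.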
One of the striking observations made from transcriptome studies is that a much larger fraction of the genome is represented as exons in mature RNAs than would be predicted from the amount of DNA covered by exons of protein-coding genes. Long ncRNAs are the major component of this all-encompassing transcription [67]. Early studies revealed that only around 5-10% of the human genome is accounted for by mRNA sequences and spliced noncoding RNAs that are transcribed in cell lines. Since only around 1% of the human genome encodes proteins, this leaves around 4-9% that is transcribed but whose functions are largely unknown [67]. Recent studies suggest that, of the transcribed human genome, only 2% corresponds to protein-coding exons [68]. The exonic portion of human lncRNAs accounts for about 1% of the genome, which is about the same amount of DNA as protein-coding exons [25].
Evolutionary studies indicate that there is a large amount of apparently functional, yet noncoding, DNA contained in the human genome, the volume of which was estimated to be four times the amount of protein-coding sequences [69]. Long noncoding RNAs (lncRNAs) are a part of these functional yet noncoding sequences. Mammalian genomes have been found to contain thousands of loci that transcribe long ncRNAs [70]. Although there have also been claims that almost the entire mammalian genome is transcribed into functional noncoding transcripts, such claims remain contentious [23,25].
Long ncRNAs are implicated as gene regulators and may be more numerous than protein-coding genes in the human genome. However, they have lower and more tightly tissue-specific expression than mRNAs, and hence their reference annotations are incomplete [71]. lncRNAs are found abundantly in the human genome [54] and in other vertebrates and plants.

Why study long noncoding RNA?
lncRNAs have attracted much attention with the availability of increasing evidence that these molecules play critical roles in multiple processes. Epigenetic regulation, chromatin modelling, gene transcription, protein transport, protein trafficking, cell differentiation, organ or tissue development, cellular transport, metabolic processes and chromosome dynamics are just a few examples [47,72,73,74].
A brief discussion of the various functions of lncRNA in disease and development is presented in this section. Long ncRNAs are functionally heterogeneous. They interact with DNA, proteins and other RNAs to take part in all processes from transcription and intracellular trafficking to chromosome remodelling [47,51,73]. It has been observed that lncRNAs control complex cellular behaviours like growth, differentiation and establishment of cell identity, which are often deregulated in cancers [74][75][76][77].

Involvement of lncRNAs in transcriptional and post-transcriptional modification, and other stages of cell biogenesis
Many lncRNA act as key regulators of transcription and translation and thus influence cell identity and function to a great extent [78][79][80].lncRNA targeting mechanisms are diverse.Based on these mechanisms, lncRNAs may play critical regulatory roles in diverse cellular processes such as chromatin remodelling, transcription, posttranscriptional processing and intracellular trafficking [81][82][83][84].A few examples of possible lncRNA targeting mechanisms are represented in Fig. 12.
lncRNAs have been known to be involved in post-transcriptional regulation.That is, regulation of gene expression after transcription of the DNA into the mRNA molecules has occurred.As already discussed, miRNA are molecules which are involved in gene post-transcriptional expression.miRNAs combine with proteins to form a complex that binds rather imperfectly to mRNA molecules and inhibits translation.Certain long ncRNAs have been found to interfere with the microRNA pathways involved in different cellular processes [72].
Epigenetics is the study of potentially heritable changes in gene expression (active versus inactive genes) that does not involve changes to the underlying DNA sequence, a change in phenotype without a change in genotype, which in turn affects how cells read the genes.Epigenetic change is a regular and natural occurrence but can also be influenced by several factors including age, the environment/lifestyle and disease state [85].In general the term refers to modifications within a cell due to variation in gene expression focussing on genome or proteome of one cell [86].Epigenetic control is thought to occur at the chromatin-level [87][88][89].Chromatin is the combination of DNA and proteins which together make up the cell nucleus [90].Chromatin is in charge of DNA packaging, gene expression and DNA replication [91,92].lncRNAs have been found to interfere with acetylation, methylation and SUMOylation of histones.(Histones are highly alkaline proteins found in eukaryotic cell nuclei that package and order the DNA into structural units called nucleosomes).
Through their interactions with DNA, RNA and protein molecules or their combinations, lncRNAs act as essential regulators of chromatin organisation and of transcriptional and post-transcriptional regulation. Aberrations in their expression have been seen to confer pro-cancer features on the cell [93].

lncRNAs and cancers in humans
A prominent role of lncRNAs is in the development and progress of cancers, which makes the study of these sequences quite relevant [94,95]. A wide variety of lncRNAs have been implicated in cancers. Figure 13 shows examples of various lncRNAs which are implicated in different types of cancer. lncRNAs have different expression levels in cancerous tissues as compared to normal tissues. The lncRNAs MALAT1 (metastasis-associated lung adenocarcinoma transcript 1) and PCA3 (Homo sapiens prostate cancer-associated 3 (non-protein coding) RNA) are examples of lncRNAs which were identified early on as being associated with cancer [56].
Six properties required for cell transformation were termed the six hallmarks of cancer by Hanahan and Weinberg in 2000 [96]. These properties are (1) self-sustained growth signalling, (2) insensitivity to growth inhibition, (3) avoidance of apoptosis (the death of cells which occurs as a normal and controlled part of an organism's growth or development), (4) limitless replicative potential, (5) angiogenesis (the development of new blood vessels) and (6) metastasis (the spread of cancer from one organ/part of the body to another organ/part without being directly connected with it) [96,97]. lncRNA molecules have been found to play regulatory roles in most of these functions [98]. A simplified, brief discussion of the involvement of lncRNAs in the 'hallmarks of cancer' is given below.
[103].
• Some lncRNAs counter growth inhibition by regulating the expression of tumour suppressors, influencing various stages of their transcription and translation. lncRNAs like gadd7 and cdk6 modify transcription elongation by destabilisation of mRNA transcripts [104]. Transcript stability and translation are also found to be influenced by lncRNA molecules such that repression caused by miRNA is inhibited, and a tumour promoter protein is formed [105].
• Apoptosis is controlled cell death maintained in all healthy tissues for the control of carcinogenesis (initiation of cancer formation). lncRNAs have been found to inhibit apoptosis, aiding carcinogenesis and, in some cases, aiding apoptosis of tumour suppressor proteins, which again helps carcinogenesis [106].
• Cancer cells have limitless replication potential. This is achieved by maintaining long telomeres (regions of repetitive nucleotide sequences at each end of a chromosome), which are nucleoprotein structures that stabilise the ends of chromosomes. In the natural process, telomeres shorten in dividing cells. Hence, the ribonucleoprotein complex telomerase is needed to elongate telomeric repeats through reverse transcription of an internal template RNA. The shortening of telomeres induces the production of lncRNA molecules called TERRA (telomeric repeat-containing RNA) [107]. When TERRA is activated, it recruits protein complexes which bring about homology-directed repair of the shortened/damaged telomeric sequences [108].
• lncRNAs have been found to regulate nutrient supply to tumours by regulating transcription of the protein VEGF (vascular endothelial growth factor), which is essential for the formation of blood vessels. The lncRNAs HOTAIR (HOX transcript antisense RNA) and MIAT (myocardial infarction-associated transcript) have been found to regulate the transcription of VEGF [109,110]. The lncRNA MALAT1 (metastasis-associated lung adenocarcinoma transcript 1), when expressed in endothelial cells, has been found to promote angiogenic sprouting and migration [111] in cancer of the lung.
• Many lncRNAs have been implicated in the invasive nature of cancer cells that promotes metastasis. MALAT1 in colorectal and nasopharyngeal carcinoma [112], and CCAT2 (colon cancer-associated transcript 2) in lung cancer, are just some examples of lncRNA involvement in cancer metastasis.

Involvement of lncRNAs in other diseases in humans
Besides cancer, the diseases with which lncRNAs have been found to be associated include cardiovascular diseases, neurological and neurodegenerative disorders such as Alzheimer's disease (AD), diabetes and AIDS. The comprehensive lncRNA-Disease database (http://www.cuilab.cn/lncrnadisease) gives more than 200 lncRNA-disease associations. There are more than 200 diseases associated with various lncRNAs and more than 300 lncRNAs playing critical roles in various complex human diseases [113]. Owing to the functional diversity of these molecules, a single lncRNA can be involved in many types of disease. For example, the lncRNA MALAT1 is associated with cancer of the lung, cancer of the colon and nasopharyngeal cancers. However, the general features of most lncRNAs, such as structure, transcriptional regulation, functions and molecular mechanisms in biological processes, still remain largely unknown, and so annotations of their disease associations are not complete [114,115]. A brief overview of a couple of known diseases which have high social relevance in current times, and their lncRNA associations, is presented here.
1. Alzheimer's disease (AD), which causes dementia or short-term memory loss, is rapidly increasing among all populations throughout the world. AD is a chronic, progressive neurodegenerative disorder caused by the loss of synapses between neurons in specific brain regions (such as the CA1 region of the hippocampus) [116,117]. Research has shown that the lncRNA BACE1-AS (beta-secretase 1 antisense) is involved in AD [118].
2. The lncRNA PVT1 (plasmacytoma variant translocation 1) has been found to be linked with the development and progress of diabetic nephropathy [127], besides its active involvement in breast and ovarian cancers [128]. Pvt1 is the oncogene which codes for the lncRNA of the same name (Homo sapiens Pvt1 oncogene (non-protein coding), long noncoding RNA; the NCBI accession number of the lncRNA is NR_003367.3).
lncRNAs have been found to have roles in neurodegenerative disorders [129,130] and brain development [131]. Studies of patients with alcohol addiction reveal up-regulated MALAT1 in the cerebellum, hippocampus and brain stem [132], and the lncRNA network has been suggested to have key roles in neurodegenerative processes in Huntington's disease [133]. Long ncRNAs have been found to be involved in many other diseases as well.
Discussing each case is beyond the scope of this article. The intention of this paper is to convey to the reader why the study of ncRNA, in all its aspects, is beneficial to humanity.
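For readers approaching the topic computationally, the observation that one lncRNA can be linked to several diseases amounts to a many-to-many mapping. The following is a minimal sketch in Python, assuming only the associations named in this section; the `ASSOCIATIONS` list and both helper functions are hypothetical illustrations and do not reflect the schema or contents of the lncRNA-Disease database.

```python
from collections import defaultdict

# Illustrative pairs taken only from the text of this section.
# This is NOT an export of the lncRNA-Disease database.
ASSOCIATIONS = [
    ("MALAT1", "lung cancer"),
    ("MALAT1", "colorectal carcinoma"),
    ("MALAT1", "nasopharyngeal carcinoma"),
    ("CCAT2", "lung cancer"),
    ("BACE1-AS", "Alzheimer's disease"),
    ("PVT1", "diabetic nephropathy"),
    ("PVT1", "breast cancer"),
    ("PVT1", "ovarian cancer"),
]


def build_indexes(pairs):
    """Build lookups in both directions from (lncRNA, disease) pairs."""
    by_lncrna = defaultdict(set)
    by_disease = defaultdict(set)
    for lncrna, disease in pairs:
        by_lncrna[lncrna].add(disease)
        by_disease[disease].add(lncrna)
    return dict(by_lncrna), dict(by_disease)


def multi_disease_lncrnas(by_lncrna, min_diseases=2):
    """Return, in sorted order, lncRNAs linked to at least min_diseases diseases."""
    return sorted(
        name for name, diseases in by_lncrna.items()
        if len(diseases) >= min_diseases
    )


by_lncrna, by_disease = build_indexes(ASSOCIATIONS)
print(multi_disease_lncrnas(by_lncrna))    # lncRNAs with several associations
print(sorted(by_disease["lung cancer"]))   # lncRNAs linked to one disease
```

Keeping both indexes makes it equally cheap to ask which diseases a given lncRNA is linked to and which lncRNAs are linked to a given disease; a real analysis would load curated associations from a database instead of a hand-written list.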

Conclusion
The noncoding genome, which was largely ignored in the initial days of molecular biology, has come to centre stage now that its prime role in the various stages of the biogenesis of organisms has come to light. The noncoding RNA molecules known from the early days of molecular biology are molecules like tRNA and rRNA. The central roles of these molecules in the formation of proteins are well understood. The noncoding portion of the eukaryotic genome has been found to be responsible for the formation of a large variety of noncoding RNA molecules. Small noncoding RNA sequences like miRNA, siRNA, snRNA and snoRNA, which were discovered later, have been found to play pivotal regulatory roles in protein formation as well. An in-depth study of these molecules is vital in understanding the gene expression and suppression which control the occurrence of genetic diseases. Aberrations or mutations in these molecules which inhibit their proper functioning could also lead to pathologic situations which are not genetic but confined to the individual in whom the aberration occurs.
Genomic studies have demonstrated that although less than 2% of the mammalian genome encodes proteins, at least two-thirds is transcribed. The noncoding portion of the genome, especially the human genome, encodes another wide range of noncoding RNA molecules which are called long ncRNAs. These were dismissed as transcriptional noise even in the genomic era. However, they have been found to play critical roles in various biological processes. These molecules have been found to act as regulators at different levels of gene expression including chromatin organisation, transcriptional regulation and post-transcriptional control. This means that long ncRNAs influence many stages of cell biogenesis and have critical roles in development and diseases. As much as they are vital to development, evidence from research shows that mutations and dysregulation of these long ncRNA molecules are linked to diverse human diseases ranging from neurodegeneration to cancers. From these facts, it is evident why the study of such molecules is important. The study of noncoding RNA molecules is central in molecular biology today, and they are also intensively researched in drug discovery.
The intention of this review was to bring to the reader the myriad roles played by long noncoding RNA, ranging from cell genesis to disease development, with special stress on the involvement of lncRNAs in cancer. The study of the genome, and hence of noncoding RNA, is a highly dynamic field, which means that the statistics and known functions of these molecules keep changing. Writing a comprehensive review was therefore a fulfilling yet challenging task, and the review has its limitations.

Fig. 1 The double-helical structure of DNA
Fig. 2 DNA sequence, introns and exons
Fig. 8 RNA molecules involved in gene expression/suppression
Fig. 11 Representation of coding and noncoding components of the human genome, based on values in GENCODE v45