Regulation of lncRNA expression

Long non-coding RNAs (lncRNAs) are a class of transcripts with important biological functions. Various diseases have been associated with aberrant expression of lncRNAs and the related dysregulation of mRNAs. In this review, we highlight the mechanisms underlying dynamic lncRNA expression. The chromatin state contributes to the low and specific expression of lncRNAs. The transcription of non-coding RNA genes is regulated by many of the core transcription factors that act on protein-coding genes. However, specific DNA sequences may allow lncRNAs to be transcribed out of synchrony with their location-associated mRNAs. Additionally, multiple mechanisms are involved in the post-transcriptional regulation of lncRNAs. Among these, recent discoveries suggest that microRNAs may have indispensable regulatory effects on lncRNAs.


INTRODUCTION
In recent years, thanks to in-depth analyses of the genome and transcriptome, a set of transcripts termed long non-coding RNAs (lncRNAs) has been revealed to play key roles as epigenetic modulators. They range from 200 nt to several kilobases in length, but they possess little protein-coding capacity. The transcription and processing of most lncRNAs appear to be similar to those of protein-coding genes, including RNA polymerase II transcription, 5'-capping, polyadenylation and alternative splicing [1,2]. Their regulatory roles in maintaining an active or inactive state of gene expression are exerted through interactions with chromatin-modifying proteins or transcription factors, mediated by specific protein-binding motifs. They can also bind directly to complementary DNA sequences to form an RNA-DNA triplex structure that can block transcription [3]. It has been reported that this triplex structure can cause methylation of a specific sequence in the promoters of the target genes. The act of lncRNA transcription itself can even trigger the expression of nearby genes by providing a more "open" chromatin domain [4]. It can also silence nearby genes by occupying transcription factor binding sites [5]. Increasing evidence indicates that lncRNAs influence almost every step of the life cycle, acting in cis or in trans. The lncRNA HOTAIR, which is derived from the HoxC cluster, is responsible for transcriptional silencing of the HoxD cluster in trans. Relying on its specific protein-binding motif, it guides polycomb repressive complex 2 (PRC2), a chromatin repressor, to the HoxD cluster and to additional sites around the genome, and then facilitates H3K27me3 deposition at these genes so that they are transcriptionally silenced [6]. This helps to maintain the distal and posterior identities of cells.
By contrast, the lncRNA Airn is a canonical example of paternal-specific silencing in cis. Airn is expressed from an imprinted locus and is required for silencing several paternal genes, including Slc22a2, Slc22a3 and Igf2r, which lie in a 400-kb region adjacent to Airn [7]. This occurs via various mechanisms. Interestingly, some lncRNAs can act in cis and in trans simultaneously. For instance, an lncRNA is transcribed from upstream of the promoter of the human DHFR locus. It can not only bind the DHFR promoter to form an RNA-DNA triplex structure but also bind TFIIB directly to prevent the formation of a preinitiation complex, thereby interfering with DHFR transcription [8].
lncRNAs play such important roles in epigenetic modulation that their dysregulation is frequently associated with disease. Emerging data indicate that many lncRNAs are involved in tumor biology, because abnormal lncRNA profiles have been found in tumors. The lncRNA HEIH was found at higher transcript levels in hepatitis B virus-related hepatocellular carcinoma (HCC) and was shown to be oncogenic in HCC [9]. The lncRNA JADE, another novel ncRNA, has markedly higher levels in human breast tumors than in normal tissues. Moreover, knockdown of this non-coding gene significantly inhibited breast tumor growth in vivo [10]. Furthermore, the lncRNA DQ786227 was suggested to mediate malignant cell transformation induced by benzopyrene [11]. Aberrant lncRNA transcript profiles have also been observed in non-neoplastic diseases, such as ventricular septal defect and acute renal rejection [12,13]. To date, the majority of studies have focused on the mechanisms by which lncRNAs regulate their target coding genes. However, as lncRNAs are potential therapeutic targets, it is equally important to examine the mechanisms that regulate lncRNA expression. This review discusses the characteristics of lncRNA expression and elaborates on the possible regulatory mechanisms.

THE EXPRESSION CHARACTERISTICS OF lncRNAs
Important biological function does not imply high expression. On the contrary, lncRNAs show far lower expression levels than protein-coding genes, and they are expressed in a cell type- and tissue type-specific fashion [14,15]. Both of these properties make it difficult to obtain a complete picture of the regulatory role of lncRNAs in the life cycle. Interestingly, lncRNAs resemble protein-coding genes in their ability to respond to environmental and developmental conditions [14,16-18]. However, the conservation or evolution of lncRNAs is a much-debated issue. The conservation of lncRNAs has four aspects: sequence, structure, function, and expression from syntenic loci. Numerous functional lncRNAs show conservation at multiple levels [19,20], although the well-studied XIST shows little sequence conservation throughout the eutherian lineage [21]. Intriguingly, some researchers have reported that the promoters of lncRNAs are almost as conserved as those of protein-coding genes [22-24]. We assume that it may not be necessary for full-length sequences to be conserved, because partial sequences or local spatial structures of lncRNAs are sufficient for their crucial biological functions.

THE CHROMATIN STATE REGULATES THE EXPRESSION OF lncRNAs
Chromatin state, including the crosstalk between DNA methylation and histone modification, is one of the most important mechanisms that directly modulate cell type-dependent gene expression profiles. DNA methylation is a stable and heritable epigenetic mark. By contrast, histone modification is a fast-changing mark through which the cell regulatory network can quickly change the transcriptional state of the associated genes. For example, genes that are expressed in embryonic stem cells and carry both active and repressive histone methylation marks, such as H3K4me3 and H3K27me3 (termed the bivalent chromatin state), appear to be more readily switched to an active or silent state during differentiation and development [25]. Apart from core histone methylation, histone acetylation also plays an important role in the transcriptional regulation of gene expression. Histone acetylation prevents chromatin from over-condensing and therefore facilitates the expression of the associated genes. Some researchers demonstrated that a histone deacetylase inhibitor could reactivate methylated genes in Neurospora crassa and Brassica napus [26,27]. Many lines of evidence have demonstrated that the chromatin state modulates the dynamic expression of lncRNAs in disease. The lncRNA maternally expressed gene 3 (MEG3) was markedly downregulated owing to promoter hypermethylation in four human hepatocellular carcinoma cell lines. Inhibiting the activity of DNA methyltransferase (DNMT) led to an increase in MEG3 expression (Fig. 1A) [28]. Moreover, methylated cytosines found within or near functionally important regions of both HOTAIR and XIST imply that this modification is important for lncRNAs to exert their biological functions [29]. Histone deacetylation is also involved in lncRNA silencing.
AK055007 (lncRNA LET), which is generally downregulated in hepatocellular carcinomas, colorectal cancers and squamous cell lung carcinomas, has been shown to be repressed by hypoxia-induced deacetylation of H3 and H4 in its promoter region, mediated by histone deacetylase 3 (HDAC3) (Fig. 1B) [17]. The lncRNA uc002mb.2, which was significantly upregulated (~300-fold) in liver cancer cells after treatment with a histone deacetylase inhibitor (trichostatin A), appears to be restrained by histone deacetylation [30]. The distinct properties of lncRNAs, namely their low expression levels and their cell or tissue type specificity, are directly related to DNA methylation and histone modification. A large proportion of lncRNAs are transcribed from promoters with low CpG dinucleotide content. Based on the CpG dinucleotide content, mammalian promoters can be categorized into two classes, low CpG (LCG) and high CpG (HCG) [31]. Methylation occurs almost exclusively at the cytosine in CpG dinucleotides, and methylated cytosines are prone to mutate to thymines. As a result, CpG dinucleotides are depleted from methylated regions over the course of evolution. LCG promoters originate from the mutational decay of CpG dinucleotides following DNA methylation, and the remaining HCG promoters must possess specific mechanisms to avoid being methylated in a cell or tissue type-specific manner. In other words, the content of CpG dinucleotides is an indicator of the level of DNA methylation: LCG reflects hypermethylation and HCG reflects hypomethylation. Generally, promoter hypermethylation represses gene expression, so genes in the LCG class may have lower expression levels than genes in the HCG class. Additionally, the CpG dinucleotide content is negatively correlated with the potential for chromosome condensation [32]. Because a large proportion of lncRNAs are transcribed from LCG promoters, they are usually expressed at low levels.
On the other hand, merely 6.5% of LCG promoters are marked by H3K4me3 and almost none are marked by H3K27me3 in embryonic stem cells (ESCs). The majority of LCG promoters are marked by neither H3K4me3 nor H3K27me3, and they are associated with lower expression. In differentiated cells, the amount of H3K4me3 around LCG promoters is lower still, which explains the low expression levels of lncRNAs [33]. Notably, some promoters are marked by both H3K4me3 and H3K27me3 in ESCs. These bivalent promoters tend to lose one of the marks in mature cell types or tissues, and thereby acquire their cell and tissue type-specific expression pattern [34]. Here, we conjecture that many lncRNAs derive from such bivalent promoters and therefore show cell or tissue type-specific expression. Moreover, a large number of large intergenic non-coding transcripts (lincRNAs) were identified from intergenic K4-K36 domains, a distinctive structure that consists of a short region of histone H3 lysine 4 trimethylation (corresponding to the promoter) followed by a longer region of histone H3 lysine 36 trimethylation (corresponding to the transcribed region). These intergenic K4-K36 domains are usually associated with actively transcribed gene regions (Fig. 1C) [33,35]. In conclusion, the chromatin state underlies the low and specific expression of lncRNAs and promotes the generation of lincRNAs. Additionally, the aberrant expression of lncRNAs in many diseases can be attributed to an abnormal chromatin state.
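The CpG-based division of promoters into LCG and HCG classes discussed in this section can be illustrated with a short calculation. The following is a minimal sketch in Python, not the procedure used in the cited studies: it computes the GC fraction and the observed/expected CpG ratio of a promoter sequence and assigns a class. The function names and the two cutoffs (0.6 for the ratio, 0.5 for the GC fraction) are illustrative assumptions borrowed from commonly used CpG island criteria.

```python
def cpg_stats(seq):
    """Return (GC fraction, CpG observed/expected ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so this is exact
    gc_fraction = (c + g) / n
    # Expected CpG count if C and G were placed independently.
    expected = (c * g) / n
    ratio = cpg / expected if expected > 0 else 0.0
    return gc_fraction, ratio


def classify_promoter(seq, ratio_cutoff=0.6, gc_cutoff=0.5):
    """Label a promoter HCG or LCG.

    The cutoffs are assumptions chosen for illustration only; they are
    not taken from the studies cited in the text.
    """
    gc_fraction, ratio = cpg_stats(seq)
    if ratio >= ratio_cutoff and gc_fraction >= gc_cutoff:
        return "HCG"
    return "LCG"


if __name__ == "__main__":
    # Toy sequences, not real promoters.
    for name, seq in [("cpg_rich", "CGCGCGCGCG"), ("cpg_poor", "ATTGCATTACAGTTTA")]:
        print(name, classify_promoter(seq))
```

In practice the calculation would be applied to a fixed window around each annotated transcription start site, and the window size is a further choice that the text does not specify.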

lncRNAs MAY SHARE COMMON TRANSCRIPTION FACTORS WITH PROTEIN-CODING GENES
By mapping the binding sites for three DNA-binding transcription factors, SP1, cMyc, and p53, along chromosomes 21 and 22 in an unbiased approach, researchers found that 36% of these transcription factor binding sites (TFBS) were located within or immediately 3' to well-characterized protein-coding genes and were significantly correlated with non-coding RNAs [18]. Moreover, in the rat cerebral cortex, the promoters of stroke-responsive long non-coding genes and their homologous protein-coding genes showed highly overlapping transcription factor binding sites [36]. These two studies support the possible sharing of common transcription factors between protein-coding and non-coding genes. Furthermore, the SP1 motif GGGGCGGGGT is abundant in bidirectional promoters, which are the major source of non-coding RNAs in mammals [37]. This indicates that SP1 may be a crucial transcription factor for lncRNAs expressed from bidirectional promoters. RP1-473L15.2 and ENST00000513542 are two lncRNAs that are dysregulated in fetal cardiac tissues with ventricular septal defect. TFBS prediction indicated that they possess motifs for SRF (serum response factor) and AP-1 (activating protein-1), respectively [12]. AK019103 is a novel lncRNA related to DNA damage. Its promoter region contains five binding sites for the transcription factor NF-κB. Significant inhibition of AK019103 was found to repress the activity of NF-κB [10]. In addition, several experimental validations indicated that core transcription factors, such as p53, NF-κB, Sox2, Pou5f1 and Nanog, were sufficient to drive expression of several lincRNAs in processes ranging from embryonic stem cell pluripotency to cell proliferation [34]. Taken together, these findings suggest that lncRNAs share key transcription factors with protein-coding genes.

A SPECIFIC DNA SEQUENCE REGULATES THE EXPRESSION OF pancRNAs
Although considerable evidence shows that lncRNAs may be controlled by transcriptional regulatory machinery similar to that of protein-coding genes, they must also possess unique modes of expression, because unsynchronized expression between protein-coding genes and their location-associated non-coding genes frequently occurs. lncRNAs are transcribed within intergenic or intragenic regions in either a sense or antisense orientation (Fig. 2) [24], and a subset known as pancRNAs (promoter-associated ncRNAs derived from bidirectional promoters) is believed to have an additional pattern of expression regulation. Bidirectional promoters represent approximately 10% of the genes in the human genome, and 94% of them exhibit a strong bias for CpG islands (92.3% in mouse) [38]. A small proportion of lncRNAs are expressed from bidirectional promoters [39]. They initiate transcription less than 1000 bp from the transcription start site (TSS) of a protein-coding gene, in the antisense orientation; these are the pancRNAs. The transcription of pancRNAs precedes that of the paired protein-coding genes. It was reported that these pancRNAs activated the expression of the adjacent protein-coding genes via sequence-specific DNA demethylation. However, not all mRNAs are under pancRNA regulation, because some highly expressed mRNAs are not associated with corresponding antisense ncRNAs. Considering that CpG islands are a hallmark of bidirectional promoters, it is possible that signature sequences within the CpG islands direct the expression of pancRNAs. Recently, researchers found that several CCG repeats were located between -100 and +100 bp in the promoter regions of pancRNA-bearing protein-coding genes in mouse tissues such as the cerebral cortex, cerebellum and heart. Correspondingly, several CGG repeats were found in the downstream regions from +300 to +400 bp.
It was considered that the CCG and CGG repeats may play important roles in the regulation of pancRNA expression [37].
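The positional pattern described above (CCG repeats between -100 and +100 bp, CGG repeats between +300 and +400 bp relative to the TSS) can be checked for a given sequence with a simple window scan. The sketch below is a hypothetical illustration in Python and does not reproduce the analysis in [37]; the function name, the requirement of at least two tandem copies, and the toy sequence are all assumptions.

```python
import re


def count_repeat_runs(seq, tss_index, rel_start, rel_end, unit, min_repeats=2):
    """Count tandem runs of `unit` inside a window given relative to the TSS.

    seq         : DNA sequence on the strand of the protein-coding gene
    tss_index   : 0-based index of the TSS within seq
    rel_start   : window start relative to the TSS (e.g. -100)
    rel_end     : window end relative to the TSS, exclusive (e.g. +100)
    min_repeats : minimum number of tandem copies that counts as a run
                  (an assumed threshold, not taken from the source)
    """
    seq = seq.upper()
    lo = max(0, tss_index + rel_start)
    hi = min(len(seq), tss_index + rel_end)
    window = seq[lo:hi]
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), min_repeats))
    return len(pattern.findall(window))


if __name__ == "__main__":
    # Toy sequence: TSS at index 100, a CCG run at the TSS and a CGG run
    # 350 bp downstream. This is not a real promoter.
    toy = "A" * 100 + "CCGCCGCCG" + "A" * 341 + "CGGCGG" + "A" * 100
    print("CCG runs near TSS:", count_repeat_runs(toy, 100, -100, 100, "CCG"))
    print("CGG runs downstream:", count_repeat_runs(toy, 100, 300, 400, "CGG"))
```

Note that CCG on one strand reads as CGG on the complementary strand, so the strand on which the sequence is supplied has to be fixed before the two windows are compared.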

miRNAs SERVE AS MODULATORS OF THE TRANSCRIPTIONAL INITIATION OF lncRNAs
MicroRNAs (miRNAs) are a set of small non-coding RNAs that regulate gene expression at both the transcriptional and translational levels. Recent studies indicate that miRNAs can modulate the promoter activity of long non-coding genes. For instance, miRNA-29 was reported to upregulate the expression of the lncRNA maternally expressed gene 3 (MEG3) by preventing DNA methyltransferase (DNMT) from methylating the MEG3 promoter [28]. Admittedly, the effect of miRNA-29 on MEG3 still operates through a change in chromatin state. miRNAs contribute to the complexity of dynamic lncRNA expression in disease (Table 1). miRNAs are also involved in the post-transcriptional regulation of lncRNA expression, which is described in the next section.

POST-TRANSCRIPTIONAL REGULATION OF lncRNAs
Global measurement of lncRNA half-lives in mouse has revealed that only a small percentage of lncRNAs are unstable. In addition, spliced lncRNAs are more stable than unspliced (single-exon) lncRNAs, cytoplasmic lncRNAs are more stable than nuclear lncRNAs, and antisense-overlapping lncRNAs are more stable than those transcribed from introns [19]. Furthermore, the mRNA of Sirtuin1 (an NAD-dependent deacetylase involved in myogenesis) was reported to be less stable than its antisense lncRNA [39]. However, the stability of these lncRNAs may prevent them from responding rapidly to external stimuli. Recently, Yoon et al. summarized the mechanisms that contribute to the post-transcriptional control of lncRNA expression [40]. In their report, triple helices formed by the lncRNAs MALAT1 and MENβ (multiple endocrine neoplasia β) protected these two transcripts from degradation by 3'-5' exonucleases [41,42]. Furthermore, lncRNAs were degraded through the action of RNA-binding proteins and miRNAs, as exemplified by the RNA-binding proteins HuR and argonaute 2 (Ago2) and the miRNA let-7b, which were involved in the degradation of lincRNA-p21 [43]. Here, we give more examples of the contributions of miRNAs and RNA-binding proteins to lncRNA decay and discuss the effects of RNA editing and nucleotide modification. An increasing number of studies have shown that the lncRNA-miRNA interaction is a key node in the transcriptional regulation network. lncRNAs are believed to undergo miRNA-mediated post-transcriptional regulation [44]. A circular ncRNA transcribed antisense to cerebellar degeneration-related protein 1 (CDR1) was found to undergo cleavage directed by miRNA-671, which is nearly perfectly complementary to its target ncRNA. This demonstrated that the instability of this circular ncRNA was directly related to miRNA-671 [15].
Furthermore, the well-characterized lncRNA HOTAIR was reported to be repressed by miRNA-141 in the presence of the Ago2 complex in human cancer cells [45]. Another study demonstrated that HOTAIR was degraded jointly by HuR, miRNA let-7i and Ago2 [46], and the lncRNA MALAT1 was shown to be downregulated by hsa-miR-125b during bladder cancer development [47]. In many cases, miRNAs exert their function together with RNA-binding proteins (Table 1). In a study of the relationship between the lncRNA urothelial carcinoma-associated 1 (UCA1) and heterogeneous nuclear ribonucleoprotein I (hnRNP I), control siRNA or hnRNP I siRNA was transfected into MCF-7 cells, which were then treated with the RNA synthesis inhibitor actinomycin D. RNA was subsequently isolated at several time points. The level of UCA1 was markedly lower in hnRNP I siRNA cells than in control siRNA cells at all time points. This implied that hnRNP I could stabilize UCA1. It was later found that the binding motif of hnRNP I is located near the 5'-end of UCA1. In conclusion, hnRNP I can bind to UCA1 and contribute to its stability [48]. Adenosine-to-inosine (A-to-I) RNA editing, in which adenosine is converted to inosine in dsRNAs, is the most common editing event in animals. It is one of the most important forms of post-transcriptional processing and is catalyzed by adenosine deaminase acting on RNA (ADAR). Numerous studies have demonstrated that lncRNAs can fold into secondary structures or pair with their target mRNAs to form dsRNAs [49], and they are therefore candidate substrates of ADAR. The edited lncRNAs may then undergo quite different fates, such as degradation by Tudor-SN, as occurs for edited miRNAs [50]. Besides the four core nucleotides, lncRNAs contain more than 100 distinct modified nucleotides, most of which can be found in other non-coding RNAs, such as tRNAs, rRNAs and small nuclear RNAs.
The modified nucleotides contribute to the stability of these non-coding RNAs, including lncRNAs [51].
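Transcription-inhibition time courses such as the actinomycin D experiment described for UCA1 are commonly summarized as a transcript half-life. As a rough illustration, and assuming simple first-order decay after transcription is blocked, the half-life can be estimated by a least-squares fit of the log-transformed transcript level against time. The function name and the numbers below are hypothetical and are not data from the cited study.

```python
import math


def estimate_half_life(times_h, levels):
    """Estimate RNA half-life (hours) from a transcription-inhibition time course.

    Assumes first-order decay, level(t) = level(0) * exp(-k * t), and fits
    ln(level) against time by ordinary least squares.
    """
    if len(times_h) != len(levels) or len(times_h) < 2:
        raise ValueError("need at least two matched time points")
    if any(v <= 0 for v in levels):
        raise ValueError("levels must be positive to take logarithms")
    ys = [math.log(v) for v in levels]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, ys))
    slope = sxy / sxx  # equals -k under the decay model
    if slope >= 0:
        raise ValueError("no decay detected; half-life is undefined")
    return math.log(2) / -slope


if __name__ == "__main__":
    # Hypothetical relative transcript levels (normalized to t = 0).
    times = [0, 2, 4, 8]
    control = [1.0, 0.8, 0.64, 0.41]
    knockdown = [1.0, 0.5, 0.25, 0.0625]
    print("control half-life (h):", round(estimate_half_life(times, control), 2))
    print("knockdown half-life (h):", round(estimate_half_life(times, knockdown), 2))
```

A shorter fitted half-life in the knockdown condition than in the control would be consistent with the conclusion that the depleted protein stabilizes the transcript.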

CONCLUSION
lncRNAs are emerging stars that display powerful biological functions. The regulatory mechanisms of their expression are similar to those of protein-coding genes; for example, they share the same transcription factors and chromatin modifications. However, their specific location relative to coding genes determines their particular modes of regulation. Furthermore, abundant evidence indicates that lncRNAs undergo intricate post-transcriptional processing, in particular through the miRNA-lncRNA interaction, which considerably enriches the RNA regulatory network.