Abstract
Life may have begun in an RNA world, a hypothesis supported by increasing evidence of the vital roles that RNAs perform in biological systems. In the human genome, most genes do not encode proteins; they are noncoding RNA genes. The largest class of noncoding genes is the long noncoding RNAs (lncRNAs), which are transcripts longer than 200 nucleotides with no protein-coding capacity. While some lncRNAs have been demonstrated to be key regulators of gene expression and 3D genome organization, most lncRNAs are still uncharacterized. We thus propose several data mining and machine learning approaches for the functional annotation of human lncRNAs by leveraging the vast amount of data from genetic and genomic studies. Recent results from our studies and those of other groups indicate that genomic data mining can give insights into lncRNA functions and provide valuable information for experimental studies of candidate lncRNAs associated with human disease.
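Summary
Increasing evidence shows that RNA plays a vital role in biological systems, and these findings support the hypothesis that life originated from RNA. In the human genome, most genes do not encode proteins and are known as noncoding RNA genes. Long noncoding RNAs (lncRNAs) are the largest class among them, with transcripts longer than 200 nucleotides. Although some lncRNAs have been shown to be key elements regulating gene expression and 3D genome structure, most lncRNAs have not yet been studied or annotated. Using large amounts of genomic data, our group proposes several methods based on data mining and machine learning for the functional annotation of human lncRNAs. Recent results from our group and from other groups in the field indicate that genomic data mining can deepen the understanding of lncRNA function and provide important information for experimental studies of disease-associated lncRNAs.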
Additional information
Project supported by the Self Regional Healthcare Foundation, USA
Cite this article
Gudenas, B.L., Wang, J., Kuang, Sz. et al. Genomic data mining for functional annotation of human long noncoding RNAs. J. Zhejiang Univ. Sci. B 20, 476–487 (2019). https://doi.org/10.1631/jzus.B1900162
DOI: https://doi.org/10.1631/jzus.B1900162