High-throughput functional analysis of lncRNA core promoters elucidates rules governing tissue specificity

  1. John L. Rinn [1,5,10,11,12]
  1. Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA;
  2. Department of Biological and Biomedical Sciences, Harvard Medical School, Boston, Massachusetts 02115, USA;
  3. Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium;
  4. VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium;
  5. Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA;
  6. Department of Medicine, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge CB2 0QQ, United Kingdom;
  7. Genetics and Genome Biology Program, Sickkids Research Institute, Toronto, Ontario M5G 0A4, Canada;
  8. Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A1, Canada;
  9. Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Catalonia 08034, Spain;
  10. Department of Pathology, Beth Israel Deaconess Medical Center, Boston, Massachusetts 02115, USA;
  11. Department of Biochemistry, University of Colorado, BioFrontiers Institute, Boulder, Colorado 80301, USA
  12. These authors contributed equally to this work.

  • Corresponding author: marta.mele.messeguer{at}gmail.com
  • Abstract

    Transcription initiates at both coding and noncoding genomic elements, including mRNA and long noncoding RNA (lncRNA) core promoters and enhancer RNAs (eRNAs). However, each class has a different expression profile, with lncRNAs and eRNAs being the most tissue specific. How these complex differences in expression profiles and tissue specificities are encoded in a single DNA sequence remains unresolved. Here, we address this question using computational approaches and massively parallel reporter assays (MPRA) surveying hundreds of promoters and enhancers. We find that both divergent lncRNA and mRNA core promoters have higher capacities to drive transcription than nondivergent lncRNA and mRNA core promoters, respectively. Conversely, intergenic lncRNAs (lincRNAs) and eRNAs have lower capacities to drive transcription and are more tissue specific than divergent genes. This higher tissue specificity is strongly associated with having less complex transcription factor (TF) motif profiles at the core promoter. We experimentally validated these findings by testing both engineered single-nucleotide deletions and human single-nucleotide polymorphisms (SNPs) in MPRA. In both cases, we observe that single nucleotides associated with many motifs are important drivers of promoter activity. Thus, we suggest that high TF motif density serves as a robust mechanism to increase promoter activity at the expense of tissue specificity. Moreover, we find that 22% of common SNPs in core promoter regions have significant regulatory effects. Collectively, our findings show that high TF motif density provides redundancy and increases promoter activity at the expense of tissue specificity, suggesting that specificity of expression may be regulated by simplicity of motif usage.

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.242222.118.

    • Freely available online through the Genome Research Open Access option.

    • Received July 29, 2018.
    • Accepted January 17, 2019.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
