Systematic interrogation of human promoters

  1. Eran Segal (1,2)

  1. Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel
  2. Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
  3. These authors contributed equally to this work.

  • Present addresses: (4) Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; (5) Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel; (6) Department of Genetics, Stanford University, Stanford, CA 94305, USA; (7) Department of Biology, Stanford University, Stanford, CA 94305, USA; (8) IVF Laboratory and Wolfe PGD-Stem Cell Laboratory, Racine IVF Unit, Lis Maternity Hospital, Tel-Aviv Sourasky Medical Center, Tel Aviv 6423906, Israel

  • Corresponding authors: shirawg{at}broadinstitute.org; eran.segal{at}weizmann.ac.il
    Abstract

    Despite much research, our understanding of the architecture and cis-regulatory elements of human promoters is still lacking. Here, we devised a high-throughput assay to quantify the activity of approximately 15,000 fully designed sequences that we integrated and expressed from a fixed location within the human genome. We used this method to investigate thousands of native promoters and preinitiation complex (PIC) binding regions followed by in-depth characterization of the sequence motifs underlying promoter activity, including core promoter elements and TF binding sites. We find that core promoters drive transcription mostly unidirectionally and that sequences originating from promoters exhibit stronger activity than those originating from enhancers. By testing multiple synthetic configurations of core promoter elements, we dissect the motifs that positively and negatively regulate transcription as well as the effect of their combinations and distances, including a 10-bp periodicity in the optimal distance between the TATA and the initiator. By comprehensively screening 133 TF binding sites, we find that in contrast to core promoters, TF binding sites maintain similar activity levels in both orientations, supporting a model by which divergent transcription is driven by two distinct unidirectional core promoters sharing bidirectional TF binding sites. Finally, we find a striking agreement between the effect of binding site multiplicity of individual TFs in our assay and their tendency to appear in homotypic clusters throughout the genome. Overall, our study systematically assays the elements that drive expression in core and proximal promoter regions and sheds light on organization principles of regulatory regions in the human genome.

    Footnotes

    • Received February 14, 2018.
    • Accepted December 5, 2018.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
