Identification and Characterization of the Potential Promoter Regions of 1031 Kinds of Human Genes

  1. Yutaka Suzuki1,2,3,9,
  2. Tatsuhiko Tsunoda2,3,
  3. Jun Sese4,
  4. Hirotoshi Taira5,
  5. Junko Mizushima-Sugano1,2,
  6. Hiroko Hata1,
  7. Toshio Ota6,
  8. Takao Isogai6,
  9. Toshihiro Tanaka2,
  10. Yusuke Nakamura2,
  11. Akira Suyama7,
  12. Yoshiyuki Sakaki2,3,
  13. Shinichi Morishita4,
  14. Kousaku Okubo8, and
  15. Sumio Sugano1,2
  1. 1Department of Virology and 2Human Genome Center, Institute of Medical Science, University of Tokyo, Minato-ku, Tokyo 108–8639, Japan; 3Genome Science Center, Institute of Physical and Chemical Research (RIKEN), Wako-shi, Saitama 351–0106, Japan; 4Department of Complexity Science and Engineering, Graduate School of Frontier Science, University of Tokyo, Bunkyo-ku, Tokyo 113–0033, Japan; 5Intelligent Communication Laboratory, Nippon Telegraph and Telephone Communication Science Laboratories, Seika-cho, Soraku-gun, Kyoto 619–0237, Japan; 6Helix Research Institute, Kisarazu-shi, Chiba 292–0812, Japan; 7Department of Life Sciences, University of Tokyo, Meguro-ku, Tokyo 153–0041, Japan; 8The Institute of Molecular and Cell Biology, Osaka University, Suita-shi, Osaka 565–0871, Japan

Abstract

To understand the mechanism of transcriptional regulation, it is essential to identify and characterize the promoter, which is located proximal to the mRNA start site. To identify promoters in large volumes of genomic sequence, we used mRNA start sites determined by large-scale sequencing of cDNA libraries constructed by the “oligo-capping” method. We aligned the mRNA start sites with the genomic sequences and retrieved the adjacent sequences as potential promoter regions (PPRs) for 1031 genes. The PPR sequences were searched to determine the frequencies of major promoter elements. Among the 1031 PPRs, 329 (32%) contained TATA boxes, 872 (85%) contained initiators, 999 (97%) contained GC boxes, and 663 (64%) contained CAAT boxes. Furthermore, 493 (48%) PPRs were located in CpG islands. This frequency of CpG islands was reduced in TATA+/Inr+ PPRs and in the PPRs of ubiquitously expressed genes. In the PPRs of the CGM2 gene, the DRA gene, and the TM30pl genes, which showed highly colon-specific expression patterns, the consensus sequences of E boxes were commonly observed. The PPRs were also useful for exploring promoter SNPs.

[The nucleotide sequences described in this paper have been deposited in the DDBJ, EMBL, and GenBank data libraries under accession nos. AU098358–AU100608.]
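
The element search summarized in the abstract can be illustrated with a minimal sketch. The abstract does not state the matching criteria that were actually used, so everything below is an assumption: the regular expressions are simplified textbook consensus sequences, the CpG island thresholds (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6) follow the commonly cited Gardiner-Garden and Frommer definition, and the function names are hypothetical.

```python
import re

# Simplified textbook consensus patterns, searched on both strands where the
# motif is not palindromic. These are assumptions, not the paper's criteria.
MOTIFS = {
    "TATA box": re.compile(r"TATA[AT]A"),
    "Initiator": re.compile(r"[CT][CT]A[ACGT][AT][CT][CT]"),
    "GC box": re.compile(r"GGGCGG|CCGCCC"),
    "CAAT box": re.compile(r"CCAAT|ATTGG"),
}


def motif_frequencies(pprs):
    """Return {motif name: (count, rounded percent)} over a list of sequences."""
    total = len(pprs)
    result = {}
    for name, pattern in MOTIFS.items():
        count = sum(1 for seq in pprs if pattern.search(seq.upper()))
        percent = round(100 * count / total) if total else 0
        result[name] = (count, percent)
    return result


def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("C") + seq.count("G")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed length, GC-content, and CpG-ratio thresholds."""
    return (
        len(seq) >= min_len
        and gc_content(seq) > min_gc
        and cpg_obs_exp(seq) > min_ratio
    )


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not real PPRs.
    toy = ["GGGCGGTATAAACCAATTCAGTCC", "ACGTACGT"]
    for name, (count, percent) in motif_frequencies(toy).items():
        print(f"{name}: {count} of {len(toy)} ({percent}%)")
```

The percentages are computed the same way as those quoted in the abstract; for example, `round(100 * 329 / 1031)` gives 32 for the TATA box count.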

Footnotes

  • 9 Corresponding author.

  • E-MAIL ysuzuki{at}ims.u-tokyo.ac.jp; FAX 81 3 5449 5416.

  • Article published on-line before print: Genome Res., 10.1101/gr.164001.

  • Article and publication are at www.genome.org/cgi/doi/10.1101/gr.164001.

    • Received September 5, 2000.
    • Accepted February 5, 2001.