Abstract
Earlier analyses of predicted structural properties of promoter regions in prokaryotic and eukaryotic genomes indicated that they share several features, such as lower stability, higher curvature and lower bendability, compared with their neighboring regions. Based on the difference in stability between neighboring upstream and downstream regions in the vicinity of experimentally determined transcription start sites, a promoter prediction algorithm has been developed to identify prokaryotic promoter sequences in whole genomes. The average free energy (E) over known promoter sequences and the difference (D) between E and the average free energy over the entire genome (G) are used to search for promoters in genomic sequences. Using cutoff values for E and D to predict promoter regions across the entire Escherichia coli genome, we achieved a reliability of 70% when the predicted promoters were cross-verified against the 960 transcription start sites (TSSs) listed in the EcoCyc database. Annotation of the whole E. coli genome for promoter regions could be carried out with 49% accuracy. The method is quite general and can be used to annotate the promoter regions of other prokaryotic genomes.
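The stability-based search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the dinucleotide energies are illustrative placeholders in the style of published nearest-neighbour tables, the window size and both cutoffs are hypothetical, and D is taken here as the upstream average minus the downstream average, which is only one reading of the abstract. The function names `step_energies` and `predict_promoters` are invented for this example.

```python
# Illustrative nearest-neighbour free energies (kcal/mol) per dinucleotide step.
# Assumption: placeholder values; substitute the parameter set actually used.
NN_ENERGY = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88, "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17, "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}


def step_energies(seq):
    """Return the free energy of every overlapping dinucleotide step.

    Assumes the sequence contains only A, C, G and T.
    """
    seq = seq.upper()
    return [NN_ENERGY[seq[i:i + 2]] for i in range(len(seq) - 1)]


def mean(values):
    return sum(values) / len(values)


def predict_promoters(seq, window=50, e_cutoff=-1.3, d_cutoff=0.3):
    """Flag positions whose upstream window is unstable relative to downstream.

    window, e_cutoff and d_cutoff are hypothetical values for illustration.
    Returns a list of (position, E, D) tuples.
    """
    energies = step_energies(seq)
    hits = []
    for pos in range(window, len(energies) - window + 1):
        e = mean(energies[pos - window:pos])           # upstream average (E)
        downstream = mean(energies[pos:pos + window])  # downstream average
        d = e - downstream                             # stability difference (D)
        # Less negative free energy means a less stable duplex.
        if e > e_cutoff and d > d_cutoff:
            hits.append((pos, round(e, 3), round(d, 3)))
    return hits
```

For example, `predict_promoters("AT" * 30 + "GC" * 30, window=20)` flags positions near the boundary between the AT-rich (less stable) stretch and the GC-rich (more stable) stretch, whereas a uniform sequence such as `"AT" * 60` yields no hits because the upstream and downstream averages are equal.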
Abbreviations
- nt: nucleotides
- RNAP: RNA polymerase
- TSS: transcription start site
Cite this article
Rangannan, V., Bansal, M. Identification and annotation of promoter regions in microbial genome sequences on the basis of DNA stability. J Biosci 32 (Suppl 1), 851–862 (2007). https://doi.org/10.1007/s12038-007-0085-1