Regulatory Motif Analysis

Bioinformatics

Abstract

The first complete genome sequences of eukaryotes revealed that much of the genetic material did not code for protein sequences (Lander et al. 2001; Venter et al. 2001). Although this noncoding DNA was once thought to be “junk” DNA, it is now appreciated that large portions of it are actively conserved over evolution (Waterston et al. 2002; Johnston and Stormo 2003), suggesting that these regions contain important functional elements.

A first hypothesis about the function of this noncoding DNA is that it is involved in the regulation of gene activity. One of the best-understood mechanisms of gene regulation is the modulation of transcriptional initiation by sequence-specific DNA-binding proteins (transcription factors). These proteins recognize short sequences in noncoding DNA that fall into families or contain consensus patterns, or motifs.

References

  • Aerts S, Haeussler M, van Vooren S, Griffith OL, Hulpiau P, Jones SJ et al (2008) Text-mining assisted regulatory annotation. Genome Biol 9(2):R31

  • Bailey TL, Elkan C (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 2:28–36

  • Bailey TL, Gribskov M (1998) Methods and statistics for combining motif match scores. J Comput Biol 5(2):211–221

  • Barash Y, Bejerano G, Friedman N (2001) A simple hyper-geometric approach for discovering putative transcription factor binding sites. Proceedings of the first international workshop on algorithms in bioinformatics, Springer

  • Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: A practical and powerful approach to multiple testing. J Royal Stat Soc B 57(1):289–300

  • Berg OG, von Hippel PH (1987) Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters. J Mol Biol 193(4):723–750

  • Bergman CM, Carlson JW, Celniker SE (2005) Drosophila DNase I footprint database: A systematic genome annotation of transcription factor binding sites in the fruitfly, Drosophila melanogaster. Bioinformatics 21(8):1747–1749

  • Berman BP, Nibu Y, Pfeiffer BD, Tomancak P, Celniker SE, Levine M et al (2002) Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc Natl Acad Sci USA 99(2):757–762

  • Blanchette M, Tompa M (2003) FootPrinter: A program designed for phylogenetic footprinting. Nucleic Acids Res 31(13):3840–3842

  • Bussemaker HJ, Li H, Siggia ED (2000) Building a dictionary for genomes: Identification of presumptive regulatory sites by statistical analysis. Proc Natl Acad Sci USA 97(18):10096–10100

  • Bussemaker HJ, Li H, Siggia ED (2001) Regulatory element detection using correlation with expression. Nat Genet 27(2):167–171

  • Chiang DY, Brown PO, Eisen MB (2001) Visualizing associations between genome sequences and gene expression data using genome-mean expression profiles. Bioinformatics 17(Suppl 1):S49–S55

  • Down TA, Hubbard TJ (2005) NestedMICA: Sensitive inference of over-represented motifs in nucleic acid sequence. Nucleic Acids Res 33(5):1445–1453

  • Dubchak I, Ryaboy DV (2006) VISTA family of computational tools for comparative analysis of DNA sequences and whole genomes. Methods Mol Biol 338:69–89

  • Durbin R, Eddy SR, Krogh A, Mitchison GJ (1998) Biological sequence analysis: Probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge, UK

  • Eden E, Lipson D, Yogev S, Yakhini Z (2007) Discovering motifs in ranked lists of DNA sequences. PLoS Comput Biol 3(3):e39

  • Eskin E, Pevzner PA (2002) Finding composite regulatory patterns in DNA sequences. Bioinformatics 18(Suppl 1):S354–S363

  • Felsenstein J (1981) Evolutionary trees from DNA sequences: A maximum likelihood approach. J Mol Evol 17(6):368–376

  • Frith MC, Li MC, Weng Z (2003) Cluster-Buster: Finding dense clusters of motifs in DNA sequences. Nucleic Acids Res 31(13):3666–3668

  • Gadiraju S, Vyhlidal CA, Leeder JS, Rogan PK (2003) Genome-wide prediction, display and refinement of binding sites with information theory-based models. BMC Bioinformatics 4:38

  • Gallo SM, Li L, Hu Z, Halfon MS (2006) REDfly: A regulatory element database for Drosophila. Bioinformatics 22(3):381–383

  • Halfon MS, Grad Y, Church GM, Michelson AM (2002) Computation-based discovery of related transcriptional regulatory modules and motifs using an experimentally validated combinatorial model. Genome Res 12(7):1019–1028

  • Heinemeyer T, Wingender E, Reuter I, Hermjakob H, Kel AE, Kel OV et al (1998) Databases on transcriptional regulation: TRANSFAC, TRRD and COMPEL. Nucleic Acids Res 26(1):362–367

  • Hertz GZ, Stormo GD (1999) Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15(7–8):563–577

  • Johnston M, Stormo GD (2003) Evolution. Heirlooms in the attic. Science 302(5647):997–999

  • Kechris KJ, van Zwet E, Bickel PJ, Eisen MB (2004) Detecting DNA regulatory motifs by incorporating positional trends in information content. Genome Biol 5(7):R50

  • Kellis M, Patterson N, Birren B, Berger B, Lander ES (2004) Methods in comparative genomics: Genome correspondence, gene identification and regulatory motif discovery. J Comput Biol 11(2–3):319–355

  • Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86

  • Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J et al (2001) Initial sequencing and analysis of the human genome. Nature 409(6822):860–921

  • Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC (1993) Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment. Science 262(5131):208–214

  • Lawrence CE, Reilly AA (1990) An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins 7(1):41–51

  • Levine M, Davidson EH (2005) Gene regulatory networks for development. Proc Natl Acad Sci USA 102(14):4936–4942

  • Lifanov AP, Makeev VJ, Nazina AG, Papatsenko DA (2003) Homotypic regulatory clusters in Drosophila. Genome Res 13(4):579–588

  • Liu JS, Neuwald AF, Lawrence CE (1995) Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. J Am Stat Assoc 90(432):1156–1170

  • Mannervik M, Nibu Y, Zhang H, Levine M (1999) Transcriptional coregulators in development. Science 284(5414):606–609

  • Markstein M, Levine M (2002) Decoding cis-regulatory DNAs in the Drosophila genome. Curr Opin Genet Dev 12(5):601–606

  • Montgomery SB, Griffith OL, Sleumer MC, Bergman CM, Bilenky M, Pleasance ED et al (2006) ORegAnno: An open access database and curation system for literature-derived promoters, transcription factor binding sites and regulatory variation. Bioinformatics 22(5):637–640

  • Moses AM, Chiang DY, Eisen MB (2004a) Phylogenetic motif detection by expectation-maximization on evolutionary mixtures. Pac Symp Biocomput:324–335

  • Moses AM, Chiang DY, Pollard DA, Iyer VN, Eisen MB (2004b) MONKEY: Identifying conserved transcription-factor binding sites in multiple alignments using a binding site-specific evolutionary model. Genome Biol 5(12):R98

  • Münch R, Hiller K, Barg H, Heldt D, Linz S, Wingender E et al (2003) PRODORIC: Prokaryotic database of gene regulation. Nucleic Acids Res 31(1):266–269

  • Ovcharenko I, Boffelli D, Loots GG (2004) eShadow: A tool for comparing closely related sequences. Genome Res 14(6):1191–1198

  • Pavesi G, Mereghetti P, Mauri G, Pesole G (2004) Weeder Web: Discovery of transcription factor binding sites in a set of sequences from co-regulated genes. Nucleic Acids Res 32(Web Server issue):W199–W203

  • Sandelin A, Alkema W, Engstrom P, Wasserman WW, Lenhard B (2004a) JASPAR: An open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res 32(Database issue):D91–D94

  • Sandelin A, Wasserman WW, Lenhard B (2004b) ConSite: Web-based prediction of regulatory elements using cross-species comparison. Nucleic Acids Res 32(Web Server issue):W249–W252

  • Schneider TD, Stormo GD, Gold L, Ehrenfeucht A (1986) Information content of binding sites on nucleotide sequences. J Mol Biol 188(3):415–431

  • Segal E, Raveh-Sadka T, Schroeder M, Unnerstall U, Gaul U (2008) Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature 451(7178):535–540

  • Segal E, Yelensky R, Koller D (2003) Genome-wide discovery of transcriptional modules from DNA sequence and gene expression. Bioinformatics 19(Suppl 1):i273–i282

  • Siddharthan R, Siggia ED, van Nimwegen E (2005) PhyloGibbs: A Gibbs sampling motif finder that incorporates phylogeny. PLoS Comput Biol 1(7):e67

  • Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K et al (2005) Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 15(8):1034–1050

  • Sinha S, Blanchette M, Tompa M (2004) PhyME: A probabilistic algorithm for finding motifs in sets of orthologous sequences. BMC Bioinformatics 5:170

  • Sinha S, Liang Y, Siggia E (2006) Stubb: A program for discovery and analysis of cis-regulatory modules. Nucleic Acids Res 34(Web Server issue):W555–W559

  • Sinha S, Tompa M (2000) A statistical method for finding transcription factor binding sites. Proc Int Conf Intell Syst Mol Biol 8:344–354

  • Smith AD, Sumazin P, Zhang MQ (2005) Identifying tissue-selective transcription factor binding sites in vertebrate promoters. Proc Natl Acad Sci USA 102(5):1560–1565

  • Staden R (1989) Methods for calculating the probabilities of finding patterns in sequences. Comput Appl Biosci 5(2):89–96

  • Stormo GD (2000) DNA binding sites: Representation and discovery. Bioinformatics 16(1):16–23

  • Stormo GD, Hartzell GW III (1989) Identifying protein-binding sites from unaligned DNA fragments. Proc Natl Acad Sci USA 86(4):1183–1187

  • Tompa M (1999) An exact method for finding short motifs in sequences, with application to the ribosome binding site problem. Proc Int Conf Intell Syst Mol Biol:262–271

  • Tompa M, Li N, Bailey TL, Church GM, De Moor B, Eskin E et al (2005) Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol 23(1):137–144

  • van Helden J, Andre B, Collado-Vides J (1998) Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J Mol Biol 281(5):827–842

  • Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG et al (2001) The sequence of the human genome. Science 291(5507):1304–1351

  • Wasserman WW, Fickett JW (1998) Identification of regulatory regions which confer muscle-specific gene expression. J Mol Biol 278(1):167–181

  • Wasserman WW, Sandelin A (2004) Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet 5(4):276–287

  • Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P et al (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420(6915):520–562

  • Wingender E, Dietze P, Karas H, Knuppel R (1996) TRANSFAC: A database on transcription factors and their DNA binding sites. Nucleic Acids Res 24(1):238–241

  • Zhu J, Zhang MQ (1999) SCPD: A promoter database of the yeast Saccharomyces cerevisiae. Bioinformatics 15(7–8):607–611

Author information

Corresponding author

Correspondence to Alan Moses.

Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Moses, A., Sinha, S. (2009). Regulatory Motif Analysis. In: Edwards, D., Stajich, J., Hansen, D. (eds) Bioinformatics. Springer, New York, NY. https://doi.org/10.1007/978-0-387-92738-1_7
