CMF: A Combinatorial Tool to Find Composite Motifs

  • Conference paper
Learning and Intelligent Optimization (LION 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7997)

Abstract

Controlling the differential expression of many thousands of genes at any given time is a fundamental task of metazoan organisms, and this complex orchestration is controlled by the so-called regulatory genome, which encodes complex regulatory networks. Cis-Regulatory Modules are fundamental units of such networks. To detect Cis-Regulatory Modules in silico, a key step is the discovery of recurrent clusters of DNA binding sites for sets of cooperating Transcription Factors. Composite motif is the term often adopted to refer to these clusters of sites. In this paper we describe CMF, a new efficient combinatorial method for detecting composite motifs, given as input a description of the binding affinities for a set of transcription factors. In tests on known benchmark data, CMF attains statistically significantly better performance than nine state-of-the-art competing methods.
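To make the problem statement concrete, the following minimal sketch illustrates what "clusters of binding sites for a set of transcription factors" can mean operationally. It is not CMF's algorithm. It assumes that binding affinities are given as position weight matrices (PWMs) of base probabilities, that a uniform background is adequate, and that a cluster is a stretch where hits for every factor start within a fixed window. The matrices, the names `TF_A` and `TF_B`, the threshold, and the window size are all hypothetical.

```python
import math

# Hypothetical toy PWMs: one dict of base probabilities per motif position.
PWMS = {
    "TF_A": [{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
             {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
             {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1}],
    "TF_B": [{"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
             {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
             {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}],
}
BACKGROUND = 0.25  # assumed uniform background frequency for every base


def log_odds(pwm, site):
    """Log-odds score of a candidate site against the uniform background."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pwm, site))


def scan(pwm, seq, threshold):
    """Start positions in seq whose log-odds score reaches the threshold."""
    width = len(pwm)
    return [i for i in range(len(seq) - width + 1)
            if log_odds(pwm, seq[i:i + width]) >= threshold]


def windows_with_all_factors(seq, pwms, threshold, window):
    """Hit positions from which every factor has a hit starting within `window` bases."""
    hits = sorted((pos, name)
                  for name, pwm in pwms.items()
                  for pos in scan(pwm, seq, threshold))
    found = []
    for i, (start, _) in enumerate(hits):
        names = {name for pos, name in hits[i:] if pos - start < window}
        if names == set(pwms):
            found.append(start)
    return found


candidates = windows_with_all_factors("CCACGCCTTACC", PWMS, threshold=4.0, window=10)
```

A composite motif in the paper's sense is additionally required to recur across the input sequences; this sketch covers only the single-sequence clustering step and compares hit start positions rather than full site extents.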

Notes

  1. Assuming the number \(N\) of sequences is clearly understood, we silently equate the fraction \(q\in (0,1]\) and the absolute number of sequences \(\lceil q\cdot N\rceil \).

  2. Even if not taken into consideration in this paper, CMF is also able to run a number of third-party motif discovery tools to “synthesize” PWMs.

  3. Currently, CMF invokes RSAT’s utility compare-matrices for this purpose [21], which uses pairwise normalized correlation.

  4. Sometimes referred to as weak signals in the literature.

  5. In the following, we refer to [8] as the assessment paper.

  6. Note that in the already cited paper by Tompa et al. [31], true negative predictions at the motif level are not considered.
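The convention in note 1 can be sketched as a small helper. The function name `quorum_count` is hypothetical, and converting through `Fraction` is a choice made here, not something the paper specifies, to keep binary floating-point round-off from pushing the ceiling up by one.

```python
import math
from fractions import Fraction


def quorum_count(q, n_sequences):
    """Convert a quorum fraction q in (0, 1] to the absolute count ceil(q * N)."""
    exact_q = Fraction(str(q))  # exact decimal value of q as written
    if not (0 < exact_q <= 1):
        raise ValueError("q must lie in (0, 1]")
    return math.ceil(exact_q * n_sequences)


needed = quorum_count(0.25, 10)  # ceil(2.5) = 3 sequences
```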

References

  1. Davidson, E.H.: The Regulatory Genome: Gene Regulatory Networks in Development and Evolution, 1st edn. Academic Press, San Diego (2006)

  2. Pavesi, G., Mauri, G., Pesole, G.: In silico representation and discovery of transcription factor binding sites. Brief. Bioinform. 5, 217–236 (2004)

  3. Sandve, G.K., Drabløs, F.: A survey of motif discovery methods in an integrated framework. Biol. Direct. 1, 11 (2006)

  4. Häußler, M., Nicolas, J.: Motif discovery on promotor sequences. Research report RR-5714, INRIA (2005)

  5. Zambelli, F., Pesole, G., Pavesi, G.: Motif discovery and transcription factor binding sites before and after the next-generation sequencing era. Brief. Bioinform. (2012)

  6. Wingender, E., et al.: Transfac: a database on transcription factors and their DNA binding sites. Nucl. Acids Res. 24, 238–241 (1996)

  7. Sandelin, A., Alkema, W., Engström, P.G., Wasserman, W.W., Lenhard, B.: Jaspar: an open-access database for eukaryotic transcription factor binding profiles. Nucl. Acids Res. 32, 91–94 (2004)

  8. Klepper, K., Sandve, G., Abul, O., Johansen, J., Drabløs, F.: Assessment of composite motif discovery methods. BMC Bioinform. 9, 123 (2008)

  9. Sinha, S.: Finding regulatory elements in genomic sequences. Ph.D. thesis, University of Washington (2002)

  10. Van Loo, P., Marynen, P.: Computational methods for the detection of cis-regulatory modules. Brief. Bioinform. 10, 509–524 (2009)

  11. Ivan, A., Halfon, M., Sinha, S.: Computational discovery of cis-regulatory modules in Drosophila without prior knowledge of motifs. Genome Biol. 9, R22 (2008)

  12. Federico, M., Leoncini, M., Montangero, M., Valente, P.: Direct vs 2-stage approaches to structured motif finding. Algorithms Mol. Biol. 7, 20 (2012)

  13. Sandve, G., Abul, O., Drabløs, F.: Compo: composite motif discovery using discrete models. BMC Bioinform. 9, 527 (2008)

  14. Hu, J., Hu, H., Li, X.: Mopat: a graph-based method to predict recurrent cis-regulatory modules from known motifs. Nucl. Acids Res. 36, 4488–4497 (2008)

  15. Nikulova, A.A., Favorov, A.V., Sutormin, R.A., Makeev, V.J., Mironov, A.A.: Coreclust: identification of the conserved CRM grammar together with prediction of gene regulation. Nucl. Acids Res. 40, e93 (2012). doi:10.1093/nar/gks235

  16. Vavouri, T., Elgar, G.: Prediction of cis-regulatory elements using binding site matrices - the successes, the failures and the reasons for both. Curr. Opin. Genet. Develop. 15, 395–402 (2005)

  17. Kel, A., Gößling, E., Reuter, I., Cheremushkin, E., Kel-Margoulis, O., Wingender, E.: Matchtm: a tool for searching transcription factor binding sites in DNA sequences. Nucl. Acids Res. 31, 3576–3579 (2003)

  18. Chen, Q.K., Hertz, G.Z., Stormo, G.D.: Matrix search 1.0: a computer program that scans DNA sequences for transcriptional elements using a database of weight matrices. Comp. Appl. Biosci.: CABIOS 11, 563–566 (1995)

  19. Prestridge, D.S.: Signal scan: a computer program that scans DNA sequences for eukaryotic transcriptional elements. Comp. Appl. Biosci.: CABIOS 7, 203–206 (1991)

  20. Matys, V., et al.: TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucl. Acids Res. 34, D108–D110 (2006)

  21. Thomas-Chollier, M., et al.: RSAT: regulatory sequence analysis tools. Nucl. Acids Res. 36, W119–W127 (2008)

  22. Uno, T.: Pce: Pseudo clique enumerator, ver. 1.0 (2006)

  23. Zhou, Q., Wong, W.H.: Cismodule: De novo discovery of cis-regulatory modules by hierarchical mixture modeling. Proc. Natl. Acad. Sci. 101, 12114–12119 (2004)

  24. Frith, M.C., Hansen, U., Weng, Z.: Detection of cis-element clusters in higher eukaryotic DNA. Bioinformatics 17, 878–889 (2001)

  25. Frith, M.C., Li, M.C., Weng, Z.: Cluster-Buster: finding dense clusters of motifs in DNA sequences. Nucl. Acids Res. 31, 3666–3668 (2003)

  26. Kel, A., Konovalova, T., Waleev, T., Cheremushkin, E., Kel-Margoulis, O., Wingender, E.: Composite module analyst: a fitness-based tool for identification of transcription factor binding site combinations. Bioinformatics 22, 1190–1197 (2006)

  27. Bailey, T.L., Noble, W.S.: Searching for statistically significant regulatory modules. Bioinformatics 19, ii16–ii25 (2003)

  28. Aerts, S., Van Loo, P., Thijs, G., Moreau, Y., De Moor, B.: Computational detection of cis-regulatory modules. Bioinformatics 19, ii5–ii14 (2003)

  29. Johansson, Ö., Alkema, W., Wasserman, W.W., Lagergren, J.: Identification of functional clusters of transcription factor binding motifs in genome sequences: the mscan algorithm. Bioinformatics 19, i169–i176 (2003)

  30. Sinha, S., van Nimwegen, E., Siggia, E.D.: A probabilistic method to detect regulatory modules. Bioinformatics 19, i292–i301 (2003)

  31. Tompa, M., et al.: Assessing computational tools for the discovery of transcription factor binding sites. Nat. Biotechnol. 23, 137–144 (2005)

  32. García, S., Fernández, A., Luengo, J., Herrera, F.: Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Inf. Sci. 180, 2044–2064 (2010)

Acknowledgments

The present work is partially supported by the Flagship project InterOmics (PB.P05), funded by the Italian MIUR and CNR organizations, and by the joint IIT-IFC Lab for Integrative System Medicine (LISM).

Corresponding author

Correspondence to Mauro Leoncini.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

Cite this paper

Leoncini, M., Montangero, M., Pellegrini, M., Tillán, K.P. (2013). CMF: A Combinatorial Tool to Find Composite Motifs. In: Nicosia, G., Pardalos, P. (eds) Learning and Intelligent Optimization. LION 2013. Lecture Notes in Computer Science, vol 7997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-44973-4_21

  • DOI: https://doi.org/10.1007/978-3-642-44973-4_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-44972-7

  • Online ISBN: 978-3-642-44973-4

  • eBook Packages: Computer Science, Computer Science (R0)
