Abstract
Controlling the differential expression of many thousands of genes at any given time is a fundamental task of metazoan organisms, and this complex orchestration is governed by the so-called regulatory genome, which encodes complex regulatory networks. Cis-Regulatory Modules are fundamental units of such networks. To detect Cis-Regulatory Modules in silico, a key step is the discovery of recurrent clusters of DNA binding sites for sets of cooperating Transcription Factors. Such clusters of sites are often called composite motifs. In this paper we describe CMF, a new, efficient combinatorial method for detecting composite motifs, given as input a description of the binding affinities for a set of transcription factors. On known benchmark data, CMF performs statistically significantly better than nine state-of-the-art competing methods.
Notes
1. Assuming the number \(N\) of sequences is clearly understood, we silently equate the fraction \(q\in (0,1]\) and the absolute number of sequences \(\lceil q\cdot N\rceil \).
2. Although not considered in this paper, CMF is also able to run a number of third-party motif discovery tools to “synthesize” PWMs.
3. Currently, CMF invokes RSAT’s utility compare-matrices for this purpose [21], which uses pairwise normalized correlation.
4. Sometimes referred to as weak signals in the literature.
5. In the following, we refer to [8] as the assessment paper.
6. Note that in the already cited paper by Tompa et al. [31], true negative predictions at the motif level are not considered.
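As a small illustration of Note 1, the conversion from a quorum fraction \(q\) to the absolute count \(\lceil q\cdot N\rceil \) can be sketched as follows. This is a minimal sketch, not CMF's actual code: the function name `quorum_count` and the choice of exact rational arithmetic are assumptions made here, the latter so that binary floating-point rounding cannot push the product slightly above an integer before the ceiling is taken.

```python
from fractions import Fraction
from math import ceil


def quorum_count(q, n_sequences):
    """Return ceil(q * N) for a quorum fraction q in (0, 1].

    Hypothetical helper (not part of CMF): q may be a float, an int,
    or a decimal string such as "0.25".
    """
    # Parse the decimal text exactly instead of using the binary float value.
    exact_q = Fraction(str(q))
    if not (0 < exact_q <= 1):
        raise ValueError("q must lie in the interval (0, 1]")
    if n_sequences < 1:
        raise ValueError("the number of sequences must be positive")
    # Fraction supports math.ceil and returns a plain int.
    return ceil(exact_q * n_sequences)
```

For example, `quorum_count(0.5, 5)` yields 3, meaning a motif must occur in at least three of the five sequences.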
References
Davidson, E.H.: The Regulatory Genome: Gene Regulatory Networks in Development and Evolution, 1st edn. Academic Press, San Diego (2006)
Pavesi, G., Mauri, G., Pesole, G.: In silico representation and discovery of transcription factor binding sites. Brief. Bioinform. 5, 217–236 (2004)
Sandve, G.K., Drabløs, F.: A survey of motif discovery methods in an integrated framework. Biol. Direct. 1, 11 (2006)
Häußler, M., Nicolas, J.: Motif discovery on promotor sequences. Research report RR-5714, INRIA (2005)
Zambelli, F., Pesole, G., Pavesi, G.: Motif discovery and transcription factor binding sites before and after the next-generation sequencing era. Brief. Bioinform. (2012)
Wingender, E., et al.: Transfac: a database on transcription factors and their DNA binding sites. Nucl. Acids Res. 24, 238–241 (1996)
Sandelin, A., Alkema, W., Engström, P.G., Wasserman, W.W., Lenhard, B.: Jaspar: an open-access database for eukaryotic transcription factor binding profiles. Nucl. Acids Res. 32, 91–94 (2004)
Klepper, K., Sandve, G., Abul, O., Johansen, J., Drabløs, F.: Assessment of composite motif discovery methods. BMC Bioinform. 9, 123 (2008)
Sinha, S.: Finding regulatory elements in genomic sequences. Ph.D. thesis, University of Washington (2002)
Van Loo, P., Marynen, P.: Computational methods for the detection of cis-regulatory modules. Brief. Bioinform. 10, 509–524 (2009)
Ivan, A., Halfon, M., Sinha, S.: Computational discovery of cis-regulatory modules in Drosophila without prior knowledge of motifs. Genome Biol. 9, R22 (2008)
Federico, M., Leoncini, M., Montangero, M., Valente, P.: Direct vs 2-stage approaches to structured motif finding. Algorithms Mol. Biol. 7, 20 (2012)
Sandve, G., Abul, O., Drabløs, F.: Compo: composite motif discovery using discrete models. BMC Bioinform. 9, 527 (2008)
Hu, J., Hu, H., Li, X.: Mopat: a graph-based method to predict recurrent cis-regulatory modules from known motifs. Nucl. Acids Res. 36, 4488–4497 (2008)
Nikulova, A.A., Favorov, A.V., Sutormin, R.A., Makeev, V.J., Mironov, A.A.: Coreclust: identification of the conserved CRM grammar together with prediction of gene regulation. Nucl. Acids Res. 40, e93 (2012). doi:10.1093/nar/gks235
Vavouri, T., Elgar, G.: Prediction of cis-regulatory elements using binding site matrices - the successes, the failures and the reasons for both. Curr. Opin. Genet. Develop. 15, 395–402 (2005)
Kel, A., Gößling, E., Reuter, I., Cheremushkin, E., Kel-Margoulis, O., Wingender, E.: Match™: a tool for searching transcription factor binding sites in DNA sequences. Nucl. Acids Res. 31, 3576–3579 (2003)
Chen, Q.K., Hertz, G.Z., Stormo, G.D.: Matrix search 1.0: a computer program that scans DNA sequences for transcriptional elements using a database of weight matrices. Comp. Appl. Biosci.: CABIOS 11, 563–566 (1995)
Prestridge, D.S.: Signal scan: a computer program that scans DNA sequences for eukaryotic transcriptional elements. Comp. Appl. Biosci.: CABIOS 7, 203–206 (1991)
Matys, V., et al.: TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucl. Acids Res. 34, D108–D110 (2006)
Thomas-Chollier, M., et al.: RSAT: regulatory sequence analysis tools. Nucl. Acids Res. 36, W119–W127 (2008)
Uno, T.: Pce: Pseudo clique enumerator, ver. 1.0 (2006)
Zhou, Q., Wong, W.H.: Cismodule: De novo discovery of cis-regulatory modules by hierarchical mixture modeling. Proc. Natl. Acad. Sci. 101, 12114–12119 (2004)
Frith, M.C., Hansen, U., Weng, Z.: Detection of cis-element clusters in higher eukaryotic DNA. Bioinformatics 17, 878–889 (2001)
Frith, M.C., Li, M.C., Weng, Z.: Cluster-Buster: finding dense clusters of motifs in DNA sequences. Nucl. Acids Res. 31, 3666–3668 (2003)
Kel, A., Konovalova, T., Waleev, T., Cheremushkin, E., Kel-Margoulis, O., Wingender, E.: Composite module analyst: a fitness-based tool for identification of transcription factor binding site combinations. Bioinformatics 22, 1190–1197 (2006)
Bailey, T.L., Noble, W.S.: Searching for statistically significant regulatory modules. Bioinformatics 19, ii16–ii25 (2003)
Aerts, S., Van Loo, P., Thijs, G., Moreau, Y., De Moor, B.: Computational detection of cis-regulatory modules. Bioinformatics 19, ii5–ii14 (2003)
Johansson, Ö., Alkema, W., Wasserman, W.W., Lagergren, J.: Identification of functional clusters of transcription factor binding motifs in genome sequences: the mscan algorithm. Bioinformatics 19, i169–i176 (2003)
Sinha, S., van Nimwegen, E., Siggia, E.D.: A probabilistic method to detect regulatory modules. Bioinformatics 19, i292–i301 (2003)
Tompa, M., et al.: Assessing computational tools for the discovery of transcription factor binding sites. Nat. Biotechnol. 23, 137–144 (2005)
García, S., Fernández, A., Luengo, J., Herrera, F.: Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Inf. Sci. 180, 2044–2064 (2010)
Acknowledgments
The present work is partially supported by the Flagship project InterOmics (PB.P05), funded by the Italian MIUR and CNR organizations, and by the joint IIT-IFC Lab for Integrative System Medicine (LISM).
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Leoncini, M., Montangero, M., Pellegrini, M., Tillán, K.P. (2013). CMF: A Combinatorial Tool to Find Composite Motifs. In: Nicosia, G., Pardalos, P. (eds) Learning and Intelligent Optimization. LION 2013. Lecture Notes in Computer Science, vol 7997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-44973-4_21
DOI: https://doi.org/10.1007/978-3-642-44973-4_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-44972-7
Online ISBN: 978-3-642-44973-4
eBook Packages: Computer Science (R0)