Abstract
Over the last several years there has been an explosion in the number of computational methods for detecting transcription factor binding sites in DNA sequences. Although the field has had some success, existing tools are still neither sensitive nor specific enough, and they typically report a large number of false positives. Given the properties of genomic sequences this is not unexpected, but one can still find interesting features worthy of further computational and laboratory bench study. We present an efficient algorithm that finds all significant variable motifs in a given set of sequences. In our view, it is important to generate complete data, to which different selection criteria can then be applied depending on the nature and biological properties of the sites one wants to locate. We discuss our algorithm and our supplementary software, and conclude by illustrating their application to two eukaryotic data sets.
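The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the "complete data, then separate selection" idea, not the authors' method. It assumes that a variable motif is a k-mer matched with at most d mismatches, that the background is uniform i.i.d. nucleotides, and that overlapping windows can be treated as independent (an approximation). All function names (`score_all_motifs`, `select`, `support_pvalue`, etc.) are hypothetical. Exhaustive enumeration of all 4^k candidates is practical only for small k; an efficient tool would need a better index.

```python
from itertools import product
from math import comb

ALPHABET = "ACGT"


def within_mismatches(motif: str, seq: str, d: int) -> bool:
    """True if some window of `seq` differs from `motif` in at most d positions."""
    k = len(motif)
    for i in range(len(seq) - k + 1):
        mismatches = 0
        for a, b in zip(motif, seq[i:i + k]):
            if a != b:
                mismatches += 1
                if mismatches > d:
                    break
        if mismatches <= d:
            return True
    return False


def window_match_prob(k: int, d: int) -> float:
    """P(a uniform random k-mer lies within d mismatches of a fixed motif)."""
    return sum(comb(k, j) * 3 ** j for j in range(d + 1)) / 4 ** k


def support_pvalue(support: int, lengths: list[int], k: int, d: int) -> float:
    """P(at least `support` sequences contain the motif by chance).

    Assumes uniform i.i.d. bases and independent windows (an approximation),
    then combines per-sequence probabilities as a Poisson-binomial via DP.
    """
    q = window_match_prob(k, d)
    dist = [1.0]  # dist[j] = P(exactly j of the sequences seen so far contain a hit)
    for length in lengths:
        windows = max(length - k + 1, 0)
        p = 1.0 - (1.0 - q) ** windows
        new = [0.0] * (len(dist) + 1)
        for j, prob in enumerate(dist):
            new[j] += prob * (1.0 - p)
            new[j + 1] += prob * p
        dist = new
    return sum(dist[support:])


def score_all_motifs(seqs: list[str], k: int, d: int) -> dict[str, tuple[int, float]]:
    """Score every k-mer over ACGT as (support, p-value).

    Only motifs with at least one approximate occurrence are stored.
    """
    lengths = [len(s) for s in seqs]
    results = {}
    for letters in product(ALPHABET, repeat=k):
        motif = "".join(letters)
        support = sum(within_mismatches(motif, s, d) for s in seqs)
        if support:
            results[motif] = (support, support_pvalue(support, lengths, k, d))
    return results


def select(results: dict[str, tuple[int, float]], min_support: int, alpha: float) -> list[str]:
    """A separate, swappable selection step applied to the complete results."""
    return sorted(m for m, (sup, pval) in results.items()
                  if sup >= min_support and pval <= alpha)


if __name__ == "__main__":
    seqs = ["AAAAGGATCAAAA", "CCCGGATCCC", "TTGGATCTT"]
    results = score_all_motifs(seqs, k=5, d=0)
    print(select(results, min_support=3, alpha=0.01))  # ['GGATC']
```

Keeping `score_all_motifs` apart from `select` mirrors the abstract's argument: the exhaustive scores are computed once, and different selection criteria can then be applied to the same output without rescanning the sequences.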
Keywords
- Transcription Factor Binding Site
- Upstream Sequence
- Mixed Lineage Leukemia
- Variable Motif
- Positional Conservation
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Singh, A., Stojanovic, N. (2006). An Efficient Algorithm for the Identification of Repetitive Variable Motifs in the Regulatory Sequences of Co-expressed Genes. In: Levi, A., Savaş, E., Yenigün, H., Balcısoy, S., Saygın, Y. (eds) Computer and Information Sciences – ISCIS 2006. ISCIS 2006. Lecture Notes in Computer Science, vol 4263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11902140_21
DOI: https://doi.org/10.1007/11902140_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-47242-1
Online ISBN: 978-3-540-47243-8
eBook Packages: Computer Science, Computer Science (R0)