Abstract
A common way to characterise important and conserved signals in nucleotide sequences, such as transcription factor binding sites, is to use so-called consensus sequences or consensus patterns. A well-known example is the “TATA-box” commonly found in eukaryotic core promoters. Such patterns are valuable in that they offer insight into basic molecular biology processes, and can support reasoning about the understanding, design and control of these processes. However, it is rare for such patterns to be accurate; instead they represent a very approximate characterisation of the signal under study. At the opposite extreme, we may characterise such a signal via a neural network, a high-order Markov model, or a similar model. These have better sensitivity and specificity, but are unreadable, and consequently unhelpful for conveying an understanding of the underlying molecular biology processes that could support insight or reasoning. We describe a simple pattern language, called crisp hypermotifs (CHMs), that leads to highly readable patterns that can support understanding and reasoning, yet achieve greater sensitivity and specificity than the commonly used approaches to crisply characterising a signal. We use evolutionary computation to discover high-performance CHMs from data. We argue that CHMs should be used in place of classical consensus motifs, and justify this by presenting examples derived from a large dataset of mammalian core promoters. We provide CHM alternatives to the well-known core promoter TATA-box and Initiator patterns that have better sensitivity and specificity than their classical counterparts.
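To make the evaluation terms in the abstract concrete, the following minimal sketch scores a classical consensus pattern against labelled sequences and reports sensitivity, specificity and the Matthews correlation coefficient. It does not implement CHMs, whose syntax is not given here. The consensus `TATAWAAR` is one commonly quoted form of the TATA-box and is used only as an illustration; the function names and the toy sequences are hypothetical and are not taken from the paper or its dataset.

```python
import math
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a compiled regex."""
    return re.compile("".join(IUPAC[c] for c in consensus.upper()))


def evaluate(consensus, positives, negatives):
    """Score a consensus pattern as a binary classifier.

    A sequence is predicted positive if the pattern occurs anywhere in it.
    """
    pattern = consensus_to_regex(consensus)
    tp = sum(1 for s in positives if pattern.search(s.upper()))
    fn = len(positives) - tp
    fp = sum(1 for s in negatives if pattern.search(s.upper()))
    tn = len(negatives) - fp

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn,
            "sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


# Toy data (hypothetical): "positives" stand in for promoter sequences,
# "negatives" for non-promoter sequences.
positives = ["GGCTATAAAAGGC", "CCTATATAAGCC", "GGGCGCGCGGG"]
negatives = ["GCGCGCGCGC", "ACGTACGTACGT", "TTTATAAAAGTT", "CCCCGGGG"]

result = evaluate("TATAWAAR", positives, negatives)
print(result)
```

On this toy data the pattern misses one positive and matches one negative, which illustrates how a crisp consensus can be readable yet only approximately accurate.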
Keywords
- Core Promoter
- Matthews Correlation Coefficient
- Core Promoter Region
- Consensus Pattern
- Core Promoter Sequence
These keywords were added by machine and not by the authors.
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Pridgeon, C., Corne, D. (2007). Characterising DNA/RNA Signals with Crisp Hypermotifs: A Case Study on Core Promoters. In: Marchiori, E., Moore, J.H., Rajapakse, J.C. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2007. Lecture Notes in Computer Science, vol 4447. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71783-6_22
DOI: https://doi.org/10.1007/978-3-540-71783-6_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71782-9
Online ISBN: 978-3-540-71783-6
eBook Packages: Computer Science, Computer Science (R0)