Abstract
Evolutionary-based data mining techniques are increasingly applied to problems in the bioinformatics domain. We investigate an important aspect of predicting the folded 3D structure of proteins from their unfolded residue sequence using evolutionary-based machine learning techniques. Our approach is to predict specific features of residues in folded protein chains, in particular features derived from Delaunay tessellations, Gabriel graphs, relative neighborhood graphs and minimum spanning trees. Several standard machine learning algorithms were compared to a state-of-the-art learning method, a learning classifier system (LCS), which is capable of generating compact and interpretable rule sets. Predictions were performed for various degrees of precision using a range of experimental parameters. Examples of the rules obtained are presented. The LCS achieves good predictive performance and generates competent yet simple and interpretable classification rules.
Cite this article
Stout, M., Bacardit, J., Hirst, J.D. et al. Prediction of topological contacts in proteins using learning classifier systems. Soft Comput 13, 245–258 (2009). https://doi.org/10.1007/s00500-008-0318-8
DOI: https://doi.org/10.1007/s00500-008-0318-8