Prediction of topological contacts in proteins using learning classifier systems

Soft Computing

Abstract

Evolutionary-based data mining techniques are increasingly applied to problems in bioinformatics. We investigate an important aspect of predicting the folded 3D structure of proteins from their unfolded residue sequence using evolutionary-based machine learning techniques. Our approach is to predict specific features of residues in folded protein chains, in particular features derived from Delaunay tessellations, Gabriel graphs, relative neighborhood graphs and minimum spanning trees. Several standard machine learning algorithms were compared with a state-of-the-art learning method, a learning classifier system (LCS), which is capable of generating compact and interpretable rule sets. Predictions were performed at various degrees of precision using a range of experimental parameters. Examples of the rules obtained are presented. The LCS produces results with good predictive performance and generates competent yet simple and interpretable classification rules.
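The graph-derived residue features named above can be illustrated with a minimal sketch. It assumes that each residue is represented by a single 3D point (for example its C-alpha atom) and that the per-residue feature is the residue's degree in each graph; these choices and all function names are hypothetical, for illustration only, and are not the authors' pipeline. The code builds the Gabriel graph, the relative neighborhood graph (RNG) and a Euclidean minimum spanning tree (MST) by brute force. The Delaunay tessellation is omitted to keep the sketch dependency-free; a library routine such as `scipy.spatial.Delaunay` could supply it.

```python
import math
from itertools import combinations

# Hypothetical helpers for illustration only: each residue is assumed to be
# one 3D point (e.g. its C-alpha atom). This is not the authors' code.


def dist2(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gabriel_graph(points):
    """Keep edge (i, j) if no other point lies inside or on the sphere whose
    diameter is the segment i-j, i.e. d(i,k)^2 + d(j,k)^2 > d(i,j)^2 for all k."""
    n = len(points)
    edges = set()
    for i, j in combinations(range(n), 2):
        dij = dist2(points[i], points[j])
        if all(dist2(points[i], points[k]) + dist2(points[j], points[k]) > dij
               for k in range(n) if k not in (i, j)):
            edges.add((i, j))
    return edges


def relative_neighborhood_graph(points):
    """Keep edge (i, j) unless some other point k is closer to both i and j
    than they are to each other (the 'lune' of i and j is not empty)."""
    n = len(points)
    edges = set()
    for i, j in combinations(range(n), 2):
        dij = dist2(points[i], points[j])
        if not any(max(dist2(points[i], points[k]), dist2(points[j], points[k])) < dij
                   for k in range(n) if k not in (i, j)):
            edges.add((i, j))
    return edges


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph, O(n^2)."""
    n = len(points)
    if n == 0:
        return set()
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection to the growing tree
    parent = [-1] * n
    best[0] = 0.0
    edges = set()
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.add((min(u, parent[u]), max(u, parent[u])))
        for v in range(n):
            if not in_tree[v]:
                d = dist2(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def residue_degrees(n, edges):
    """Number of graph neighbours of each of the n residues."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


if __name__ == "__main__":
    chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    print(residue_degrees(len(chain), gabriel_graph(chain)))  # prints [1, 2, 2, 1]
```

For points in general position these graphs are nested (MST ⊆ RNG ⊆ Gabriel graph ⊆ Delaunay graph), so a residue's degree can only stay the same or increase from one graph to the next. Degrees of this kind could then be binned into a small number of classes to obtain prediction targets at different levels of precision; the binning scheme is left open here.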


Author information

Corresponding author

Correspondence to Natalio Krasnogor.

About this article

Cite this article

Stout, M., Bacardit, J., Hirst, J.D. et al. Prediction of topological contacts in proteins using learning classifier systems. Soft Comput 13, 245–258 (2009). https://doi.org/10.1007/s00500-008-0318-8

  • DOI: https://doi.org/10.1007/s00500-008-0318-8
