Prediction of topological contacts in proteins using learning classifier systems

Stout, Michael; Bacardit, Jaume; Hirst, Jonathan D.; Smith, Robert E.; Krasnogor, Natalio

doi:10.1007/s00500-008-0318-8

Focus
Published: 03 July 2008

Volume 13, pages 245–258 (2009)

Soft Computing

Michael Stout¹, Jaume Bacardit¹, Jonathan D. Hirst², Robert E. Smith³ & Natalio Krasnogor¹

Abstract

Evolutionary-based data mining techniques are increasingly applied to problems in the bioinformatics domain. We investigate an important aspect of predicting the folded 3D structure of proteins from their unfolded residue sequence using evolutionary-based machine learning techniques. Our approach is to predict specific features of residues in folded protein chains, in particular features derived from Delaunay tessellations, Gabriel graphs, relative neighborhood graphs and minimum spanning trees. Several standard machine learning algorithms were compared with a state-of-the-art learning method, a learning classifier system (LCS) capable of generating compact and interpretable rule sets. Predictions were performed at various degrees of precision using a range of experimental parameters. Examples of the rules obtained are presented. The LCS achieves good predictive performance and generates competent yet simple and interpretable classification rules.
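The abstract names four proximity graphs whose per-residue features are the prediction targets. As a hedged illustration only, the sketch below builds three of them (Gabriel graph, relative neighborhood graph and minimum spanning tree) from 3D points by brute force and counts each residue's neighbours in every graph. It assumes that each residue is reduced to a single point (for example its Cα coordinate) and that the per-residue feature is the node degree, i.e. its number of topological contacts; the article's actual residue representation, target definition and discretization into classes are not given in this abstract, and every function name here is hypothetical. The Delaunay tessellation is left out because it needs a computational geometry library (`scipy.spatial.Delaunay`, which wraps Qhull, is one option).

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Assumption: one 3D point per residue (e.g. its C-alpha atom).
Point = Tuple[float, float, float]
Edge = Tuple[int, int]  # always stored as (i, j) with i < j


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance; sufficient for every comparison below."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gabriel_edges(points: List[Point]) -> Set[Edge]:
    """(i, j) is an edge iff no other point lies strictly inside the
    sphere whose diameter is the segment ij."""
    n = len(points)
    edges: Set[Edge] = set()
    for i, j in combinations(range(n), 2):
        d_ij = sq_dist(points[i], points[j])
        # k is strictly inside the diametral sphere iff |ik|^2 + |jk|^2 < |ij|^2
        if all(sq_dist(points[i], points[k]) + sq_dist(points[j], points[k]) >= d_ij
               for k in range(n) if k not in (i, j)):
            edges.add((i, j))
    return edges


def rng_edges(points: List[Point]) -> Set[Edge]:
    """(i, j) is an edge iff no other point is strictly closer to both
    i and j than i and j are to each other (their lune is empty)."""
    n = len(points)
    edges: Set[Edge] = set()
    for i, j in combinations(range(n), 2):
        d_ij = sq_dist(points[i], points[j])
        if all(max(sq_dist(points[i], points[k]), sq_dist(points[j], points[k])) >= d_ij
               for k in range(n) if k not in (i, j)):
            edges.add((i, j))
    return edges


def mst_edges(points: List[Point]) -> Set[Edge]:
    """Euclidean minimum spanning tree via O(n^2) Prim's algorithm."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection to the tree
    parent = [-1] * n
    if n:
        best[0] = 0.0
    edges: Set[Edge] = set()
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.add((min(u, parent[u]), max(u, parent[u])))
        for v in range(n):
            if not in_tree[v]:
                d = sq_dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def degrees(n: int, edges: Set[Edge]) -> List[int]:
    """Number of graph neighbours of each residue."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


def topological_contacts(points: List[Point]) -> Dict[str, List[int]]:
    """Hypothetical per-residue contact counts for each proximity graph."""
    n = len(points)
    return {
        "GG": degrees(n, gabriel_edges(points)),
        "RNG": degrees(n, rng_edges(points)),
        "MST": degrees(n, mst_edges(points)),
    }


if __name__ == "__main__":
    triangle = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.5, 0.0)]
    print(topological_contacts(triangle))
    # {'GG': [2, 2, 2], 'RNG': [1, 1, 2], 'MST': [1, 1, 2]}
```

In the example, the angle at the third point is acute, so it lies outside the sphere on the long edge (the Gabriel graph keeps that edge) but inside the lune (the relative neighborhood graph drops it). For points in general position the graphs nest as MST ⊆ RNG ⊆ GG ⊆ DT, so a realistic pipeline would more plausibly compute the Delaunay graph once and filter its edges than run these O(n³) loops over every residue pair.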

Author information

Authors and Affiliations

Automated Scheduling, Optimization and Planning Research Group, School of Computer Science and IT, University of Nottingham, Jubilee Campus, Wollaton Road, Nottingham, NG8 1BB, UK
Michael Stout, Jaume Bacardit & Natalio Krasnogor
School of Chemistry, University of Nottingham, University Park, Nottingham, NG7 2RD, UK
Jonathan D. Hirst
Intelligent Systems Group, Computer Science Department, University College London, Gower Street, London, WC1E 6BT, UK
Robert E. Smith

Corresponding author

Correspondence to Natalio Krasnogor.

About this article

Cite this article

Stout, M., Bacardit, J., Hirst, J.D. et al. Prediction of topological contacts in proteins using learning classifier systems. Soft Comput 13, 245–258 (2009). https://doi.org/10.1007/s00500-008-0318-8

Published: 03 July 2008
Issue Date: February 2009
DOI: https://doi.org/10.1007/s00500-008-0318-8
