HMMs in Protein Fold Classification

Lampros, Christos; Papaloukas, Costas; Exarchos, Themis; Fotiadis, Dimitrios I.

doi:10.1007/978-1-4939-6753-7_2

Protocol
First Online: 22 February 2017

2496 Accesses
2 Citations

Part of the book series: Methods in Molecular Biology (MIMB, volume 1552)

Abstract

A limitation of most HMMs used in protein classification is their high dimensionality: they have many states and parameters and therefore require many training sequences. We therefore developed several low-complexity variations that can be applied even to protein families with only a few members. In this chapter we present these variations. All of them use a hidden Markov model (HMM) with a small number of states (a reduced state-space HMM), which is trained on both the amino acid sequence and the secondary structure of proteins whose 3D structure is known, and is then used for protein fold classification. We used data from the Protein Data Bank and annotation from the SCOP database to train and evaluate the proposed variations on a number of protein folds belonging to the major structural classes. The results indicate that the variations classify proteins as well as, and in some cases better than, SAM, a widely used HMM-based method for protein classification. Their major advantage is that they employ a small number of states, so the algorithms used for training and scoring are of low complexity and relatively fast. The main variations examined are a reduced state-space HMM with seven states (7-HMM), a reduced state-space HMM with three states (3-HMM), and an optimized 3-HMM in which an optimization process is applied to the model's scores (optimized 3-HMM).
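The abstract describes fold classification with small HMMs trained on both amino acid sequence and secondary structure. The sketch below illustrates only the general idea, and every detail in it is an assumption rather than the authors' method: three hidden states, emissions that factorize into an amino-acid term and a secondary-structure term, hand-set toy parameters instead of trained ones (a real model would be trained, e.g. with Baum-Welch), and assignment of a query to the fold whose model gives the highest forward-algorithm log-likelihood. The names `ReducedHMM`, `classify` and `toy_model` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Obs = Tuple[str, str]  # (one-letter amino acid, secondary-structure label)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def safe_log(p: float) -> float:
    """log(p), mapping p == 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


class ReducedHMM:
    """Hypothetical small-state HMM over (amino acid, SS label) pairs.

    ASSUMPTION: P(aa, ss | state) = P(aa | state) * P(ss | state).
    """

    def __init__(self, start: List[float], trans: List[List[float]],
                 emit_aa: List[Dict[str, float]], emit_ss: List[Dict[str, float]],
                 floor: float = 1e-4):
        self.start, self.trans = start, trans
        self.emit_aa, self.emit_ss = emit_aa, emit_ss
        self.floor = floor  # probability for symbols missing from a table
        self.n = len(start)

    def log_emit(self, state: int, obs: Obs) -> float:
        aa, ss = obs
        return (safe_log(self.emit_aa[state].get(aa, self.floor))
                + safe_log(self.emit_ss[state].get(ss, self.floor)))

    def log_likelihood(self, seq: Sequence[Obs]) -> float:
        """Forward algorithm in log space: log P(seq | model)."""
        alpha = [safe_log(self.start[i]) + self.log_emit(i, seq[0])
                 for i in range(self.n)]
        for obs in seq[1:]:
            alpha = [logsumexp([alpha[i] + safe_log(self.trans[i][j])
                                for i in range(self.n)]) + self.log_emit(j, obs)
                     for j in range(self.n)]
        return logsumexp(alpha)


def classify(seq: Sequence[Obs], models: Dict[str, ReducedHMM]) -> str:
    """Assign the fold whose model scores the query highest."""
    return max(models, key=lambda fold: models[fold].log_likelihood(seq))


def toy_model(ss_major: str) -> ReducedHMM:
    """Hypothetical 3-state model favouring one SS label (e.g. 'H' or 'E')."""
    start = [1 / 3] * 3
    trans = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    emit_aa = [{a: 1 / 20 for a in AMINO_ACIDS} for _ in range(3)]
    emit_ss = [{ss_major: 0.9, "C": 0.1},
               {ss_major: 0.7, "C": 0.3},
               {ss_major: 0.1, "C": 0.9}]
    return ReducedHMM(start, trans, emit_aa, emit_ss)


if __name__ == "__main__":
    folds = {"alpha": toy_model("H"), "beta": toy_model("E")}
    query = [("A", "H"), ("L", "H"), ("E", "H"), ("K", "C")]
    print(classify(query, folds))  # prints: alpha
```

Working in log space avoids numerical underflow on long sequences, and a small number of states keeps each forward pass at O(L·N²) for a sequence of length L and N states, which reflects the low-complexity argument made in the abstract.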

Author information

Authors and Affiliations

Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, University Campus of Ioannina, GR45110, Ioannina, Greece
Christos Lampros, Themis Exarchos & Dimitrios I. Fotiadis
Department of Biological Applications and Technology, University of Ioannina, Ioannina, Greece
Costas Papaloukas

Corresponding author

Correspondence to Dimitrios I. Fotiadis.

Editor information

Editors and Affiliations

University of Leeds School of Molecular and Cellular Biology, Leeds, United Kingdom
David R. Westhead
University of Leeds School of Cellular and Molecular Biology, Leeds, United Kingdom
M. S. Vijayabaskar

About this protocol

Cite this protocol

Lampros, C., Papaloukas, C., Exarchos, T., Fotiadis, D.I. (2017). HMMs in Protein Fold Classification. In: Westhead, D., Vijayabaskar, M. (eds) Hidden Markov Models. Methods in Molecular Biology, vol 1552. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6753-7_2

DOI: https://doi.org/10.1007/978-1-4939-6753-7_2
Published: 22 February 2017
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-6751-3
Online ISBN: 978-1-4939-6753-7
eBook Packages: Springer Protocols