Abstract
Numerous methods have been developed for predicting γ-turns in proteins, but their performance has generally been poor, with a Matthews correlation coefficient (MCC) ≤ 0.18. Here we present a method that improves the accuracy of γ-turn prediction. First, we use the geometric mean metric as the optimization criterion for evaluating the support vector machine on the highly imbalanced γ-turn dataset. This metric rewards classifiers that achieve both high sensitivity and high specificity while keeping the two balanced. Second, we designed a predictor that generates a protein shape string by structure alignment against the protein structure database, and we introduce the predicted shape string as a new variable for γ-turn prediction. Combining these two ideas, we developed a new γ-turn prediction method. Trained and tested on the benchmark dataset of 320 non-homologous protein chains using fivefold cross-validation, the method achieves an overall prediction accuracy (Q_total) of 92.2% and an MCC of 0.38, outperforming existing γ-turn prediction methods. Our results indicate that the protein shape string is useful for predicting protein tight turns and that dihedral angle information is a reasonable variable for machine learning approaches to predicting protein folding. The dataset used in this work and the software for generating predicted shape strings from the structure database are freely available from the anonymous FTP site ftp://cheminfo.tongji.edu.cn/GammaTurnPrediction/.
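To illustrate why the geometric mean is a more informative criterion than overall accuracy on an imbalanced dataset, the following minimal sketch computes sensitivity, specificity, their geometric mean, Q_total and MCC from confusion-matrix counts. It assumes the common definition of the geometric mean as the square root of sensitivity times specificity. The function name `imbalance_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def imbalance_metrics(tp, tn, fp, fn):
    """Return common binary-classification metrics from confusion-matrix counts.

    tp/fn count true turn residues predicted as turn/non-turn;
    tn/fp count true non-turn residues predicted as non-turn/turn.
    """
    # Fraction of true turn residues that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Fraction of true non-turn residues that were recovered.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Geometric mean (assumed definition): high only when both rates are high.
    g_mean = math.sqrt(sensitivity * specificity)
    # Overall accuracy over all residues.
    q_total = (tp + tn) / (tp + tn + fp + fn)
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
        "q_total": q_total,
        "mcc": mcc,
    }


# Hypothetical counts: 50 turn residues among 1000, classifier predicts "non-turn" everywhere.
trivial = imbalance_metrics(tp=0, tn=950, fp=0, fn=50)
# Hypothetical counts: a classifier that recovers 80% of each class.
balanced = imbalance_metrics(tp=40, tn=760, fp=190, fn=10)

print(trivial)
print(balanced)
```

In the first hypothetical case the overall accuracy is high even though no turn residue is found, while the geometric mean and MCC are both zero. The second case has lower overall accuracy but a much higher geometric mean, which is the behaviour a balanced criterion is meant to reward.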
Acknowledgments
The authors would like to thank the National Natural Science Foundation of China (20675057, 20705024) for financial support.
Cite this article
Zhu, Y., Li, T., Li, D. et al. Using predicted shape string to enhance the accuracy of γ-turn prediction. Amino Acids 42, 1749–1755 (2012). https://doi.org/10.1007/s00726-011-0889-z
DOI: https://doi.org/10.1007/s00726-011-0889-z