
Secondary structure-based assignment of the protein structural classes

  • Original Article

Amino Acids

Abstract

The structural class categorizes proteins based on the amount and arrangement of their constituent secondary structures. Knowledge of structural classes is applied in numerous important predictive tasks that address structural and functional features of proteins. We propose novel structural class assignment methods that use one-dimensional (1D) secondary structure as the input. The methods are designed using a large set of low-identity sequences for which the secondary structure is either predicted from the sequence (PSSAsc model) or assigned from the tertiary structure (SSAsc model). The secondary structure is encoded using a comprehensive set of features that describe the count, content, and size of secondary structure segments; these features are fed into a small decision tree that uses ten of them to perform the assignment. The proposed models were compared against seven secondary structure-based and ten sequence-based structural class predictors. Using the 1D secondary structure, SSAsc and PSSAsc can assign proteins to the four main structural classes (all-α, all-β, α/β, and α+β), whereas the existing secondary structure-based assignment methods can predict only three classes. Empirical evaluation shows that the proposed models are promising: using the structure-based assignment performed in SCOP (structural classification of proteins) as the gold standard, SSAsc and PSSAsc achieve accuracies of 76% and 75%, respectively. We show that using secondary structure predicted from the sequence as the input does not degrade the quality of structural class assignment compared with using secondary structure derived from the tertiary structure. Therefore, PSSAsc can be used to automatically assign structural classes from protein sequences alone.
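The abstract describes encoding the 1D secondary structure with features that capture the count, content, and size of secondary structure segments. The sketch below shows one plausible way to compute such features; it is not the authors' feature set. The function names `reduce_dssp` and `segment_features`, the three-letter H/E/C alphabet, the 8-to-3 state mapping (H, G, I → H; E, B → E; everything else → C, one common convention), and the choice of mean and maximum segment length as "size" statistics are all assumptions made for illustration.

```python
from itertools import groupby

# Hypothetical 8-to-3 state reduction for DSSP-style strings (one common convention).
_DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(dssp: str) -> str:
    """Map 8-state DSSP codes to H (helix), E (strand), or C (coil)."""
    return "".join(_DSSP_TO_3.get(code, "C") for code in dssp)


def segment_features(ss: str) -> dict:
    """Count, content, and size features for a 3-state (H/E/C) string.

    Illustrative only: the exact features used by SSAsc/PSSAsc are not
    reproduced here.
    """
    n = len(ss)
    if n == 0:
        raise ValueError("empty secondary structure string")
    # Run-length encode the string into (state, segment length) pairs.
    segments = [(state, sum(1 for _ in run)) for state, run in groupby(ss)]
    features = {}
    for state in "HEC":
        lengths = [length for s, length in segments if s == state]
        # Count: number of segments of this state.
        features[f"{state}_count"] = len(lengths)
        # Content: fraction of residues in this state.
        features[f"{state}_content"] = sum(lengths) / n
        # Size: mean and maximum segment length (0 if the state is absent).
        features[f"{state}_avg_size"] = sum(lengths) / len(lengths) if lengths else 0.0
        features[f"{state}_max_size"] = max(lengths, default=0)
    return features


if __name__ == "__main__":
    print(reduce_dssp("HGIEBTS -"))
    print(segment_features("HHHHCCEEECHH"))
```

A feature dictionary like this could be passed to any decision tree learner; which ten features the authors' tree actually selects is not stated in the abstract.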


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5



Acknowledgments

This research was funded in part by NSERC Canada. TZ and HZ acknowledge financial support provided by the National Education Committee of China. S. Shen and J. Ruan were supported by NSFC (grant no. 10671100), the Liuhui Center for Applied Mathematics, and the joint program of Tianjin and Nankai Universities.

Author information


Corresponding author

Correspondence to Lukasz A. Kurgan.


About this article

Cite this article

Kurgan, L.A., Zhang, T., Zhang, H. et al. Secondary structure-based assignment of the protein structural classes. Amino Acids 35, 551–564 (2008). https://doi.org/10.1007/s00726-008-0080-3



  • DOI: https://doi.org/10.1007/s00726-008-0080-3
