Determination of protein folding kinetic types using sequence and predicted secondary structure and solvent accessibility

  • Original Article
  • Amino Acids

Abstract

Proteins fold either through a two-state (TS) process, with no visible intermediates, or through a multi-state (MS) process that passes through at least one intermediate. We analyze sequence-derived factors that determine folding types by introducing a novel sequence-based folding type predictor called FOKIT. This method implements a logistic regression model with six input features that combine information on amino acid composition, predicted secondary structure, and predicted solvent accessibility. FOKIT provides predictions with an average Matthews correlation coefficient (MCC) between 0.58 and 0.91, measured using out-of-sample tests on four benchmark datasets. These results are shown to be competitive with or better than the results of four modern predictors. We also show that FOKIT outperforms these methods when predicting chains that share low similarity with the chains used to build the model, which is an important advantage given the limited number of annotated chains. We demonstrate that the inclusion of solvent accessibility helps discriminate between the folding kinetic types and that three of the features constitute statistically significant markers that differentiate TS and MS folders. We found that increased content of exposed Trp and buried Leu is indicative of MS folding, which implies that the exposure or burial of certain hydrophobic residues may play an important role in the formation of folding intermediates. Our conclusions are supported by two case studies.
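
As an illustration only, the two quantitative ingredients named in the abstract, a logistic regression score over six input features and the MCC used for evaluation, can be sketched in Python as follows. The function names, weights, bias, and feature values are hypothetical placeholders, not the fitted parameters or actual features of FOKIT. The label convention (1 = MS, 0 = TS) and the 0.5 decision threshold are likewise assumptions made for this sketch.

```python
import math


def logistic_probability(features, weights, bias):
    """Return the logistic-model probability for one chain.

    `features`, `weights`, and `bias` are hypothetical inputs; the
    abstract states only that the model uses six input features.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(features, weights, bias, threshold=0.5):
    """Assumed convention: 1 = multi-state (MS), 0 = two-state (TS)."""
    return 1 if logistic_probability(features, weights, bias) >= threshold else 0


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1).

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero, a common convention.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Placeholder weights and feature vectors, for illustration only.
    weights = [0.8, -0.5, 1.2, 0.0, 0.3, -0.7]
    bias = -0.1
    chains = [
        [0.2, 0.1, 0.9, 0.4, 0.3, 0.1],
        [0.1, 0.8, 0.1, 0.2, 0.1, 0.9],
    ]
    observed = [1, 0]
    predicted = [predict_label(x, weights, bias) for x in chains]
    print(predicted, matthews_corrcoef(observed, predicted))
```

An MCC of 1 corresponds to perfect agreement between predicted and observed folding types, 0 to agreement no better than chance, and -1 to complete disagreement, which is why the measure is useful when the TS and MS classes are of unequal size.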

Acknowledgments

The authors would like to thank Emidio Capriotti, Michael Gromiha, Ji-Tao Huang, and Bin-Guang Ma for providing their datasets. This work was supported in part by the National Natural Science Foundation of China (Grant No. 61003187), Zhejiang Provincial Natural Science Foundation of China (Grant No. Y1090688), and NSERC Canada.

Author information

Corresponding author

Correspondence to Lukasz Kurgan.

About this article

Cite this article

Zhang, H., Zhang, T., Gao, J. et al. Determination of protein folding kinetic types using sequence and predicted secondary structure and solvent accessibility. Amino Acids 42, 271–283 (2012). https://doi.org/10.1007/s00726-010-0805-y

  • DOI: https://doi.org/10.1007/s00726-010-0805-y