Abstract
Proteins fold either through a two-state (TS) process, with no visible intermediates, or through a multi-state (MS) process, via at least one intermediate. We analyze sequence-derived factors that determine the folding type by introducing a novel sequence-based folding type predictor called FOKIT. This method implements a logistic regression model with six input features that combine information on amino acid composition with predicted secondary structure and solvent accessibility. FOKIT provides predictions with average Matthews correlation coefficients (MCC) between 0.58 and 0.91, measured using out-of-sample tests on four benchmark datasets. These results are shown to be competitive with, or better than, the results of four modern predictors. We also show that FOKIT outperforms these methods when predicting chains that share low similarity with the chains used to build the model, which is an important advantage given the limited number of annotated chains. We demonstrate that the inclusion of solvent accessibility helps to discriminate between the folding kinetic types and that three of the features constitute statistically significant markers that differentiate TS and MS folders. We found that an increased content of exposed Trp and buried Leu is indicative of MS folding, which implies that the exposure or burial of certain hydrophobic residues may play an important role in the formation of folding intermediates. Our conclusions are supported by two case studies.
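The abstract combines three technical pieces: composition-type features derived from the sequence and predicted solvent accessibility, a logistic regression classifier, and evaluation with the MCC. The following is a minimal Python sketch of those pieces under stated assumptions. It uses only two hypothetical features (the fractions of exposed Trp and buried Leu, the markers named above) instead of FOKIT's six, assumes a two-state 'e'/'b' encoding of predicted solvent accessibility, and uses made-up weights. It is an illustration of the technique, not the FOKIT model or its trained coefficients.

```python
import math

# Labels used throughout this sketch: 1 = multi-state (MS), 0 = two-state (TS).


def composition_features(seq, rsa_states):
    """Return [fraction of exposed Trp, fraction of buried Leu] for one chain.

    seq: amino acid sequence in one-letter codes.
    rsa_states: predicted solvent accessibility per residue, ASSUMED here to be
        a two-state string of 'e' (exposed) and 'b' (buried) symbols.
    """
    if len(seq) != len(rsa_states) or not seq:
        raise ValueError("need a non-empty sequence and an equal-length RSA string")
    pairs = list(zip(seq.upper(), rsa_states))
    exposed_trp = sum(1 for aa, s in pairs if aa == "W" and s == "e")
    buried_leu = sum(1 for aa, s in pairs if aa == "L" and s == "b")
    n = len(seq)
    return [exposed_trp / n, buried_leu / n]


def ms_probability(features, weights, bias):
    """Logistic regression: P(MS) = 1 / (1 + exp(-(bias + w . x)))."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def classify(features, weights, bias, threshold=0.5):
    """Predict 1 (MS) if P(MS) reaches the threshold, else 0 (TS)."""
    return 1 if ms_probability(features, weights, bias) >= threshold else 0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report MCC as 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # HYPOTHETICAL weights: the positive signs mirror the finding that more
    # exposed Trp and buried Leu point toward MS folding; magnitudes are made up.
    weights, bias = [40.0, 20.0], -2.0
    x = composition_features("MWLKLAEWLL", "eebbbeebbe")
    print(x, classify(x, weights, bias))
    print(mcc([1, 0, 1, 0], [1, 0, 0, 0]))  # about 0.577
```

In a real model the weights and bias would be fitted on chains with annotated TS/MS kinetics, and the MCC would be computed on out-of-sample predictions, as the abstract describes, rather than on the training chains.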
Acknowledgments
The authors would like to thank Emidio Capriotti, Michael Gromiha, Ji-Tao Huang, and Bin-Guang Ma for providing their datasets. This work was supported in part by the National Natural Science Foundation of China (Grant No. 61003187), Zhejiang Provincial Natural Science Foundation of China (Grant No. Y1090688), and NSERC Canada.
Cite this article
Zhang, H., Zhang, T., Gao, J. et al. Determination of protein folding kinetic types using sequence and predicted secondary structure and solvent accessibility. Amino Acids 42, 271–283 (2012). https://doi.org/10.1007/s00726-010-0805-y