Abstract
Antifreeze proteins (AFPs), also known as thermal hysteresis proteins, are ice-binding proteins. AFPs have been found in many organisms, including vertebrates, invertebrates, plants, bacteria, and fungi. Although AFPs share a common function, their sequences and structures show a high degree of diversity. AFPs can adsorb to ice crystal surfaces and inhibit the growth of ice crystals in solution. However, the interaction between AFPs and ice crystals is not yet fully understood. An automated, high-throughput tool for identifying AFPs in a timely manner is therefore highly desirable, and analyzing the physicochemical characteristics of AFP sequences is important for understanding the ice-protein interaction. In this manuscript, a predictor called “iAFP-Ense” was developed. Its prediction engine is an ensemble classifier that uses a voting system to fuse eleven different random forest classifiers built on extracted features. We also compared our predictor with AFP-PseAAC by tenfold cross-validation on the same benchmark dataset. The comparison with existing methods indicates that the new predictor is very promising, suggesting that it captures many important features that are deeply hidden in complicated protein sequences. The predictor used in this article is freely available at http://www.jci-bioinfo.cn/iAFP-Ense.
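The voting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes simple unweighted majority voting over binary labels (1 = AFP, 0 = non-AFP); the abstract does not state the actual voting rule or any weights, and the function names `majority_vote` and `fuse_predictions` are hypothetical. The eleven base predictions are hard-coded stand-ins for illustration, not outputs of the random forest classifiers.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(votes: Sequence[int]) -> int:
    """Return the label chosen by the most base classifiers."""
    if not votes:
        raise ValueError("need at least one vote")
    # With an odd number of voters (e.g. eleven) and two labels,
    # a tie cannot occur, so the top label is unambiguous.
    return Counter(votes).most_common(1)[0][0]


def fuse_predictions(per_classifier: Sequence[Sequence[int]]) -> List[int]:
    """Fuse predictions; per_classifier[k][i] is classifier k's label for sample i."""
    if not per_classifier:
        raise ValueError("need at least one classifier")
    n_samples = len(per_classifier[0])
    if any(len(p) != n_samples for p in per_classifier):
        raise ValueError("all classifiers must label the same samples")
    return [
        majority_vote([p[i] for p in per_classifier])
        for i in range(n_samples)
    ]


# Illustrative stand-in data: eleven classifiers, three query sequences.
# Six classifiers call sample 0 an AFP, five do not.
base_predictions = [[1, 0, 1]] * 6 + [[0, 0, 1]] * 5
fused = fuse_predictions(base_predictions)
print(fused)
```

Using an odd number of base classifiers, as in the eleven-member ensemble, means a two-class majority vote always has a clear winner.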
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (Nos. 31260273, 61261027), the Jiangxi Provincial Foreign Scientific and Technological Cooperation Project (No. 20120BDH80023), the Natural Science Foundation of Jiangxi Province, China (Nos. 20114BAB211013, 20122BAB211033, 20122BAB201044, 20122BAB201020), the Department of Education of Jiangxi Province (GJJ12490, GJJ4642, GJJ14651, GJJ14640), the LuoDi plan of the Department of Education of Jiangxi Province (KJLD12083), and the Jiangxi Provincial Foundation for Leaders of Disciplines in Science (20113BCB22008).
Cite this article
Xiao, X., Hui, M. & Liu, Z. iAFP-Ense: An Ensemble Classifier for Identifying Antifreeze Protein by Incorporating Grey Model and PSSM into PseAAC. J Membrane Biol 249, 845–854 (2016). https://doi.org/10.1007/s00232-016-9935-9
DOI: https://doi.org/10.1007/s00232-016-9935-9