Abstract
Structural genomics projects have accumulated an increasing number of protein structures, many of which remain functionally uncharacterized. Alongside experimental methods, computational methods are expected to contribute significantly to elucidating the functions of these proteins. However, conventional computational methods, which transfer function from homologous proteins, are of little help for many of these structures because they lack apparent structural or sequence similarity to proteins of known function. Here, we briefly review two avenues of computational function prediction: structure-based methods and sequence-based methods. We focus on our recently developed local structure-based and sequence-based methods, which can effectively extract functional information from distantly related proteins. Two structure-based methods, Pocket-Surfer and Patch-Surfer, identify known ligand binding sites that are similar to pocket regions in a query protein without using global fold similarity. Two sequence-based methods, protein function prediction (PFP) and extended similarity group (ESG), make use of weakly similar sequences that are conventionally discarded in homology-based function annotation. We hope that computational methods, combined with experimental approaches, will make a leading contribution to the functional elucidation of these protein structures.
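The central idea of the structure-based methods is to retrieve known binding pockets whose local shape resembles a query pocket, regardless of global fold. A minimal sketch of this idea follows. It assumes each pocket has already been summarized as a fixed-length 3D Zernike descriptor (3DZD) vector. It compares pockets by Euclidean distance alone and predicts a ligand by majority vote among the k nearest known pockets. The pocket IDs, descriptor values, and the voting rule are hypothetical illustrations, not the scoring used by Pocket-Surfer or Patch-Surfer. Real descriptors are much longer vectors computed from the pocket surface.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptor lengths differ")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_pockets(query, database):
    """Return (pocket_id, ligand, distance) tuples sorted by distance to query.

    database: list of (pocket_id, ligand_code, descriptor) tuples.
    Only the pocket descriptors are compared; no global fold information is used.
    """
    scored = [(pid, lig, euclidean(query, desc)) for pid, lig, desc in database]
    return sorted(scored, key=lambda t: t[2])


def predict_ligand(query, database, k=3):
    """Predict a ligand by majority vote among the k nearest known pockets.

    Hypothetical voting rule: ties are broken in favor of the ligand whose
    first pocket appears earliest in the distance ranking.
    """
    top = rank_pockets(query, database)[:k]
    if not top:
        return None
    counts = Counter(lig for _, lig, _ in top)
    best = max(counts.values())
    for _, lig, _ in top:
        if counts[lig] == best:
            return lig


# Toy, hypothetical 3-component "descriptors" for pockets with known ligands.
DATABASE = [
    ("pocketA", "ATP", [0.90, 0.10, 0.30]),
    ("pocketB", "ATP", [0.80, 0.20, 0.35]),
    ("pocketC", "HEM", [0.10, 0.90, 0.70]),
    ("pocketD", "NAD", [0.50, 0.50, 0.50]),
]
QUERY = [0.85, 0.15, 0.30]

ranking = rank_pockets(QUERY, DATABASE)
print([pid for pid, _, _ in ranking])
print(predict_ligand(QUERY, DATABASE, k=3))
```

The comparison uses only pocket descriptors. Two proteins with unrelated folds can therefore still be matched if their pockets have similar shapes, which is the property the abstract emphasizes.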
Abbreviations
- PDB: Protein Data Bank
- 3DZD: 3D Zernike descriptor
- ATP: Adenosine triphosphate
- HEM: Heme
- NAD: Nicotinamide adenine dinucleotide
- FAD: Flavin adenine dinucleotide
- BTN: Biotin
- F6P: Fructose 6-phosphate
- GUN: Guanine
- PLM: Palmitic acid
- RTL: Retinol
- AUC: Area under the curve
- ROC: Receiver operating characteristic
- EF: Enrichment factor
- GO: Gene Ontology
- PFP: Protein function prediction
- ESG: Extended similarity group
- AFP-SIG: Automatic Function Prediction Special Interest Group
- ISMB: Intelligent Systems for Molecular Biology
- CASP: Critical Assessment of Techniques for Protein Structure Prediction
Acknowledgments
This work is supported in part by the National Institute of General Medical Sciences of the National Institutes of Health (R01GM075004, R01GM097528), the National Science Foundation (DMS0800568, EF0850009, IIS0915801) and the Showalter Trust. MC is supported by a Bilsland Dissertation Fellowship from the College of Science, Purdue University.
Additional information
Lee Sael and Meghana Chitale contributed equally to this article.
About this article
Cite this article
Sael, L., Chitale, M. & Kihara, D. Structure- and sequence-based function prediction for non-homologous proteins. J Struct Funct Genomics 13, 111–123 (2012). https://doi.org/10.1007/s10969-012-9126-6