Abstract
Background: Current structural genomics projects are driven by two main goals: to produce a representative set of protein folds that can be used as templates for comparative modeling, and to provide insight into the function of currently unannotated protein sequences. Such projects may reveal that a newly determined protein structure shares structural similarity with a previously observed structure or that it has a novel fold. The manner in which structure can be used to suggest the function of a protein will depend on the number and diversity of homologous sequences and the extent to which these sequences are functionally characterized.
Method and results: Using sequence searching methods, we analyzed structural genomics target sequences to ascertain whether they were members of functionally characterized protein families, members of protein families of unknown function, or orphan sequences. This analysis provided an indication of what could be expected to emerge from structural genomics projects. The target sequences matched approximately 25% of the functionally unannotated protein families currently in the PFAM database (protein families database of alignments and hidden Markov models). The strict orphan sequences (16%) will be the most problematic if their structures reveal novel folds. However, of the remaining target sequences that match families whose members are largely of unknown function, 28% are particularly interesting in that they belong to protein families with considerable sequence diversity.
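The three-way classification described above can be illustrated with a minimal sketch. It assumes that the sequence searches have already been run and reduced to a mapping from each target to the family identifiers it matched. The function names, the label strings, and the rule that a target counts as "unknown function" only when every matched family is unannotated are assumptions made for illustration, not the procedure used in the study.

```python
from collections import Counter


def classify_target(family_hits, unknown_function_families):
    """Assign one target to a category based on the families it matched.

    family_hits: iterable of family identifiers matched by the target
                 (hypothetical input; empty means no match at all).
    unknown_function_families: set of family identifiers whose members
                 are functionally unannotated (hypothetical input).
    """
    hits = list(family_hits)
    if not hits:
        # No family match: treated here as an orphan sequence.
        return "orphan"
    if all(f in unknown_function_families for f in hits):
        # Every matched family lacks functional annotation (assumed rule).
        return "unknown_function_family"
    return "characterized_family"


def summarize(targets, unknown_function_families):
    """Return the percentage of targets falling into each category."""
    total = len(targets)
    if total == 0:
        return {}
    counts = Counter(
        classify_target(hits, unknown_function_families)
        for hits in targets.values()
    )
    return {label: 100.0 * n / total for label, n in counts.items()}
```

Calling `summarize` on a dictionary of targets gives the kind of percentage breakdown reported in the abstract, although the actual figures depend entirely on the search results supplied.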
Conclusion: The determination of a new structure of a member of these families is likely to offer considerable insight into possible functional roles of these proteins even if it is a new fold. Mapping the sequence conservation onto the structure may reveal functionally important residues for further study by experimental methods.
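As a rough illustration of how sequence conservation might be scored before being mapped onto a structure, the sketch below computes a per-column Shannon entropy for a multiple sequence alignment. This is a simple stand-in measure chosen for illustration, not the authors' method. The function names and the use of `-` as the gap character are assumptions, and relating alignment columns to residue positions in the structure is left out.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gaps.

    Returns None for a column that contains only gaps.
    Low entropy indicates a conserved position.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return None
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conservation_profile(alignment):
    """Return the entropy of every column in an alignment.

    alignment: list of equal-length aligned sequences (strings).
    """
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    # zip(*alignment) iterates over the alignment column by column.
    return [column_entropy(col) for col in zip(*alignment)]
```

Columns with the lowest entropy are candidates for functionally important residues. In a diverse family, such positions stand out more clearly against the variable background, which is why sequence diversity makes a new structure more informative.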
Acknowledgments
Dr Wild acknowledges support from the National Institutes of Health Grant no. 1P01GM63208-01 (Tools and Data Resources in Support of Structural Genomics). We are grateful to Dr Arne Mueller for the use of his program for parsing PSI-BLAST output. The authors have no conflicts of interest that are directly relevant to the contents of this paper.
Cite this article
Saqi, M.A.S., Wild, D.L. Expectations from Structural Genomics Revisited. Am J Pharmacogenomics 5, 339–342 (2005). https://doi.org/10.2165/00129785-200505050-00006
DOI: https://doi.org/10.2165/00129785-200505050-00006