Expectations from Structural Genomics Revisited

An Analysis of Structural Genomics Targets

  • Short Communication
American Journal of Pharmacogenomics

Abstract

Background: Current structural genomics projects are driven by two main goals: to produce a representative set of protein folds that can serve as templates for comparative modeling, and to provide insight into the function of currently unannotated protein sequences. Such projects may reveal that a newly determined protein structure resembles a previously observed structure or that it has a novel fold. How structure can be used to suggest the function of a protein depends on the number and diversity of homologous sequences and the extent to which these sequences are functionally characterized.

Method and results: Using sequence searching methods, we analyzed structural genomics target sequences to determine whether they were members of functionally characterized protein families, members of protein families of unknown function, or orphan sequences. This analysis indicates what can be expected to emerge from structural genomics projects. Matches were found to approximately 25% of the functionally unannotated protein families currently in the PFAM database (protein families database of alignments and hidden Markov models). The strict orphan sequences (16%) will be the most problematic if their structures reveal novel folds. However, of the remaining target sequences that match families whose members are largely of unknown function, 28% are of particular interest because they belong to protein families with considerable sequence diversity.
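The three-way classification described above can be illustrated with a minimal sketch. It assumes the sequence searches have already been run and reduced to a list of matched family identifiers per target; the function names, the input layout, and the family identifiers are hypothetical and are not taken from the paper.

```python
from collections import Counter

LABELS = ("characterized", "unknown_function", "orphan")


def classify_target(family_hits, annotated_families):
    """Assign one target to a category.

    family_hits: family identifiers the target matched (hypothetical input,
        e.g. parsed from sequence-search output).
    annotated_families: set of family identifiers that carry a functional
        annotation (also an assumption of this sketch).
    """
    if not family_hits:
        # No match to any family: treated here as an orphan sequence.
        return "orphan"
    if any(family in annotated_families for family in family_hits):
        return "characterized"
    return "unknown_function"


def summarize(targets, annotated_families):
    """Return the fraction of targets that falls into each category."""
    counts = Counter(
        classify_target(hits, annotated_families) for hits in targets.values()
    )
    total = len(targets)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


if __name__ == "__main__":
    # Toy data with made-up identifiers, for illustration only.
    toy_targets = {
        "target_1": ["FAM_A"],
        "target_2": ["FAM_X"],
        "target_3": [],
        "target_4": ["FAM_X", "FAM_A"],
    }
    print(summarize(toy_targets, annotated_families={"FAM_A"}))
```

A target that matches both an annotated and an unannotated family is counted as characterized in this sketch; a real analysis might handle multi-domain targets differently.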

Conclusion: The determination of a new structure of a member of these families is likely to offer considerable insight into possible functional roles of these proteins even if it is a new fold. Mapping the sequence conservation onto the structure may reveal functionally important residues for further study by experimental methods.
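As an illustration of scoring sequence conservation before mapping it onto a structure, the sketch below computes a per-column conservation score from a multiple sequence alignment using normalized Shannon entropy. This is a generic measure chosen for the example, not the method used in the paper, and the function name and toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Score each alignment column between 0 and 1 (1 = fully conserved).

    alignment: list of equal-length aligned sequences, with '-' for gaps.
    The score is 1 - H / log2(20), where H is the Shannon entropy of the
    residue frequencies in the column (gaps are ignored).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    max_entropy = math.log2(20)  # 20 standard amino acids
    scores = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != "-"]
        if not residues:
            # Column contains only gaps: no evidence of conservation.
            scores.append(0.0)
            continue
        n = len(residues)
        counts = Counter(residues)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


if __name__ == "__main__":
    toy_alignment = ["ACD", "ACE", "AGF", "A-G"]
    for position, score in enumerate(column_conservation(toy_alignment), start=1):
        print(position, round(score, 3))
```

Columns with high scores would be candidates for functionally important residues; the scores could then be written to the corresponding residue positions of a structure for visual inspection.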

Fig. 1
Table I
Fig. 2

Acknowledgments

Dr Wild acknowledges support from the National Institutes of Health Grant no. 1P01GM63208-01 (Tools and Data Resources in Support of Structural Genomics). We are grateful to Dr Arne Mueller for the use of his program for parsing PSIBLAST output. The authors have no conflicts of interest that are directly relevant to the contents of this paper.

Author information

Corresponding author

Correspondence to David L. Wild.

About this article

Cite this article

Saqi, M.A.S., Wild, D.L. Expectations from Structural Genomics Revisited. Am J Pharmacogenomics 5, 339–342 (2005). https://doi.org/10.2165/00129785-200505050-00006

  • DOI: https://doi.org/10.2165/00129785-200505050-00006