Protein Structure Databases

  • Protocol
Data Mining Techniques for the Life Sciences

Part of the book series: Methods in Molecular Biology (MIMB, volume 609)

Abstract

Web-based protein structure databases come in a wide variety of types and levels of information content. Those of most general interest are the various atlases that describe each experimentally determined protein structure and provide useful links, analyses, and schematic diagrams relating to its 3D structure and biological function. Also of great interest are the databases that classify 3D structures by their folds, as these can reveal evolutionary relationships that may be hard to detect from sequence comparison alone. Related to these are the numerous servers that compare folds, which are particularly useful for newly solved structures, especially those of unknown function. Beyond these, there are a vast number of databases for the more specialized user, dealing with specific families, diseases, structural features, and so on.

Acknowledgment

The author would like to thank Tom Oldfield for useful comments on this chapter.

Copyright information

© 2010 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Laskowski, R.A. (2010). Protein Structure Databases. In: Carugo, O., Eisenhaber, F. (eds) Data Mining Techniques for the Life Sciences. Methods in Molecular Biology, vol 609. Humana Press. https://doi.org/10.1007/978-1-60327-241-4_4

  • DOI: https://doi.org/10.1007/978-1-60327-241-4_4

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-60327-240-7

  • Online ISBN: 978-1-60327-241-4

  • eBook Packages: Springer Protocols
