Multi-domain protein families and domain pairs: comparison with known structures and a random model of domain recombination

Journal of Structural and Functional Genomics

Abstract

There is a limited repertoire of domain families in nature that are duplicated and combined in different ways to form the set of proteins in a genome. Most proteins in both prokaryote and eukaryote genomes consist of two or more domains, and we show that the family size distribution of multi-domain protein families follows a power law, like that of individual families. Most domain pairs occur in four to six different domain architectures: in isolation and in combinations with different partners. We showed previously that within the set of all pairwise domain combinations, most small and medium-sized families are observed in combination with one or two other families, while a few large families are very versatile and combine with many different partners. Though this may appear to be a stochastic pattern, in which large families have more combination partners by virtue of their size, we establish here that all the domain families with more than three members in genomes are duplicated more frequently than would be expected by chance considering their number of neighbouring domains. This duplication of domain pairs is statistically significant for between one quarter and three quarters of all families with seven or more members. For the majority of pairwise domain combinations, there is no known three-dimensional structure of the two domains together, and we term these novel combinations. Novel domain combinations are interesting and important targets for structural elucidation, as the geometry of and interaction between the domains will help us to understand the function and evolution of multi-domain proteins. Of particular interest are those combinations that occur in the largest number of multi-domain proteins, and several of these frequent novel combinations contain DNA-binding domains.
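
The "stochastic pattern" mentioned above, in which large families gain more combination partners simply because they are large, can be pictured with a toy shuffling model. The sketch below is an illustration only and is not the random model used in the article: the family names, the family sizes, the pairing rule (consecutive entries of a shuffled pool form a pair, and a family may pair with itself) and the function names are all assumptions made for this example.

```python
import random
from collections import defaultdict


def random_partner_counts(family_sizes, seed=0):
    """Pair domain copies at random and count distinct partner families.

    family_sizes maps a hypothetical family name to its number of domain copies.
    """
    rng = random.Random(seed)
    # One pool entry per domain copy, labelled by its family.
    pool = [fam for fam, n in family_sizes.items() for _ in range(n)]
    rng.shuffle(pool)
    partners = defaultdict(set)
    # Consecutive entries of the shuffled pool form one adjacent domain pair.
    for left, right in zip(pool[0::2], pool[1::2]):
        partners[left].add(right)
        partners[right].add(left)
    return {fam: len(partners[fam]) for fam in family_sizes}


def mean_partner_counts(family_sizes, trials=200):
    """Average the partner counts over several independent shuffles."""
    totals = dict.fromkeys(family_sizes, 0)
    for seed in range(trials):
        for fam, k in random_partner_counts(family_sizes, seed).items():
            totals[fam] += k
    return {fam: total / trials for fam, total in totals.items()}


# Hypothetical family sizes: one large family and several small ones.
sizes = {"A": 40, "B": 8, "C": 4, "D": 2, "E": 2, "F": 2, "G": 2}
expected = mean_partner_counts(sizes)
for fam in sizes:
    print(fam, sizes[fam], round(expected[fam], 2))
```

In this toy model the large family collects more distinct partners than the small ones by chance alone. An observed partner count or pair duplication count would have to be compared against a baseline of this kind before it could be called non-random.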

Abbreviations:

SCOP: Structural Classification of Proteins database; PDB: Protein Data Bank; HMM: hidden Markov model.

Cite this article

Apic, G., Huber, W. & Teichmann, S.A. Multi-domain protein families and domain pairs: comparison with known structures and a random model of domain recombination. J Struct Func Genom 4, 67–78 (2003). https://doi.org/10.1023/A:1026113408773

  • DOI: https://doi.org/10.1023/A:1026113408773
