Distribution of Protein Folds in the Three Superkingdoms of Life

  1. Yuri I. Wolf1,4,
  2. Steven E. Brenner2,
  3. Paul A. Bash3, and
  4. Eugene V. Koonin1,5
  1. 1National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894 USA; 2Department of Structural Biology, Stanford University, Stanford, California 94305-5126 USA; 3Department of Molecular Pharmacology and Biological Chemistry, Northwestern University, Chicago, Illinois 60611 USA

Abstract

A sensitive protein-fold recognition procedure was developed on the basis of iterative database search using the PSI-BLAST program. A collection of 1193 position-dependent weight matrices that can be used as fold identifiers (FIDs) was produced. In the completely sequenced genomes, folds could be automatically identified for 20%–30% of the proteins, with a further 3%–6% detectable by additional analysis of conserved motifs. The distribution of the most common folds is very similar in bacteria and archaea but distinct in eukaryotes. Within the bacteria, this distribution differs between parasitic and free-living species. In all analyzed genomes, the P-loop NTPase fold is the most abundant. In bacteria and archaea, the next most common folds are ferredoxin-like domains, TIM-barrels, and methyltransferases, whereas in eukaryotes, the second to fourth places belong to protein kinases, β-propellers, and TIM-barrels. The observed diversity of protein folds in different proteomes is approximately twice as high as would be expected from a simple stochastic model describing a proteome as a finite sample from an infinite pool of proteins with an exponential distribution of the fold fractions. The distribution of the number of domains with different folds in one protein fits the geometric model, which is compatible with the evolution of multidomain proteins by random combination of domains.

[Fold predictions for proteins from 14 proteomes are available on the World Wide Web at ftp://ncbi.nlm.nih.gov/pub/koonin/FOLDS/index.html. The FIDs are available by anonymous ftp at the same location.]
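
The stochastic model mentioned in the abstract (a proteome as a finite sample from an infinite pool of proteins whose fold fractions decay exponentially) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the truncation of the pool to a finite number of folds, and the parameter values `n_folds`, `decay`, and `n_proteins` are all assumptions chosen for illustration only. Under those assumptions, the expected number of distinct folds in a sample of N proteins is the sum over folds of 1 - (1 - p_i)^N, where p_i is the fraction of fold i in the pool.

```python
import random


def fold_fractions(n_folds, decay):
    """Fold fractions that decay exponentially with rank, normalized to sum to 1.

    n_folds and decay are illustrative assumptions, not values from the paper.
    """
    weights = [decay ** rank for rank in range(n_folds)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_distinct_folds(fractions, n_proteins):
    """Expected number of distinct folds among n_proteins independent draws."""
    return sum(1.0 - (1.0 - p) ** n_proteins for p in fractions)


def simulate_distinct_folds(fractions, n_proteins, rng):
    """Draw one hypothetical proteome and count the distinct folds it contains."""
    draws = rng.choices(range(len(fractions)), weights=fractions, k=n_proteins)
    return len(set(draws))


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to make the sketch concrete.
    fractions = fold_fractions(n_folds=300, decay=0.97)
    rng = random.Random(1998)
    for n_proteins in (100, 500, 2000):
        expected = expected_distinct_folds(fractions, n_proteins)
        observed = simulate_distinct_folds(fractions, n_proteins, rng)
        print(n_proteins, round(expected, 1), observed)
```

Comparing the fold diversity predicted by such a model with the diversity actually observed in a proteome is the kind of comparison the abstract summarizes; the sketch only shows how the expectation is computed under the stated assumptions.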

Footnotes

  • 4 Permanent address: Institute of Cytology and Genetics, Russian Academy of Sciences, Novosibirsk 630090, Russia.

  • 5 Corresponding author.

  • E-MAIL koonin{at}ncbi.nlm.nih.gov; FAX (301) 480-9241.

    • Received August 19, 1998.
    • Accepted November 24, 1998.