The Proteome Folding Project: Proteome-scale prediction of structure and function

  1. Richard Bonneau (1, 4, 6, 8)
  1. Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York 10003, USA;
  2. IBM, Austin, Texas 78758, USA;
  3. Department of Biochemistry, Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA;
  4. Institute for Systems Biology, Seattle, Washington 98103, USA;
  5. Medicinal Chemistry Department, University of Washington, Seattle, Washington 98195, USA;
  6. Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, New York 10003, USA;
  7. Institute of Molecular Systems Biology, ETH Zurich, Zurich CH 8093, Switzerland

    Abstract

    The incompleteness of proteome structure and function annotation is a critical problem for biologists and, in particular, severely limits interpretation of high-throughput and next-generation experiments. We have developed a proteome annotation pipeline based on structure prediction, where function and structure annotations are generated using an integration of sequence comparison, fold recognition, and grid-computing-enabled de novo structure prediction. We predict protein domain boundaries and three-dimensional (3D) structures for protein domains from 94 genomes (including human, Arabidopsis, rice, mouse, fly, yeast, Escherichia coli, and worm). De novo structure predictions were distributed on a grid of more than 1.5 million CPUs worldwide (World Community Grid). We generated significant numbers of new confident fold annotations (9% of domains that are otherwise unannotated in these genomes). We demonstrate that predicted structures can be combined with annotations from the Gene Ontology database to predict new and more specific molecular functions.

    Footnotes

    • 8 Corresponding author.

      E-mail bonneau{at}nyu.edu.

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.121475.111. Freely available online through the Genome Research Open Access option.

    • Received January 26, 2011.
    • Accepted July 28, 2011.
