Elsevier

Gene

Volume 342, Issue 2, 24 November 2004, Pages 263-268

Different age distribution patterns of human, nematode, and Arabidopsis duplicate genes

https://doi.org/10.1016/j.gene.2004.08.001

Abstract

We studied the age distribution of duplicate genes in each of four eukaryotic genomes: human, Arabidopsis thaliana, Caenorhabditis elegans, and Drosophila melanogaster. The four distributions differ greatly from each other, contrary to the previous proposal of a universal L-shaped distribution in all eukaryotic genomes studied. Indeed, only the distribution in humans is L-shaped. The distribution in Arabidopsis is consistent with the hypothesis of an ancient genome duplication with no recent burst of duplication events, while the distribution in C. elegans is nearly uniform. We also applied a nonparametric method to the human distribution to show that the rate of loss of duplicate genes decreases over time, contrary to the proposal of an exponential decay. One possible explanation of the decreasing rate of loss of duplicate genes over time could be rapid functional divergence between duplicate genes, providing an advantage for the retention of both duplicates.

Introduction

Duplicate genes have been considered the primary source of genetic novelty since Ohno (1970), but how duplicate genes persist in a genome remains unclear. Lynch and Conery (2000) were among the first to use genomic data from various species to address this question. Assuming a constant rate of loss of duplicate genes, they used the distribution of Ks, the number of synonymous substitutions per synonymous site, between duplicate genes to estimate the half-life of duplicate genes. However, their analysis might have some weaknesses (Zhang et al., 2001). For example, their data might have included alternatively spliced genes, which can result in false duplicate gene pairs, and they counted the number of duplicate gene pairs instead of the number of duplication events. Later, Lynch and Conery (2003a) refined their analysis by using a phylogenetic approach to calculate Ka and Ks between duplicate pairs and by incorporating both the birth and loss rates of duplicate genes into their model. The result was consistent with that of their previous study.
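The constant-loss-rate assumption can be illustrated with a minimal sketch. If duplicates are lost at a constant rate d per unit of Ks, the number of surviving duplicates should decay as N(Ks) = N0 · exp(−d · Ks), and the half-life on the Ks scale is ln 2 / d. The function name, the log-linear least-squares fit, and the binned input are assumptions made for illustration only; this is not the exact procedure of Lynch and Conery (2000).

```python
import math

def estimate_half_life(bin_centers, counts):
    """Fit log(count) = a - d * Ks by ordinary least squares.

    Returns (d, half_life), where half_life = ln(2) / d is expressed in
    units of Ks. Hypothetical helper, not taken from the paper.
    """
    # Empty bins cannot be log-transformed, so they are skipped.
    points = [(x, math.log(c)) for x, c in zip(bin_centers, counts) if c > 0]
    if len(points) < 2:
        raise ValueError("need at least two non-empty bins")

    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)

    loss_rate = -sxy / sxx  # the slope of the fit is -d
    return loss_rate, math.log(2) / loss_rate

# Synthetic counts generated under a known loss rate (d = 5), for illustration.
centers = [0.05 + 0.1 * i for i in range(10)]
synthetic_counts = [1000 * math.exp(-5 * x) for x in centers]
d, half_life = estimate_half_life(centers, synthetic_counts)
```

A fit of this kind is only meaningful if the loss rate really is constant; a loss rate that declines with age would show up as systematic curvature in the log-transformed counts.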

In this study, we adopted a new approach and better-annotated data to reanalyze the age distribution of duplicate genes in a genome. First, we removed isoforms and repetitive elements before grouping genes into families. Isoforms are defined as two different annotated proteins translated from the same gene and are often mistaken for a duplicate pair with a very small Ks. In addition, two proteins may be grouped together simply because they contain the same repetitive element fragment. By removing isoforms and repetitive elements, we excluded the major source of false hits in the search for duplicate genes. Second, we constructed a phylogeny for each gene family and calculated Ks for every duplication event within the family. The Ks value was then used as an index for the age of the duplication event. Third, we used only those events with a Ks value between 0.005 and 1 for our study because a gene pair with Ks<0.005 may be just two different alleles (annotation errors), and because accurate estimation of Ks becomes difficult when Ks>1.
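The third step, restricting events to the Ks range and tabulating them by age, can be sketched as follows. The function names, the bin width of 0.1, and the choice to treat both range boundaries as inclusive are assumptions for illustration; the text does not specify them.

```python
KS_MIN = 0.005  # below this, a pair may be two alleles of one gene
KS_MAX = 1.0    # above this, Ks estimates become unreliable

def filter_events(ks_values, lo=KS_MIN, hi=KS_MAX):
    """Keep duplication events whose Ks lies within [lo, hi] (inclusive by assumption)."""
    return [ks for ks in ks_values if lo <= ks <= hi]

def age_histogram(ks_values, bin_width=0.1, hi=KS_MAX):
    """Count retained events in equal-width Ks bins covering 0 to hi."""
    n_bins = round(hi / bin_width)
    counts = [0] * n_bins
    for ks in filter_events(ks_values, hi=hi):
        # Events with Ks exactly equal to hi are placed in the last bin.
        index = min(int(ks / bin_width), n_bins - 1)
        counts[index] += 1
    return counts

# Made-up Ks values, not data from the study.
example_ks = [0.001, 0.004, 0.05, 0.15, 0.95, 1.0, 1.5]
kept = filter_events(example_ks)
histogram = age_histogram(example_ks)
```

The shape of the resulting histogram (for example, a sharp peak in the first bin followed by a long tail) is what the terms "L-shaped" and "nearly uniform" describe.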

Data download and processing

Human genes were downloaded from the Ensembl database release 19 (ftp://ftp.ensembl.org/pub/human-19.34b/data/fasta/). Caenorhabditis elegans genes were downloaded from WormBase release 123 (ftp://ftp.wormbase.org/pub/wormbase/archive/). Arabidopsis genes were downloaded from the TIGR database release 5.0 (ftp://ftp.tigr.org/pub/data/a_thaliana/ath1/SEQUENCES/). Drosophila genes were downloaded from BDGP Sequence and Annotation Databases Drosophila release 3.0 (//www.fruitfly.org/sequence/release3download.shtml

Age distribution of duplicate genes in eukaryotic genomes

We analyzed four eukaryotic genomes: human, C. elegans, Arabidopsis, and Drosophila. The numbers of duplications with Ks between 0.005 and 1 are listed in Table 1. The fly genome contains only 108 duplications within this Ks range, whereas each of the three other species has at least eight times as many duplication events. Our stringent criteria for selecting duplicates might be one of the reasons for the small number of young duplications in the fly. However, the same criteria were also applied to the

Discussion

Recently, Lynch and Conery (2003b) suggested that the age distribution of gene duplication events in a genome is L-shaped, indicating a steady birth and death process. However, our results do not support this hypothesis: the eukaryotic genomes differ greatly in the shape of the age distribution of duplications. In our study, only the human genome shows such an L-shaped distribution. The distribution in C. elegans shows only a weak peak of very young duplicate genes, while in Arabidopsis,

Acknowledgments

This study was supported by NIH grants. We thank Andre R.O. Cavalcanti for detecting block duplications in the human genome and Dr. Michael Lynch for helpful comments.

References (26)

  • E.E. Eichler. Recent duplication, domain accretion and the dynamic mutation of the human genome. Trends Genet. (2001)
  • J. Jurka. Repeats in genomic DNA: mining and meaning. Curr. Opin. Struct. Biol. (1998)
  • J. Jurka. Repbase Update: a database and an electronic journal of repetitive elements. Trends Genet. (2000)
  • G. Blanc et al. Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell (2004)
  • G. Blanc et al. Extensive duplication and reshuffling in the Arabidopsis genome. Plant Cell (2000)
  • J.E. Bowers et al. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature (2003)
  • A.R. Cavalcanti et al. Patterns of gene duplication in Saccharomyces cerevisiae and Caenorhabditis elegans. J. Mol. Evol. (2003)
  • J. Felsenstein. PHYLIP—Phylogeny Inference Package (Version 3.2). Cladistics (1989)
  • D. Grant et al. Genome organization in dicots: genome duplication in Arabidopsis and synteny between soybean and Arabidopsis. Proc. Natl. Acad. Sci. U. S. A. (2000)
  • X. Gu et al. Age distribution of human gene families shows significant roles of both large- and small-scale duplications in vertebrate evolution. Nat. Genet. (2002)
  • Z. Gu et al. Extent of gene duplication in the genomes of Drosophila, nematode, and yeast. Mol. Biol. Evol. (2002)
  • H.M. Ku et al. Comparing sequenced segments of the tomato and Arabidopsis genomes: large-scale duplication followed by selective gene loss creates a network of synteny. Proc. Natl. Acad. Sci. U. S. A. (2000)
  • M. Lynch et al. The evolutionary fate and consequences of duplicate genes. Science (2000)