Abstract
Proteome complexity increases during evolution mostly through gene duplication followed by divergence. In this genome-scale study of the human genome, I show that the density distribution of duplicate gene pairs along the axis of protein divergence between pair members forms two main peaks, preceded by a small peak and a plateau. This pattern indicates the existence of three stages in duplicate gene evolution. Analysis of various functional parameters (gene expression level and breadth, transcription factor targets, protein interaction networks) suggests that subfunctionalization (partitioning of function) is the predominant mode of divergence in the first main peak, whereas neofunctionalization (acquisition of novel functions) prevails in the second main peak. Young duplicate pairs show a much higher expression level than singleton genes and more diverged duplicates, which indicates that a requirement for high gene dosage is important for the retention of duplicates just after the duplication event. Thus, the prevailing route of duplicate evolution seems to be high gene dosage–subfunctionalization–neofunctionalization. This adaptationist model suggests that an organism evolves in the direction of its most intensively used functions.
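As a rough illustration of what a density distribution along a divergence axis with distinct peaks means in practice, the sketch below bins pairwise divergence values into a normalized histogram and reports the bins that are higher than both neighbours. It is not the procedure used in the study: the function names, the 0 to 1 divergence scale, the bin count, and the toy values are all assumptions made for illustration only.

```python
from typing import List, Sequence


def density_histogram(values: Sequence[float], n_bins: int,
                      lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Histogram normalized so that bin heights times bin width sum to 1."""
    if n_bins <= 0 or hi <= lo:
        raise ValueError("need n_bins > 0 and hi > lo")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            # A value exactly at the right edge goes into the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / (total * width) for c in counts]


def local_maxima(density: Sequence[float]) -> List[int]:
    """Indices of bins strictly higher than both neighbours (edge bins excluded)."""
    return [i for i in range(1, len(density) - 1)
            if density[i] > density[i - 1] and density[i] > density[i + 1]]


# Toy divergence values for duplicate pairs (hypothetical, NOT data from the study).
# Assumed scale: 0 = identical proteins, 1 = maximally diverged.
toy_divergence = [0.05, 0.31, 0.33, 0.35, 0.37, 0.52,
                  0.71, 0.73, 0.75, 0.77, 0.79, 0.95]

density = density_histogram(toy_divergence, n_bins=5)
peaks = local_maxima(density)
print("density per bin:", density)
print("peak bin indices:", peaks)
```

With real data the choice of bin width (or a kernel density estimate in place of a histogram) strongly affects which peaks appear, so any peak call of this kind would need to be checked for robustness against that choice.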
Acknowledgments
This study was supported by the Russian Foundation for Basic Research (RFBR). I thank two anonymous reviewers for valuable comments.
Cite this article
Vinogradov, A.E. Large Scale of Human Duplicate Genes Divergence. J Mol Evol 75, 25–33 (2012). https://doi.org/10.1007/s00239-012-9516-1
DOI: https://doi.org/10.1007/s00239-012-9516-1