Detecting Selection on Segregating Gene Duplicates in a Population

Stark, Tristan L.; Kaufman, Rebecca S.; Maltepes, Maria A.; Chi, Peter B.; Liberles, David A.

doi:10.1007/s00239-021-10024-2

Original Article
Published: 02 August 2021

Journal of Molecular Evolution, volume 89, pages 554–564 (2021)

Tristan L. Stark¹,², Rebecca S. Kaufman¹, Maria A. Maltepes¹, Peter B. Chi¹,³ & David A. Liberles¹ (ORCID: orcid.org/0000-0003-3487-8826)

Abstract

Gene duplication is a fundamental process that has the potential to drive phenotypic differences between populations and species. While evolutionarily neutral changes have the potential to affect phenotypes, detecting selection acting on gene duplicates can uncover cases of adaptive diversification. Existing methods to detect selection on duplicates work mostly inter-specifically and are based upon selection on coding sequence changes. Here we present a method to detect selection directly on a copy number variant segregating in a population. The method relies upon the expected relationship between allele (new duplication) age and frequency in the population, which depends upon the effective population size. The neutral baseline for copy number variants is established using a Moran model for both haploid and diploid populations under several population sizes. The ability of the method to reject neutrality for duplicates with known age (measured as pairwise dS) and frequency in the population is established through mathematical analysis and through simulations. Power is particularly good in the diploid case and with larger effective population sizes, as expected. With extension of this method to larger population sizes, this is a tool to analyze selection on copy number variants in any natural or experimentally evolving population. We have made an R package available at https://github.com/peterbchi/CNVSelectR/ which implements the method introduced here.
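The idea of comparing an observed duplicate frequency against a neutral Moran baseline at a given age can be illustrated with a minimal Python sketch. This is not the CNVSelectR implementation, and nothing below is taken from the package. It assumes a haploid population, measures age in Moran birth-death events rather than in dS, conditions on the duplicate still segregating, and uses Monte Carlo simulation instead of the mathematical analysis mentioned in the abstract. All function and parameter names (`moran_step_probs`, `simulate_count`, `neutral_tail_probability`, `s`) are hypothetical.

```python
import random


def moran_step_probs(i, n, s=0.0):
    """Return (p_up, p_down) for one haploid Moran event.

    i is the number of individuals carrying the duplicate, n the population
    size, and s an assumed selective advantage of carriers (0 = neutral).
    """
    if i <= 0 or i >= n:
        return 0.0, 0.0  # loss and fixation are absorbing states
    # Carriers reproduce with relative fitness 1 + s; deaths are uniform.
    p_birth_carrier = i * (1.0 + s) / (i * (1.0 + s) + (n - i))
    p_death_carrier = i / n
    p_up = p_birth_carrier * (1.0 - p_death_carrier)
    p_down = (1.0 - p_birth_carrier) * p_death_carrier
    return p_up, p_down


def simulate_count(n, steps, s=0.0, rng=None):
    """Simulate the carrier count after `steps` events, starting from one copy."""
    rng = rng or random.Random()
    i = 1
    for _ in range(steps):
        p_up, p_down = moran_step_probs(i, n, s)
        u = rng.random()
        if u < p_up:
            i += 1
        elif u < p_up + p_down:
            i -= 1
        if i == 0 or i == n:
            break  # absorbed: the duplicate is lost or fixed
    return i


def neutral_tail_probability(n, steps, observed_count, reps=2000, seed=1):
    """Estimate P(count >= observed_count | still segregating) under neutrality.

    Returns None if no simulated replicate is still segregating.
    """
    rng = random.Random(seed)
    counts = (simulate_count(n, steps, 0.0, rng) for _ in range(reps))
    segregating = [c for c in counts if 0 < c < n]
    if not segregating:
        return None
    return sum(c >= observed_count for c in segregating) / len(segregating)
```

A small tail probability for a young duplicate at high frequency would be read, in this toy setting, as evidence against neutrality. Translating a pairwise dS value into a number of Moran events requires a mutation rate and generation model that this sketch does not attempt.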

Data Availability

No original research data were presented in this paper. Code used to perform the analysis is available at https://github.com/TristanLStark/DetectingSelection. An R script to run the full analysis has been made available at https://github.com/peterbchi/CNVSelectR/blob/master/R/CNVSelect_test.R.

Acknowledgements

We would like to thank the Australian Research Council for partially funding this research through Discovery Project DP180100352. We would also like to thank Ryan Houser for careful reading of an early version of the manuscript and for helpful discussions, Gene Maltepes for computational support, and Catherine Browne for technical assistance in the preparation of the manuscript.

Author information

Tristan L. Stark
Present address: Discipline of Mathematics, University of Tasmania, Hobart, Tasmania, 7001, Australia

Authors and Affiliations

Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA
Tristan L. Stark, Rebecca S. Kaufman, Maria A. Maltepes, Peter B. Chi & David A. Liberles
Department of Mathematics and Statistics, Villanova University, Villanova, PA, 19085, USA
Peter B. Chi

Contributions

This study was conceived by DAL and TLS. Modeling and theoretical results were generated by TLS and RSK. Computer code for simulations was written and run by TLS, RSK, MAM, and PBC. The manuscript was written by DAL, TLS, RSK, and MAM.

Corresponding authors

Correspondence to Tristan L. Stark or David A. Liberles.

Additional information

Handling editor: Liang Liu.

About this article

Cite this article

Stark, T.L., Kaufman, R.S., Maltepes, M.A. et al. Detecting Selection on Segregating Gene Duplicates in a Population. J Mol Evol 89, 554–564 (2021). https://doi.org/10.1007/s00239-021-10024-2

Received: 07 June 2021
Accepted: 20 July 2021
Published: 02 August 2021
Issue Date: October 2021
DOI: https://doi.org/10.1007/s00239-021-10024-2
