Published by De Gruyter June 11, 2018

Non-parametric estimation of population size changes from the site frequency spectrum

Berit Lindum Waltoft and Asger Hobolth

Abstract

Changes in population size are a useful quantity for understanding the evolutionary history of a species. Genetic variation within a species can be summarized by the site frequency spectrum (SFS). For a sample of size n, the SFS is a vector of length n − 1 where entry i is the number of sites where the mutant base appears i times and the ancestral base appears n − i times. We present a new method, CubSFS, for estimating the changes in population size of a panmictic population from an observed SFS. First, we provide a straightforward proof for the expression of the expected site frequency spectrum depending only on the population size. Our derivation is based on an eigenvalue decomposition of the instantaneous coalescent rate matrix. Second, we solve the inverse problem of determining the changes in population size from an observed SFS. Our solution is based on a cubic spline for the population size. The cubic spline is determined by minimizing the weighted average of two terms, namely (i) the goodness of fit to the observed SFS, and (ii) a penalty term based on the smoothness of the changes. The weight is determined by cross-validation. The new method is validated on simulated demographic histories and applied to unfolded and folded SFS from 26 different human populations from the 1000 Genomes Project.
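The SFS definition above can be made concrete with a short sketch. This Python is an illustration only and is not the CubSFS implementation. It tabulates an unfolded SFS from per-site derived (mutant) allele counts, which are assumed to be already polarized against the ancestral base. It then folds the spectrum by minor-allele count and compares it with the standard constant-size neutral expectation E[ξ_i] = θ/i, where θ is the population-scaled mutation rate, estimated here with Watterson's estimator. All function names are hypothetical. Systematic departures of an observed SFS from θ/i are the kind of signal that methods for inferring population size changes exploit.

```python
def unfolded_sfs(derived_counts, n):
    """Return the unfolded SFS as a list of length n - 1.

    Entry i - 1 holds the number of sites where the mutant (derived) base
    appears i times and the ancestral base appears n - i times.
    Monomorphic sites (count 0 or n) are not part of the SFS and are skipped.
    """
    sfs = [0] * (n - 1)
    for c in derived_counts:
        if not 0 <= c <= n:
            raise ValueError(f"derived count {c} outside [0, {n}]")
        if 0 < c < n:
            sfs[c - 1] += 1
    return sfs


def fold_sfs(sfs):
    """Fold an unfolded SFS (length n - 1) by minor-allele count j = 1..n // 2.

    Classes j and n - j are combined; when n is even, the middle class
    j = n / 2 is counted only once.
    """
    n = len(sfs) + 1
    folded = []
    for j in range(1, n // 2 + 1):
        if 2 * j == n:
            folded.append(sfs[j - 1])
        else:
            folded.append(sfs[j - 1] + sfs[n - j - 1])
    return folded


def expected_sfs_constant(theta, n):
    """Expected unfolded SFS under a constant-size neutral coalescent: theta / i."""
    return [theta / i for i in range(1, n)]


def watterson_theta(sfs):
    """Watterson's estimator: segregating sites divided by sum_{i=1}^{n-1} 1/i."""
    n = len(sfs) + 1
    a_n = sum(1.0 / i for i in range(1, n))
    return sum(sfs) / a_n


if __name__ == "__main__":
    n = 4                                   # sample size (hypothetical data)
    counts = [1, 1, 2, 3, 0, 4, 1]          # derived-allele count per site
    xi = unfolded_sfs(counts, n)            # [3, 1, 1]
    eta = fold_sfs(xi)                      # [4, 1]
    theta_w = watterson_theta(xi)
    expected = expected_sfs_constant(theta_w, n)
    print("unfolded:", xi, "folded:", eta)
    print("constant-size expectation:", [round(e, 3) for e in expected])
```

Folding discards the ancestral/derived polarization, which is why the folded spectrum has only n // 2 entries. By construction, the constant-size expectation sums to the observed number of segregating sites when θ is set to Watterson's estimate.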

Funding source: Lundbeck Foundation

Award Identifier / Grant number: R155–2014–1724

Funding statement: BLW is funded by The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Denmark. Grant number R155–2014–1724.

Acknowledgement

We are grateful to two anonymous reviewers who provided very careful and constructive questions and suggestions on the first version of this paper. The helpful comments resulted in a much improved paper. We thank X. Liu and Y.-X. Fu for providing the SFS of the 9 populations. We also thank J. Terhorst and Y. Song for providing key insight into the SMC++ method. We further thank Bjarni Vilhjalmsson for providing the folded SFS of the 1000 Genomes Project phase 3 data.

References

Bhaskar, A. and Y. S. Song (2014): “Descartes’ rule of signs and the identifiability of population demographic models from genomic variation data,” Ann. Stat., 42, 2469–2493. https://doi.org/10.1214/14-AOS1264

Bhaskar, A., Y. X. R. Wang and Y. S. Song (2015): “Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data,” Genome Res., 25, 268–279. https://doi.org/10.1101/gr.178756.114

Birgin, E. G. and J. M. Martínez (2008): “Improving ultimate convergence of an augmented Lagrangian method,” Optim. Method. Softw., 23, 177–195. https://doi.org/10.1080/10556780701577730

Boitard, S., W. Rodriguez, F. Jay, S. Mona and F. Austerlitz (2016): “Inferring population size history from large samples of genome-wide molecular data – an approximate Bayesian computation approach,” PLoS Genet., 12, e1005877. https://doi.org/10.1371/journal.pgen.1005877

Eldon, B., M. Birkner, J. Blath and F. Freund (2015): “Can the site-frequency spectrum distinguish exponential population growth from multiple-merger coalescents?” Genetics, 199, 841–856. https://doi.org/10.1534/genetics.114.173807

Excoffier, L., I. Dupanloup, E. Huerta Sánchez, V. C. Sousa and M. Foll (2013): “Robust demographic inference from genomic and SNP data,” PLoS Genet., 9, e1003905. https://doi.org/10.1371/journal.pgen.1003905

Gao, F. and A. Keinan (2016): “Inference of super-exponential human population growth via efficient computation of the site frequency spectrum for generalized models,” Genetics, 202, 235–245. https://doi.org/10.1534/genetics.115.180570

Gattepaille, L., T. Günther and M. Jakobsson (2016): “Inferring past effective population size from distributions of coalescent times,” Genetics, 204, 1191–1206. https://doi.org/10.1534/genetics.115.185058

Green, P. J. and B. W. Silverman (1994): Nonparametric regression and generalized linear models, Chapman & Hall/CRC, London. https://doi.org/10.1007/978-1-4899-4473-3

Gutenkunst, R. N., R. D. Hernandez, S. H. Williamson and C. D. Bustamante (2009): “Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data,” PLoS Genet., 5, e1000695. https://doi.org/10.1371/journal.pgen.1000695

Gutenkunst, R. N., R. D. Hernandez, S. H. Williamson and C. D. Bustamante (2010): “Diffusion approximations for demographic inference: δaδi,” Nature Precedings. http://precedings.nature.com/documents/4594/version/1. https://doi.org/10.1038/npre.2010.4594.1

Lan, S., J. Palacios, M. Karcher, V. N. Minin and B. Shahbaba (2015): “An efficient Bayesian inference framework for coalescent-based nonparametric phylodynamics,” Bioinformatics, 31, 3282–3289. https://doi.org/10.1093/bioinformatics/btv378

Lapierre, M., A. Lambert and G. Achaz (2017): “Accuracy of demographic inferences from the site frequency spectrum: the case of the Yoruba population,” Genetics, 206, 439–449. https://doi.org/10.1534/genetics.116.192708

Li, H. and R. Durbin (2011): “Inference of human population history from individual whole-genome sequences,” Nature, 475, 493–496. https://doi.org/10.1038/nature10231

Liu, X. and Y. Fu (2015): “Exploring population size changes using SNP frequency spectra,” Nature Genet., 47, 555–559. https://doi.org/10.1038/ng.3254

Lukic, S. and J. Hey (2011): “Non-equilibrium allele frequency spectra via spectral methods,” Theor. Popul. Biol., 79, 203–219. https://doi.org/10.1016/j.tpb.2011.02.003

Lukic, S. and J. Hey (2012): “Demographic inference using spectral methods on SNP data, with an analysis of the human out-of-Africa expansion,” Genetics, 192, 619–639. https://doi.org/10.1534/genetics.112.141846

Mazet, O., W. Rodríguez, S. Grusea, S. Boitard and L. Chikhi (2016): “On the importance of being structured: instantaneous coalescence rates and human evolution – lessons for ancestral population size inference?” Heredity, 116, 362–371. https://doi.org/10.1038/hdy.2015.104

Myers, S., C. Fefferman and N. Patterson (2008): “Can one learn history from the allelic spectrum?” Theor. Popul. Biol., 73, 342–348. https://doi.org/10.1016/j.tpb.2008.01.001

Palacios, J. and V. N. Minin (2013): “Gaussian process-based Bayesian nonparametric inference of population size trajectories from gene genealogies,” Biometrics, 69, 8–18. https://doi.org/10.1111/biom.12003

Palacios, J. A., J. Wakeley and S. Ramachandran (2015): “Bayesian nonparametric inference of population size changes from sequential genealogies,” Genetics, 201, 281–304. https://doi.org/10.1534/genetics.115.177980

Polanski, A. and M. Kimmel (2003): “New explicit expressions for relative frequencies of single-nucleotide polymorphisms with application to statistical inference on population growth,” Genetics, 165, 427–436. https://doi.org/10.1093/genetics/165.1.427

Polanski, A., A. Bobrowski and M. Kimmel (2003): “A note on distributions of times to coalescence, under time-dependent population size,” Theor. Popul. Biol., 63, 33–40. https://doi.org/10.1016/S0040-5809(02)00010-2

Powell, M. J. D. (1994): “A direct search optimization method that models the objective and constraint functions by linear interpolation,” in Advances in Optimization and Numerical Analysis, Springer Netherlands, Dordrecht, 51–67. https://doi.org/10.1007/978-94-015-8330-5_4

Powell, M. J. D. (1998): “Direct search algorithms for optimization calculations,” Acta Numerica, 7, 287–336. https://doi.org/10.1017/S0962492900002841

Reppell, M., M. Boehnke and S. Zöllner (2014): “The impact of accelerating faster than exponential population growth on genetic variation,” Genetics, 196, 819–828. https://doi.org/10.1534/genetics.113.158675

Schiffels, S. and R. Durbin (2014): “Inferring human population size and separation history from multiple genome sequences,” Nature Genet., 46, 919–925. https://doi.org/10.1038/ng.3015

Sheehan, S., K. Harris and Y. S. Song (2013): “Estimating variable effective population size from multiple genomes: a sequentially Markov conditional sampling distribution approach,” Genetics, 194, 647–662. https://doi.org/10.1534/genetics.112.149096

Terhorst, J., J. A. Kamm and Y. S. Song (2017): “Robust and scalable inference of population history from hundreds of unphased whole genomes,” Nature Genet., 49, 303–309. https://doi.org/10.1038/ng.3748

The 1000 Genomes Project Consortium (2015): “A global reference for human genetic variation,” Nature, 526, 68–74. https://doi.org/10.1038/nature15393

Wakeley, J. (2009): Coalescent theory: an introduction, Roberts and Company Publishers, Greenwood Village, Colorado.


Supplemental Material

The online version of this article offers supplementary material (https://doi.org/10.1515/sagmb-2017-0061).


Published Online: 2018-6-11

©2018 Walter de Gruyter GmbH, Berlin/Boston
