A novel DNA sequence database for analyzing human demographic history

  1. Jeffrey D. Wall [1],
  2. Murray P. Cox [2],
  3. Fernando L. Mendez [3],
  4. August Woerner [2],
  5. Tesa Severson [2], and
  6. Michael F. Hammer [2,3,4]
  1. Institute for Human Genetics and Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, California 94143, USA;
  2. ARL Division of Biotechnology, University of Arizona, Tucson, Arizona 85721, USA;
  3. Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721, USA

Abstract

While there are now extensive databases of human genomic sequences from both private and public efforts to catalog human nucleotide variation, there are very few large-scale surveys designed for the purpose of analyzing human population history. Demographic inference from patterns of SNP variation in current large public databases is complicated by ascertainment biases associated with SNP discovery and the ways that populations and regions of the genome are sampled. Here, we present results from a resequencing survey of 40 independent intergenic regions on the autosomes and X chromosome comprising ∼210 kb from each of 90 humans from six geographically diverse populations (i.e., a total of ∼18.9 Mb). Unlike other public DNA sequence databases, we include multiple indigenous populations that serve as important reservoirs of human genetic diversity, such as the San of Namibia, the Biaka of the Central African Republic, and Melanesians from Papua New Guinea. In fact, only 20% of the SNPs that we find are contained in the HapMap database. We identify several key differences in patterns of variability in our database compared with other large public databases, including higher levels of nucleotide diversity within populations, greater levels of differentiation between populations, and significant differences in the frequency spectrum. Because variants at loci included in this database are less likely to be subject to ascertainment biases or linked to sites under selection, these data will be more useful for accurately reconstructing past changes in size and structure of human populations.

Footnotes

  • 4 Corresponding author. E-mail mfh{at}u.arizona.edu; fax (520) 626-8050.

  • [Supplemental material is available online at www.genome.org.]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.075630.107.

    • Received December 14, 2007.
    • Accepted May 5, 2008.