Abstract
Many association methods use a subset of genotyped single nucleotide polymorphisms (SNPs) to capture or infer genotypes at other untyped SNPs. We and others previously showed that tag SNPs selected to capture common variation using data from The International HapMap Consortium (Nature 437:1299–1320, 2005), The International HapMap Consortium (Nature 449:851–861, 2007) could also capture variation in populations of similar ancestry to HapMap reference populations (de Bakker et al. in Nat Genet 38:1298–1303, 2006; González-Neira et al. in Genome Res 16:323–330, 2006; Montpetit et al. in PLoS Genet 2:282–290, 2006; Mueller et al. in Am J Hum Genet 76:387–398, 2005). To capture variation in admixed populations or populations less similar to HapMap panels, a “cosmopolitan approach,” in which all samples from HapMap are used as a single reference panel, was proposed. Here we refine this suggestion and show that use of a “weighted reference panel,” constructed based on empirical estimates of ancestry in the target population (relative to available reference panels), is more efficient than the cosmopolitan approach. Weighted reference panels capture, on average, only slightly fewer common variants (minor allele frequency > 5%) than the cosmopolitan approach (mean r 2 = 0.977 vs. 0.989, 94.5% variation captured vs. 96.8% at r 2 > 0.8), across the five populations of the Multiethnic Cohort, but entail approximately 25% fewer tag SNPs per panel (average 538 vs. 718). These results extend a recent study in two Indian populations (Pemberton et al. in Ann Hum Genet 72:535–546, 2008). Weighted reference panels are potentially useful for both the selection of tag SNPs in diverse populations and perhaps in the design of reference panels for imputation of untyped genotypes in genome-wide association studies in admixed populations.
Similar content being viewed by others
References
Barrett JC, Fry B, Maller J, Daly MJ (2005) Haploview: analysis and visualization of ld and haplotype maps. Bioinformatics 21:263–265
Bersaglieri T, Sabeti PC, Patterson N, Vanderploeg T, Schaffner SF, Drake JA, Rhodes M, Reich DE, Hirschhorn JN (2004) Genetic signatures of strong recent positive selection at the lactase gene. Am J Hum Genet 74:1111–1120
Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, Piouffre L, Bodmer J, Bodmer WF, Bonne-Tamir B, Cambon-Thomsen A, Chen Z, Chu J, Carcassi C, Contu L, Du R, Excoffier L, Ferrara GB, Friedlaender JS, Groot H, Gurwitz D, Jenkins T, Herrera RJ, Huang X, Kidd J, Kidd KK, Langaney A, Lin AA, Mehdi SQ, Parham P, Piazza A, Pistillo MP, Qian Y, Shu Q, Xu J, Zhu S, Weber JL, Greely HT, Feldman MW, Thomas G, Dausset J, Cavalli-Sforza LL (2002) A human genome diversity cell line panel. Science 296:261–262
Carlson CS, Eberle MA, Rieder MJ, Yi Q, Kruglyak L, Nickerson DA (2004) Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am J Hum Genet 74:106–120
Chapman JM, Cooper JD, Todd JA, Clayton DG (2003) Detecting disease associations due to linkage disequilibrium using haplotype tags: a class of tests and determinants of statistical power. Hum Hered 56:18–31
Clayton D, Chapman J, Cooper J (2004) Use of unphased multilocus genotype data in indirect association studies. Genet Epidemiol 27:415–428
Coriell Institute for Medical Research. Available at http://ccr.coriell.org/Sections/Collections/NHGRI/hapmap.aspx?PgId=266 Accessed 25 Jun 2008
de Bakker PI, Yelenski R, Pe’er I, Gabriel SB, Daly MJ, Altshuler D (2005) Efficiency and power in genetic association studies. Nat Genet 37:1217–1223
de Bakker PI, Burtt N, Graham RR, Guiducci C, Yelensky R, Drake JA, Bersaglieri T, Penney KL, Butler J, Young S, Onofrio RC, Lyon HN, Stram DO, Haiman CA, Freedman ML, Zhu X, Cooper R, Groop L, Kolonel LN, Henderson BE, Daly MJ, Hirschhorn JN, Altshuler D (2006) Transferability of tag SNPs in genetic association studies in multiple populations. Nat Genet 38:1298–1303
González-Neira A, Ke X, Lao O, Calafell F, Navarro A, Comas D, Cann H, Bumpstead S, Ghori J, Hunt S, Deloukas P, Dunham I, Cardon LR, Bertranpetit J (2006) The portability of tag SNPs across populations: a worldwide survey. Genome Res 16:323–330
Haiman CA, Patterson N, Freedman ML, Myers SR, Pike MC, Waliszewska A, Neubauer J, Tandon A, Schirmer C, McDonald GJ, Greenway SC, Stram DO, Le Marchand L, Kolonel LN, Frasco M, Wong D, Pooler LC, Ardlie K, Oakley-Girvan I, Whittemore AS, Cooney KA, John EM, Ingles SA, Altshuler D, Henderson BE, Reich D (2007) Multiple regions within 8q24 independently affect risk for prostate cancer. Nat Genet 39:638–644
Haploview (2008) Available at http://www.broad.mit.edu/mpg/haploview/. Accessed 08 Jun 2008
Johnson GC, Esposito L, Barratt BJ, Smith AN, Heward J, Di Genova G, Ueda H, Cordell HJ, Eaves IA, Dudbridge F, Twells RC, Payne F, Hughes W, Nutland S, Stevens H, Carr P, Tuomilehto-Wolf E, Tuomilehto J, Gough SC, Clayton DG, Todd JA (2001) Haplotype tagging for the identification of common disease genes. Nat Genet 29:233–237
Kolonel LN, Henderson B, Hankin JH, Nomura AM, Wilkens LR, Pike MC, Stram DO, Monroe KR, Earle ME, Nagamine FS (2000) A Multiethnic Cohort in Hawaii and Los Angeles: baseline characteristics. Am J Epidemiol 151:346–357
Li Y, Willer CJ, Ding J, Sheet P, Abecasis GR (2008) Rapid Markov chain haplotyping and genotype inference (Submitted)
Marchini J, Howie B, Myers S, McVean G, Donnelly P (2007) A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 39:906–913
Maresso K, Broeckel U (2008) Genotyping platforms for mass-throughput genotyping with SNPs, including human genome-wide scans. Adv Genet 60:107–139
Montpetit A, Nelis M, Laflamme P, Magi R, Ke X, Remm M, Cardon L, Hudson TJ, Metspalu A (2006) An evaluation of the performance of tag SNPs derived from HapMap in a Caucasian population. PLoS Genet 2:282–290
Mueller JC, Lõhmussaar E, Mägi R, Remm M, Bettecken T, Lichtner P, Biskup S, Illig T, Pfeufer A, Luedemann J, Schreiber S, Pramstaller P, Pichler I, Romeo G, Gaddi A, Testa A, Wichmann HE, Metspalu A, Meitinger T (2005) Linkage disequilibrium patterns and tag SNP transferability among European populations. Am J Hum Genet 76:387–398
Pemberton TJ, Jakobsson M, Conrad DF, Coop G, Wall JD, Pritchard JK, Patel PI, Rosenberg NA (2008) Using population mixtures to optimize the utility of genomic databases: linkage disequilibrium and association study design in India. Ann Hum Genet 72:535–546
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155:945–959
Sequenom (2008) Available at http://www.sequenom.com. Accessed 12 Jul 2008
Smith MW, Patterson N, Lautenberger JA, Truelove AL, McDonald GJ, Waliszewska A, Kessing BD, Malasky MJ, Scafe C, Le E, De Jager PL, Mignault AA, Yi Z, De The G, Essex M, Sankale JL, Moore JH, Poku K, Phair JP, Goedert JJ, Vlahov D, Williams SM, Tishkoff SA, Winkler CA, De La Vega FM, Woodage T, Sninsky JJ, Hafler DA, Altshuler D, Gilbert DA, O’Brien SJ, Reich D (2004) A high-density admixture map for disease gene discovery in African Americans. Am J Hum Genet 74:1001–1013
The International HapMap Consortium (2005) A haplotype map of the human genome. Nature 437:1299–1320
The International HapMap Consortium (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449:851–861
Tian C, Hinds D, Shigeta R, Kittles R, Ballinger D, Seldin M (2006) A genome-wide single-nucleotide-polymorphism panel with high ancestry information for African American admixture mapping. Am J Hum Genet 79:640–649
Zeggini E, Scott L, Saxena R, Voight BF, Marchini JL, Hu T, de Bakker PI, Abecasis GR, Almgren P, Andersen G, Ardlie K, Boström KB, Bergman RN, Bonnycastle LL, Borch-Johnsen K, Burtt NP, Chen H, Chines PS, Daly MJ, Deodhar P, Ding CJ, Doney AS, Duren WL, Elliott KS, Erdos MR, Frayling TM, Freathy RM, Gianniny L, Grallert H, Grarup N, Groves CJ, Guiducci C, Hansen T, Herder C, Hitman GA, Hughes TE, Isomaa B, Jackson AU, Jørgensen T, Kong A, Kubalanza K, Kuruvilla FG, Kuusisto J, Langenberg C, Lango H, Lauritzen T, Li Y, Lindgren CM, Lyssenko V, Marvelle AF, Meisinger C, Midthjell K, Mohlke KL, Morken MA, Morris AD, Narisu N, Nilsson P, Owen KR, Palmer CN, Payne F, Perry JR, Pettersen E, Platou C, Prokopenko I, Qi L, Qin L, Rayner NW, Rees M, Roix JJ, Sandbaek A, Shields B, Sjögren M, Steinthorsdottir V, Stringham HM, Swift AJ, Thorleifsson G, Thorsteinsdottir U, Timpson NJ, Tuomi T, Tuomilehto J, Walker M, Watanabe RM, Weedon MN, Willer CJ, Wellcome Trust Case Control Consortium, Illig T, Hveem K, Hu FB, Laakso M, Stefansson K, Pedersen O, Wareham NJ, Barroso I, Hattersley AT, Collins FS, Groop L, McCarthy MI, Boehnke M, Altshuler D (2008) Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet 40:638–645
Acknowledgments
We thank P. de Bakker of the Broad Institute for support in analyzing the generated data, and members of the Hirschhorn lab for helpful discussions. Finally, we extend our deepest thanks to the participants of the International HapMap Project, Multiethnic Cohort, and Human Genome Diversity Panel, without whom none of this work would have been possible. This work was supported by grant R01DK075787 to JNH and a Strategic Program for Asthma Research award to JNH from the American Asthma Foundation.
Author information
Authors and Affiliations
Corresponding author
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
About this article
Cite this article
Egyud, M.R.L., Gajdos, Z.K.Z., Butler, J.L. et al. Use of weighted reference panels based on empirical estimates of ancestry for capturing untyped variation. Hum Genet 125, 295–303 (2009). https://doi.org/10.1007/s00439-009-0627-8
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00439-009-0627-8