Abstract
Genotype imputation is now routinely performed in genomic analysis. Reference panel size, that is, the number of haplotypes in the reference panel, is well established as a major driver of imputation accuracy. For that reason, substantial efforts have been made worldwide to build large reference panels, with the Haplotype Reference Consortium (HRC) panel currently the largest available in the public domain. The imputation performance of HRC, whose samples are predominantly European, has been evaluated mainly in Europeans. We conducted whole-genome genotype imputation on two independent genome-wide genotyping datasets, one with 1000 European samples and the other with 1000 Han Chinese samples, and compared the results obtained using HRC with those obtained using the Phase III 1000 Genomes Project (1000G) reference panel. For the European dataset, using HRC improved imputation quality, especially for rare variants with minor allele frequency (MAF) < 0.1%. For the Han Chinese dataset, however, 1000G performed better in both imputation quality and the number of well-imputed variants. We validated the performance of the 1000G reference panel in a second, independent cohort of Han Chinese (N = 2402). Our study highlights the limitations of HRC for Han Chinese populations and strongly suggests the need to build population-specific reference panels.
Acknowledgements
We acknowledge data sharing from the European Genome-phenome Archive (EGAD00000000024). This research was funded by the Normal Project (81370044), the Youth Project (81000692), the Key Project of the National Natural Science Foundation of China (81130031), the Anhui Excellent Youth Fund (1808085J08), the Anhui High Education Talent Fund (X. Yin), and the Anhui Medical University Ph.D. Fund (XJ201429). We thank Dr. Kimberly Robasky at the University of North Carolina at Chapel Hill for valuable comments.
Author information
Contributions
XYY and YL conceived the study. XYY and YL designed the research strategy and conducted the analysis. SY, DXL, and XJZ participated in sample procurement. XYY and YL prepared the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Lin, Y., Liu, L., Yang, S. et al. Genotype imputation for Han Chinese population using Haplotype Reference Consortium as reference. Hum Genet 137, 431–436 (2018). https://doi.org/10.1007/s00439-018-1894-z
DOI: https://doi.org/10.1007/s00439-018-1894-z