Abstract
Genomic variation is the genetic basis of phenotypic diversity among individuals, including variation in disease susceptibility and drug response. The greatest promise of the International HapMap is to provide roadmaps for identifying genetic variants predisposing to complex diseases. The single nucleotide polymorphism (SNP) is the fundamental element of the HapMap. The allele frequency of SNPs is one of the major factors affecting the resulting HapMap, being the factor upon which linkage disequilibrium (LD) is calculated, haplotypes are constructed, and tagging SNPs (tagSNPs) are selected. The cutoff thresholds for the frequency of minor alleles used in the making of the map therefore have profound effects on the resolution of that map. To date, most researchers have adopted their own cutoff thresholds, and there has been little real dataset-based evaluation of the effects of different cutoff thresholds on HapMap resolution. To assess the implications of different cutoff values, we analyzed our own data for the centromeric genes on Chromosome 15 in Chinese Han and Tibetan populations, with respect to minor allele frequency cutoff values of ≥0.01 (0.01 group), ≥0.05 (0.05 group), and ≥0.10 (0.10 group), and constructed HapMaps from each of the datasets. The resolution, study power and cost-effectiveness of each of the maps were compared. Our results show that the 0.01 threshold provides the greatest power (P = 0.019 in Han and P = 0.029 in Tibetan for 0.01 vs. 0.05 threshold) and detects the most population-specific haplotypes (P = 0.012 for 0.01 vs. 0.05 threshold). However, in the regions studied, the 0.05 cutoff threshold did not significantly increase power above the 0.10 threshold (P = 0.191 in Han; P = 1.000 in Tibetan), and did not improve resolution over the 0.10 value for population-specific haplotypes (P = 0.592) either.
Furthermore, the 0.05 and 0.10 values produced the same figures for tagging efficiency, LD block number, LD length, study power and cost-savings in the Tibetan population. These results suggest that a lower cutoff value is more appropriate for studies in which population-specific haplotypes are crucial, and that the most appropriate cutoff value may differ between populations. Because only a limited number of genes were studied in this project, more studies should be conducted to further address this important issue.
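To make the role of the cutoff concrete, the following is a minimal sketch of how a minor allele frequency (MAF) threshold determines which SNPs enter a map. It is an illustration only, not the pipeline used in this study: the genotype data are invented toy values, the SNP names and the helper functions `minor_allele_frequency` and `filter_by_maf` are hypothetical, and each SNP is assumed to be biallelic.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the MAF of one SNP, assumed biallelic.

    `genotypes` is a list of two-character strings such as "AG";
    missing calls are given as None and are skipped.
    """
    counts = Counter()
    for genotype in genotypes:
        if genotype is None:
            continue
        counts.update(genotype)  # counts each of the two alleles
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no data, or monomorphic in this sample
    return min(counts.values()) / total


def filter_by_maf(snps, cutoff):
    """Keep the SNPs whose MAF is at or above `cutoff`."""
    retained = {}
    for name, genotypes in snps.items():
        maf = minor_allele_frequency(genotypes)
        if maf >= cutoff:
            retained[name] = maf
    return retained


# Toy genotypes for 100 individuals per SNP (invented for illustration).
snps = {
    "snp_a": ["AA"] * 97 + ["AG"] * 3,                 # MAF 3/200
    "snp_b": ["CC"] * 86 + ["CT"] * 14,                # MAF 14/200
    "snp_c": ["GG"] * 70 + ["GT"] * 20 + ["TT"] * 10,  # MAF 40/200
    "snp_d": ["AA"] * 100,                             # monomorphic
}

# The three cutoffs compared in the abstract.
for cutoff in (0.01, 0.05, 0.10):
    retained = filter_by_maf(snps, cutoff)
    print(cutoff, sorted(retained))
```

In this toy example, each stricter cutoff drops the rarest remaining SNP, which is the mechanism by which a higher threshold can remove the rare variants that distinguish population-specific haplotypes.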
Additional information
Supported by the Key Construction Program of the National “985” Project of China (Phase II), the Natural Science Foundation of Guangdong Province (Grant No. 031673), and the Guangzhou Municipal Science and Technology Foundation (Grant Nos. 2002Z3-C7191, 2004Z3-C7501).
About this article
Cite this article
Xiong, S., Hao, Y., Rao, S. et al. Effects of cutoff thresholds for minor allele frequencies on HapMap resolution: A real dataset-based evaluation of the Chinese Han and Tibetan populations. Chin. Sci. Bull. 54, 2069–2075 (2009). https://doi.org/10.1007/s11434-009-0302-4
DOI: https://doi.org/10.1007/s11434-009-0302-4