Abstract
In genome-wide association studies (GWAS), single-marker analysis is usually employed to identify the most significant single nucleotide polymorphisms (SNPs). The trend test has been proposed for the analysis of case-control association. Three trend tests are available, optimal for the recessive, additive and dominant models, respectively. When the underlying genetic model is unknown, the maximum of the three trend test statistics (MAX) has been shown to be robust against genetic model misspecification. Because the asymptotic distribution of MAX depends on the allele frequency of the SNP, ranking SNPs by the P-value of MAX may differ from ranking them by the MAX statistic itself. Calculating the P-value of MAX for 300,000 (300K) or more SNPs is computationally intensive, and software to obtain the P-value of MAX is not widely available. The MAX statistic, on the other hand, is easy to calculate without complex computer programs. We therefore study whether the MAX statistic can be used instead of its P-value to rank SNPs in GWAS. The approaches that rank SNPs by MAX and by its P-value are referred to as MAX-rank and P-rank, respectively. Applying MAX-rank and P-rank to simulated data and to four real GWAS datasets, we found that the ranks of SNPs with true association are very similar under both approaches. Thus, we recommend using MAX-rank for genome-wide scans. After the top-ranked SNPs are identified, their P-values based on MAX can be calculated and compared with the significance level.
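To illustrate why the MAX statistic is simple to compute, the following is a minimal Python sketch of MAX-rank. It assumes the standard Cochran-Armitage form of the trend test with genotype scores (0, x, 1), where x = 0, 1/2 and 1 correspond to the recessive, additive and dominant models, and it assumes MAX is the largest absolute value of the three statistics. The function names `trend_z`, `max_stat` and `max_rank` and the input layout are hypothetical and are not taken from the authors' software.

```python
import math

# Heterozygote score x for each genetic model; genotype scores are (0, x, 1).
SCORES = {"recessive": 0.0, "additive": 0.5, "dominant": 1.0}


def trend_z(cases, controls, x):
    """Cochran-Armitage trend statistic for genotype counts ordered (aa, Aa, AA)."""
    scores = (0.0, x, 1.0)
    r_tot, s_tot = sum(cases), sum(controls)
    n_tot = r_tot + s_tot
    totals = [r + s for r, s in zip(cases, controls)]
    num = sum(w * (s_tot * r - r_tot * s)
              for w, r, s in zip(scores, cases, controls))
    var = r_tot * s_tot * (
        n_tot * sum(w * w * n for w, n in zip(scores, totals))
        - sum(w * n for w, n in zip(scores, totals)) ** 2
    )
    if var <= 0:
        # Assumption of this sketch: a degenerate table carries no signal.
        return 0.0
    return math.sqrt(n_tot) * num / math.sqrt(var)


def max_stat(cases, controls):
    """Maximum absolute trend statistic over the three genetic models."""
    return max(abs(trend_z(cases, controls, x)) for x in SCORES.values())


def max_rank(snps):
    """Return SNP names sorted by decreasing MAX.

    snps maps a SNP name to a (cases, controls) pair of genotype counts.
    """
    return sorted(snps, key=lambda name: max_stat(*snps[name]), reverse=True)


if __name__ == "__main__":
    # Made-up counts for illustration only.
    toy = {
        "snp_null": ((50, 100, 50), (50, 100, 50)),
        "snp_assoc": ((30, 100, 70), (50, 100, 50)),
    }
    for name in max_rank(toy):
        print(name, round(max_stat(*toy[name]), 3))
```

Ranking needs only the genotype counts per SNP; in this sketch the P-value of MAX, which depends on the allele frequency, would be computed afterwards for the top-ranked SNPs only.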
Acknowledgments
We thank the Center for Information Technology, NIH, for providing access to the high-performance computational capabilities of the Biowulf cluster computer system. The authors would like to thank J Hoh for sharing her AMD data with us and BJ Stone of NCI for her help with the English edits. Three reviewers provided useful comments and suggestions that improved our presentation.
Additional information
The work of Q Li was partially supported by the Knowledge Innovation Program of the Chinese Academy of Sciences, Nos. 30465W0 and 30475V0. The research of Z Li was partially sponsored by NIH grant EY014478.
About this article
Cite this article
Li, Q., Yu, K., Li, Z. et al. MAX-rank: a simple and robust genome-wide scan for case-control association studies. Hum Genet 123, 617–623 (2008). https://doi.org/10.1007/s00439-008-0514-8
DOI: https://doi.org/10.1007/s00439-008-0514-8