Joint Analysis of Multiple Traits Using "Optimal" Maximum Heritability Test

  • Zhenchuan Wang,

    Affiliation Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, 49931, United States of America

  • Qiuying Sha,

    Affiliation Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, 49931, United States of America

  • Shuanglin Zhang

    shuzhang@mtu.edu

    Affiliation Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, 49931, United States of America

Abstract

The joint analysis of multiple traits has recently become popular since it can increase statistical power to detect genetic variants and there is increasing evidence showing that pleiotropy is a widespread phenomenon in complex diseases. Currently, most existing methods use all of the traits to test the association between multiple traits and a single variant. However, these methods may lose power in the presence of a large number of noise traits. In this paper, we propose an “optimal” maximum heritability test (MHT-O) to test the association between multiple traits and a single variant. MHT-O includes a procedure for deleting traits that have weak or no association with the variant. Using extensive simulation studies, we compare the performance of MHT-O with MHT, the Trait-based Association Test that uses Extended Simes procedure (TATES), SUM_SCORE and MANOVA. Our results show that, in all of the simulation scenarios, MHT-O is either the most powerful test or comparable to the most powerful test among the five tests we compared.

Introduction

Increasing evidence shows that pleiotropy, the effect of one variant on multiple traits, is a widespread phenomenon in complex diseases [1]. Furthermore, in genetic association studies of complex diseases, multiple related traits are usually measured. For example, hyperuricemia is usually present in patients with gout [2]; coronary heart disease is predicted by cytokine interleukin-6, C-reactive protein, interleukin-1, tumor necrosis factor-α and fibrinogen [3, 4]; and neuropsychiatric disorders depend on a range of overlapping clinical characteristics [5]. Although most published genome-wide association studies (GWASs) analyze each of the related traits separately, joint analysis of multiple traits may increase statistical power to detect genetic variants [6–9]. Thus, joint analysis of multiple traits has recently become popular.

Several statistical methods have been developed for joint analysis of multiple traits. These methods can be roughly divided into three groups: combining the univariate analysis results, regression methods, and dimension reduction methods. For combining univariate analysis results, one first performs an association test for each trait individually and then combines the univariate test statistics or the p-values of the univariate tests [2, 10–12]. Regression methods include mixed effect models [9, 13, 14], generalized estimating equation (GEE) methods [15, 16], and reverse regression methods [5, 17]. Mixed effect models can account for relatedness, population structure, and polygenic background effects, but they are computationally challenging. The GEE methods, based on a marginal regression model, allow the variant to have different effect sizes and effect directions on different traits. These methods can also accommodate covariates and different types of traits. Reverse regression methods take genotypes as the response variable and multiple traits as independent predictors; therefore, reverse regression models do not need to know the complex distributions of traits and can be applied to a large number of mixed types of traits. Dimension reduction methods include canonical correlation analysis (CCA) [18], principal components of traits (PCT) [19], and principal components of heritability (PCH) [20–23]. CCA seeks a linear combination of multiple variants and a linear combination of multiple traits such that the correlation between the two linear combinations reaches its maximum. The PCT methods are usually based on the first PC or first few PCs of the traits [22, 24]. However, Aschard et al. [19] showed that testing only the first few PCs often has low power, whereas combining signals across all PCs can have greater power. Nevertheless, it is not clear how many PCs are needed, or how robust these methods are when noise traits exist. PCH finds a linear combination of multiple traits such that this linear combination has the maximum heritability.

In this article, we first propose a maximum heritability test (MHT). Based on MHT, we develop an “optimal” maximum heritability test (MHT-O) to test the association between multiple traits and a single variant. In each step of MHT-O, we delete the one trait that has the weakest association with the variant. Then, we find the optimal number of traits and use MHT to test the association between the optimal number of traits and the variant. Using extensive simulation studies, we compare the performance of MHT-O with MHT, the Trait-based Association Test that uses Extended Simes procedure (TATES) [11], SUM_SCORE and MANOVA [8]. Our results show that, in all of the simulation scenarios, MHT-O is either the most powerful test or comparable to the most powerful test among the five tests we compared.

Method

We consider a sample of n unrelated individuals. Each individual has K (potentially correlated) traits and has been genotyped at one variant. Let $Y = (Y_1, \ldots, Y_K)^T$ denote the random vector of K traits and X denote the random variable of the genotype score at the variant. Let $y_i = (y_{i1}, \ldots, y_{iK})^T$ denote the values of the K traits and $x_i$ denote the genotype score of the ith individual, where $x_i$ is the number of minor alleles that the ith individual has at the variant. We can consider $y_1, \ldots, y_n$ to be a random sample from Y and $x_1, \ldots, x_n$ to be a random sample from X.

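Now, let us consider the linear models

$$Y_k = \beta_{0k} + \beta_k X + \varepsilon_k, \quad k = 1, \ldots, K.$$

We partition the total phenotypic covariance of Y as $V_P = V_G + V_R$ [25]; $V_G = \mathrm{var}[\beta_1 X, \ldots, \beta_K X] = \mathrm{var}(X)\beta\beta^T$ is the genetic variance due to the genotype score X, where $\beta = (\beta_1, \ldots, \beta_K)^T$; $V_R = \mathrm{var}[\varepsilon_1, \ldots, \varepsilon_K]$ is the residual covariance after removing the genetic effect. var(X) can be estimated by the sample variance $\hat{\sigma}_x^2$ of $x_1, \ldots, x_n$. β and $V_R$ can be estimated from the linear models $y_{ik} = \beta_{0k} + \beta_k x_i + \varepsilon_{ik}$, in which $\beta_k$ is estimated by the least squares estimator $\hat{\beta}_k$. Let $r_{ik}$ denote the estimates of the residuals $\varepsilon_{ik}$. Then, the (j, k)th element of $V_R$ is estimated by the sample covariance of the residuals $r_{ij}$ and $r_{ik}$ over i = 1,…,n.

Let us consider a linear combination of Y, $w^T Y = \sum_{k=1}^{K} w_k Y_k$, where $w = (w_1, \ldots, w_K)^T$. The heritability of $w^T Y$ can be written as

$$h_w^2 = \frac{w^T V_G w}{w^T V_P w} = \frac{w^T V_G w}{w^T V_G w + w^T V_R w}.$$

If we define $f(t) = t/(1+t)$, we can write $h_w^2$ as $h_w^2 = f(q_w)$, where $q_w = w^T V_G w / w^T V_R w$. The heritability of $w^T Y$ depends on w, and we can find a linear combination $w^T Y$ that has the largest heritability among all linear combinations of Y. We define the maximum heritability as the test statistic to test the association between these K traits and the variant. We denote this test as the maximum heritability test (MHT). Because f is increasing, maximizing $h_w^2$ is equivalent to maximizing $q_w$, and the MHT statistic can be written as

$$T_{MHT} = \max_{w} \frac{w^T \hat{V}_G w}{w^T \hat{V}_R w} = \lambda_{\max}\left(\hat{V}_R^{-1} \hat{V}_G\right) = \hat{\sigma}_x^2\, \hat{\beta}^T \hat{V}_R^{-1} \hat{\beta},$$

where $\lambda_{\max}(A)$ denotes the largest eigenvalue of matrix A, and the last equality holds because $\hat{V}_G = \hat{\sigma}_x^2 \hat{\beta}\hat{\beta}^T$ has rank one.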
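As an illustration, the MHT statistic can be computed in a few lines of Python with NumPy. This is a minimal sketch, not the authors' R implementation. The function name `mht_statistic` and the 1/n divisors in the variance and covariance estimates are assumptions made here. The sketch takes the statistic to be the maximum over w of the ratio of genetic to residual variance of $w^T Y$ (a monotone transform of the heritability) and uses the fact that $V_G = \mathrm{var}(X)\beta\beta^T$ has rank one, so the largest eigenvalue of $V_R^{-1}V_G$ equals $\mathrm{var}(X)\,\beta^T V_R^{-1}\beta$.

```python
import numpy as np

def mht_statistic(y, x):
    """MHT statistic for traits y (n x K array) and genotype scores x (length n).

    Hypothetical helper; the 1/n divisors below are an assumption.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()                       # centered genotype scores
    yc = y - y.mean(axis=0)                 # centered traits
    var_x = xc @ xc / n                     # estimate of var(X)
    beta = (xc @ yc) / (xc @ xc)            # least-squares slope per trait, shape (K,)
    resid = yc - np.outer(xc, beta)         # residuals r_ik
    v_r = resid.T @ resid / n               # residual covariance V_R, shape (K, K)
    # Rank-one V_G: lambda_max(V_R^{-1} V_G) = var(X) * beta^T V_R^{-1} beta
    return var_x * beta @ np.linalg.solve(v_r, beta)
```

Solving the linear system instead of forming an eigendecomposition keeps the cost low, which matters because the statistic is recomputed many times inside a permutation test.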

However, the test statistic $T_{MHT}$ may lose power in the presence of a large number of noise traits. Therefore, we propose an “optimal” maximum heritability test (MHT-O) to test the association between multiple traits and the variant. MHT-O includes a procedure for deleting traits that have weak or no association with the variant. It has the following steps:

  1. Given traits $Y = (Y_1, \ldots, Y_K)$, initialize r = K and $Y^{(r)} = Y$. Denote $T_{MHT,r}$ as $T_{MHT}$ based on $Y^{(r)}$.
  2. Denote $T_{MHT,r}^{(-i)}$ as $T_{MHT}$ based on $Y^{(r)}$ with the ith trait deleted for i = 1,…,r; denote $T_{MHT,r-1} = \max_{1 \le i \le r} T_{MHT,r}^{(-i)}$ and $I = \arg\max_{1 \le i \le r} T_{MHT,r}^{(-i)}$, so that the Ith trait is the one whose removal leaves the largest statistic. Let $Y^{(r-1)}$ denote $Y^{(r)}$ with the Ith trait deleted and update r = r − 1.
  3. Repeat step 2 until r = 1.
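The deletion steps above can be sketched in Python as a backward elimination loop. This is an illustrative sketch under stated assumptions rather than the authors' code: the names `mht_statistic` and `mht_path` are hypothetical, the 1/n divisors are an assumption, and the trait deleted at each step is taken to be the one whose removal leaves the largest MHT statistic (that is, the trait with the weakest association).

```python
import numpy as np

def mht_statistic(y, x):
    """MHT statistic var(X) * beta^T V_R^{-1} beta (1/n divisors assumed)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    xc, yc = x - x.mean(), y - y.mean(axis=0)
    beta = (xc @ yc) / (xc @ xc)
    resid = yc - np.outer(xc, beta)
    v_r = resid.T @ resid / len(x)
    return (xc @ xc / len(x)) * beta @ np.linalg.solve(v_r, beta)

def mht_path(y, x):
    """Return ({r: T_MHT,r for r = K..1}, order in which traits were deleted)."""
    y = np.asarray(y, dtype=float)
    keep = list(range(y.shape[1]))               # trait indices in Y^(r)
    stats = {len(keep): mht_statistic(y[:, keep], x)}
    deleted = []
    while len(keep) > 1:
        # statistic with each remaining trait left out in turn
        cand = [mht_statistic(y[:, [j for j in keep if j != i]], x) for i in keep]
        best = int(np.argmax(cand))              # removal that leaves the largest statistic
        deleted.append(keep.pop(best))
        stats[len(keep)] = cand[best]            # this is T_MHT,(r-1)
    return stats, deleted
```

For K traits the loop evaluates the statistic K(K+1)/2 − 1 times per data set, so with 20 to 40 traits and 1,000 permutations the whole path remains cheap to compute.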

Denote $p_r$ as the p-value of $T_{MHT,r}$. The test statistic of MHT-O is defined as

$$T_{MHTO} = \min_{1 \le r \le K} p_r.$$

We use a permutation test to evaluate the p-value of $T_{MHTO}$. Intuitively, two layers of permutations are needed to estimate $p_r$ and the overall p-value for the test statistic $T_{MHTO}$. Ge et al. [26] proposed that one layer of permutation can be used to estimate these p-values. We use the permutation procedure of Ge et al. to estimate $p_r$ and the overall p-value for the test statistic $T_{MHTO}$. In detail, we randomly shuffle the genotypes in each permutation. Suppose we perform B permutations. Let $T_{MHT,r}^{(b)}$ denote the value of $T_{MHT,r}$ based on the bth permuted data, where b = 0 represents the original data. Then, we transform $T_{MHT,r}^{(b)}$ to $p_r^{(b)}$ by

$$p_r^{(b)} = \frac{\#\{d : T_{MHT,r}^{(d)} > T_{MHT,r}^{(b)},\ d = 0, 1, \ldots, B\}}{B}.$$

Let $p^{(b)} = \min_{1 \le r \le K} p_r^{(b)}$; then, the p-value of $T_{MHTO}$ is given by

$$\frac{\#\{b : p^{(b)} < p^{(0)},\ b = 1, 2, \ldots, B\}}{B}.$$

The R code of MHT-O is available at Shuanglin Zhang’s homepage http://www.math.mtu.edu/~shuzhang/software.html.
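Independently of the R code, the one-layer permutation step can be sketched in Python as follows. The sketch assumes that the statistics $T_{MHT,r}$ have already been computed for the original data (row 0) and for each of the B permuted data sets (rows 1 to B), and it assumes the usual one-layer construction: each statistic is converted to a p-value by counting how many of the B + 1 values in its column exceed it, the minimum over r is taken per row, and the overall p-value is the fraction of permuted rows whose minimum is smaller than the observed minimum. The function name and the strict inequalities are assumptions made for illustration.

```python
import numpy as np

def one_layer_permutation_pvalue(t):
    """Overall p-value of the min-p statistic from a (B + 1) x K array of statistics.

    Row 0 holds the statistics of the original data, rows 1..B those of the
    permuted data. Hypothetical helper; uses O(B^2 * K) memory.
    """
    t = np.asarray(t, dtype=float)
    n_perm = t.shape[0] - 1                        # B
    # greater[b, r] = #{d in 0..B : t[d, r] > t[b, r]}
    greater = (t[None, :, :] > t[:, None, :]).sum(axis=1)
    p = greater / n_perm                           # p_r^(b)
    p_min = p.min(axis=1)                          # p^(b) = min over r
    overall = (p_min[1:] < p_min[0]).sum() / n_perm
    return overall, p[0]

# Tiny example: B = 4 permutations, K = 2 trait subsets
t = [[2, 1], [1, 2], [3, 3], [0, 0], [4, 4]]
overall, p_obs = one_layer_permutation_pvalue(t)
print(overall, p_obs)   # 0.5 [0.5  0.75]
```

Because the same B permutations supply both the per-subset p-values and the null distribution of their minimum, no nested permutation loop is needed.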

Comparisons of Methods

We compare our proposed method with MHT, TATES [11], MANOVA [8], and SUM_SCORE. TATES combines p-values obtained in a standard univariate GWAS to acquire one trait-based p-value, while correcting for correlations between components. SUM_SCORE performs an association test for each trait individually to obtain the univariate score test statistic for each trait; its test statistic is then the sum of the univariate score test statistics. We use asymptotic distributions to evaluate the p-values of SUM_SCORE, TATES and MANOVA.

Simulation

To evaluate the type I error rates and powers of MHT and MHT-O, we generate genotypes according to the minor allele frequency (MAF), assuming Hardy-Weinberg equilibrium. Then, we generate K traits by the factor model [11, 19]

$$y = \lambda x + c\gamma f + \sqrt{1 - c^2}\,\varepsilon, \qquad (1)$$

where $y = (y_1, \ldots, y_K)^T$; x is the genotype score at the variant of interest; $\lambda = (\lambda_1, \ldots, \lambda_K)^T$ is the vector of effect sizes of the genetic variant on the K traits; $f = (f_1, \ldots, f_R)^T \sim MVN(0, \Sigma)$ is the vector of factors, $\Sigma = (1 - \rho)I + \rho A$, A is a matrix with all elements equal to 1, I is the identity matrix, and ρ is the correlation between factors; γ is a K by R matrix; c is a constant; and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_K)^T$ is a vector of residuals, where $\varepsilon_1, \ldots, \varepsilon_K$ are independent and $\varepsilon_k \sim N(0, 1)$ for k = 1,…,K.

Based on Eq (1), we consider five models:

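Model 1: There is only one factor and genotypes affect all traits with the same effect size. That is, R = 1, $\lambda = (\beta, \ldots, \beta)^T$, and $\gamma = (1, \ldots, 1)^T$.

Model 2: There are five factors and genotypes affect one factor. That is, R = 5, λ is nonzero only for the traits that load on that factor, and $\gamma = \mathrm{diag}(D_1, D_2, D_3, D_4, D_5)$, where $D_i$ is the column of loadings for the traits assigned to the ith factor, for i = 1,…,5.

Model 3: There are two factors and genotypes affect one factor. That is, R = 2, λ is nonzero only for the traits that load on that factor, and $\gamma = \mathrm{diag}(D_1, D_2)$, where $D_i$ is the column of loadings for the traits assigned to the ith factor, for i = 1, 2.

Model 4: There are five factors and genotypes affect one trait. That is, R = 5, $\lambda = (0, \ldots, 0, \beta)^T$, and $\gamma = \mathrm{diag}(D_1, D_2, D_3, D_4, D_5)$, where $D_i$ is the column of loadings for the traits assigned to the ith factor, for i = 1,…,5.

Model 5: There is only one factor and genotypes affect one trait. That is, R = 1, $\lambda = (0, \ldots, 0, \beta)^T$, and $\gamma = (1, \ldots, 1)^T$.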

To evaluate type I error rates of MHT and MHT-O, we let β = 0. To evaluate powers, we let β > 0. In the simulation studies for evaluation of type I error rates and powers, we set MAF = 0.3 and ρ = 0.2.
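To make the simulation design concrete, the following Python sketch generates one data set from a factor model of the form $y = \lambda x + c\gamma f + \sqrt{1 - c^2}\,\varepsilon$. Several details are assumptions made here for illustration and are not stated in the text: that form of the model itself, the default c = 0.5, equal-sized blocks of K/R traits per factor, and loadings of 1 within each block. The function name `simulate_traits` is hypothetical.

```python
import numpy as np

def simulate_traits(n, K, R, lam, c=0.5, maf=0.3, rho=0.2, rng=None):
    """Simulate genotype scores x (n,) and traits y (n x K) from a factor model.

    Assumed details: c = 0.5 by default, K divisible by R, and each D_i is a
    vector of ones of length K / R.
    """
    if K % R != 0:
        raise ValueError("K must be divisible by R in this sketch")
    rng = np.random.default_rng() if rng is None else rng
    lam = np.asarray(lam, dtype=float)
    # Under Hardy-Weinberg equilibrium the minor-allele count is Binomial(2, MAF)
    x = rng.binomial(2, maf, size=n)
    sigma = (1 - rho) * np.eye(R) + rho * np.ones((R, R))       # factor covariance
    f = rng.multivariate_normal(np.zeros(R), sigma, size=n)     # factors, n x R
    gamma = np.kron(np.eye(R), np.ones((K // R, 1)))            # K x R block-diagonal loadings
    eps = rng.standard_normal((n, K))                           # independent N(0, 1) residuals
    y = np.outer(x, lam) + c * f @ gamma.T + np.sqrt(1 - c**2) * eps
    return x, y

# Model 5 style example: one factor, only the last of 20 traits affected
lam = np.zeros(20)
lam[-1] = 0.2
x, y = simulate_traits(1000, 20, 1, lam, rng=np.random.default_rng(0))
print(x.shape, y.shape)   # (1000,) (1000, 20)
```

With this scaling each trait has unit variance when λ = 0, since the factor part contributes c² and the residual part contributes 1 − c².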

Results

To evaluate the type I error rates of the two proposed tests (MHT and MHT-O), we consider 20 quantitative traits. We also consider different sample sizes, different significance levels, and different models. In each simulation scenario, the p-values of MHT and MHT-O are estimated by 1,000 permutations and the type I error rates of the two tests are evaluated using 10,000 replicated samples. For 10,000 replicated samples, the 95% confidence intervals (CIs) for estimated type I error rates at nominal levels 0.05 and 0.01 are (0.046, 0.054) and (0.008, 0.012), respectively (see Appendix for details). The estimated type I error rates of the two tests are summarized in Table 1. From this table, we can see that 58 out of 60 (greater than 95%) estimated type I error rates are within the 95% CIs, and the two estimated type I error rates that are not within the 95% CIs (0.05415 and 0.0126) are very close to the bounds of the corresponding 95% CIs, which indicates that both tests are valid.

Table 1. The estimated type I error rates of MHT and MHT-O.

10,000 replicates used.

https://doi.org/10.1371/journal.pone.0150975.t001

For power comparisons, we consider different values of the effect size, different models, and different numbers of traits. Sample size is 1,000 for all the cases. In each of the simulation scenarios, the p-values of MHT and MHT-O are estimated using 1,000 permutations and the p-values of SUM_SCORE, TATES and MANOVA are estimated using their asymptotic distributions. The powers of all of the five tests are evaluated using 500 replicated samples at a significance level of 0.05.

Fig 1 gives the power comparisons of the five tests (SUM_SCORE, TATES, MHT, MHT-O and MANOVA) for the power as a function of the effect size based on the five models for 20 traits. This figure shows that (1) MHT-O is either the most powerful one (genotypes directly impact on a single trait: models 4–5) or comparable to the most powerful one (genotypes directly impact on all or a portion of the traits: models 1–3) among the five tests; (2) MHT and MANOVA have very similar powers; (3) MHT and MANOVA are much less powerful than other methods when genotypes directly impact on only a portion of the traits (models 2–3); (4) TATES is much less powerful than other methods when genotypes directly impact on all the traits (model 1); and (5) SUM_SCORE is much less powerful than other methods when genotypes directly impact on a single trait (models 4–5).

Fig 1. Power comparisons of the five tests (SUM_SCORE, TATES, MHT, MHT-O and MANOVA) for the power as a function of the effect size.

Sample size is 1000. Total number of traits is 20.

https://doi.org/10.1371/journal.pone.0150975.g001

Power comparisons of the five tests for 30 and 40 traits are given in Figs 2 and 3, respectively. The patterns of power comparisons for 30 and 40 traits (Figs 2 and 3) are similar to those for 20 traits (Fig 1). We also give power comparisons of the five tests using a significance level of $5 \times 10^{-8}$ with $10^8$ permutations and 500 replicates for 20 traits under model 1 (S1 Fig). S1 Fig shows that the patterns of the power comparisons using a significance level of $5 \times 10^{-8}$ are similar to those using a significance level of 0.05 in Fig 1 (model 1). In summary, MHT-O is either the most powerful test or comparable to the most powerful test among all the tests we compared. Therefore, MHT-O is robust across a variety of models.

Fig 2. Power comparisons of the five tests (SUM_SCORE, TATES, MHT, MHT-O and MANOVA) for the power as a function of the effect size.

Sample size is 1000. Total number of traits is 30.

https://doi.org/10.1371/journal.pone.0150975.g002

Fig 3. Power comparisons of the five tests (SUM_SCORE, TATES, MHT, MHT-O and MANOVA) for the power as a function of the effect size.

Sample size is 1000. Total number of traits is 40.

https://doi.org/10.1371/journal.pone.0150975.g003

Discussion

We propose MHT-O to perform joint analysis of multiple traits in association studies for the following reasons: (1) multiple related traits are usually measured in genetic association studies of complex diseases; (2) there is increasing evidence showing that pleiotropy is a widespread phenomenon in complex diseases; and (3) the power of existing methods decreases in the presence of non-associated traits. The proposed MHT-O includes a procedure for deleting traits that have weak or no association with the variant. Therefore, it can be robust to the existence and the number of non-associated traits. By deleting the one trait that has the weakest association with the variant in each step, MHT-O can maintain high power in the presence of a large number of non-associated traits. This feature is especially important when there are a large number of correlated traits but no guidelines for selecting relevant traits. Our results show that MHT-O has correct type I error rates and is either the most powerful test or comparable to the most powerful test among the five tests we compared. No other method in the simulation studies shows consistently good performance.

Due to the allelic heterogeneity and the extreme rarity of individual variants in rare variant association studies, the variant-by-variant methods for common variant association studies may not be optimal [27]. Recent studies have shown that complex diseases are caused by both common and rare variants [28–34]. Statistical methods including burden tests [27, 35–38], quadratic tests [39–41], and combined tests [42–44] have been developed for rare variant association studies with a single trait. Currently, there is limited research on rare variant association studies for joint analysis of multiple traits [14, 45]. MHT-O can be extended to rare variant association studies by extending Eq (1) to include multiple variants. MHT-O can also be extended to family-based studies by extending Eq (1) to a mixed linear model. However, the performance of MHT-O in rare variant association studies and in family-based association studies needs further investigation.

It has long been recognized that population stratification can seriously confound association results in association studies based on unrelated individuals [46, 47]. Several methods to control for population stratification have been developed for such studies. These methods include the principal component (PC) approach [48–52], the genomic control (GC) approach [53–55], and the mixed linear model (MLM) approach [29, 56]. Like most association tests based on unrelated individuals, MHT-O is subject to bias due to population stratification. To make MHT-O robust to population stratification, we can use the PC approach. Let $P_i = (p_{i1}, \ldots, p_{iL})^T$ denote the first L PCs of the genotypes at a set of genomic markers for the ith individual. Let $\tilde{y}_{ik}$ and $\tilde{x}_i$ denote the residuals of the regressions $y_{ik} = \alpha_{0k} + \alpha_k^T P_i + \varepsilon_{ik}$ and the residuals of the regression $x_i = \alpha_0 + \alpha^T P_i + \varepsilon_i$, respectively. Using $\tilde{y}_{ik}$ and $\tilde{x}_i$ to replace $y_{ik}$ and $x_i$, we can make MHT-O robust to population stratification. However, the performance of the PC approach for controlling population stratification in MHT-O needs further investigation.
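The PC adjustment described above amounts to regressing each trait and the genotype score on the first L PCs and keeping the residuals. A minimal Python sketch follows; the function name `residualize_on_pcs` is hypothetical, and the PCs are assumed to have been computed beforehand from a set of genomic markers.

```python
import numpy as np

def residualize_on_pcs(y, x, pcs):
    """Regress traits y (n x K) and genotype scores x (n,) on PCs (n x L).

    Returns the residual traits (n x K) and residual genotype scores (n,).
    """
    pcs = np.asarray(pcs, dtype=float)
    design = np.column_stack([np.ones(len(pcs)), pcs])        # intercept plus L PCs
    target = np.column_stack([np.asarray(y, dtype=float),
                              np.asarray(x, dtype=float)])    # traits and genotype together
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)    # ordinary least squares
    resid = target - design @ coef
    return resid[:, :-1], resid[:, -1]
```

The residuals can then be passed to the test in place of the original traits and genotype scores.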

Appendix

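Let p denote the p-value of the test and let $\xi = I(p \le \alpha)$ denote a random variable that equals 1 if p ≤ α and 0 otherwise, where α is the significance level. Then, Pr(ξ = 1) = α and Pr(ξ = 0) = 1 − α because p follows a uniform distribution between 0 and 1 under the null hypothesis. Suppose there are R replicates. Let $\xi_i$ denote the value of ξ for the ith replicate, where i = 1,…,R. Then, the estimated type I error rate is given by $\hat{\alpha} = \frac{1}{R}\sum_{i=1}^{R} \xi_i$, which asymptotically follows a normal distribution $N(\alpha, \alpha(1-\alpha)/R)$. Thus, $\Pr\left(|\hat{\alpha} - \alpha| \le 1.96\sqrt{\alpha(1-\alpha)/R}\right) \approx 0.95$.

We define $\left(\alpha - 1.96\sqrt{\alpha(1-\alpha)/R},\ \alpha + 1.96\sqrt{\alpha(1-\alpha)/R}\right)$ as the 95% confidence interval for the estimated type I error rate for the nominal level α.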
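As a quick numerical check, the normal-approximation interval α ± 1.96·sqrt(α(1 − α)/R) can be evaluated for the settings used in the Results (R = 10,000 replicates, nominal levels 0.05 and 0.01). The function name below is hypothetical.

```python
import math

def type1_error_ci(alpha, replicates, z=1.96):
    """95% interval for an estimated type I error rate under the normal approximation."""
    half_width = z * math.sqrt(alpha * (1 - alpha) / replicates)
    return alpha - half_width, alpha + half_width

for alpha in (0.05, 0.01):
    lo, hi = type1_error_ci(alpha, 10_000)
    print(f"alpha={alpha}: ({lo:.3f}, {hi:.3f})")
# alpha=0.05: (0.046, 0.054)
# alpha=0.01: (0.008, 0.012)
```

These values match the intervals (0.046, 0.054) and (0.008, 0.012) quoted in the Results.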

Supporting Information

S1 Fig. Power comparisons of the five tests (SUM_SCORE, TATES, MHT, MHT-O and MANOVA) for the power as a function of the effect size (model 1).

Sample size is 1000. Total number of traits is 20. The significance level is $5 \times 10^{-8}$. The number of replicates is 500. The number of permutations is $10^8$.

https://doi.org/10.1371/journal.pone.0150975.s001

(EPS)

Author Contributions

Conceived and designed the experiments: SZ. Performed the experiments: ZW SZ. Analyzed the data: SZ ZW. Wrote the paper: SZ ZW QS.

References

  1. Sivakumaran S, Agakov F, Theodoratou E, Prendergast JG, Zgaga L, Manolio T, et al. Abundant pleiotropy in human complex diseases and traits. American journal of human genetics. 2011;89(5):607–18. pmid:22077970; PubMed Central PMCID: PMC3213397.
  2. Yang Q, Wu H, Guo CY, Fox CS. Analyze multivariate phenotypes in genetic association studies by combining univariate association tests. Genetic epidemiology. 2010;34(5):444–54. pmid:20583287; PubMed Central PMCID: PMC3090041.
  3. Rifai N, Ridker PM. Inflammatory markers and coronary heart disease. Current opinion in lipidology. 2002;13(4):383–9. pmid:12151853.
  4. Yudkin JS, Kumari M, Humphries SE, Mohamed-Ali V. Inflammation, obesity, stress and coronary heart disease: is interleukin-6 the link? Atherosclerosis. 2000;148(2):209–14. pmid:10657556.
  5. O'Reilly PF, Hoggart CJ, Pomyen Y, Calboli FC, Elliott P, Jarvelin MR, et al. MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS. PloS one. 2012;7(5):e34861. pmid:22567092; PubMed Central PMCID: PMC3342314.
  6. Solovieff N, Cotsapas C, Lee PH, Purcell SM, Smoller JW. Pleiotropy in complex traits: challenges and strategies. Nature reviews Genetics. 2013;14(7):483–95. pmid:23752797; PubMed Central PMCID: PMC4104202.
  7. Stephens M. A unified framework for association analysis with multiple related phenotypes. PloS one. 2013;8(7):e65245. pmid:23861737; PubMed Central PMCID: PMC3702528.
  8. Yang Q, Wang Y. Methods for analyzing multivariate phenotypes in genetic association studies. J Probab Stat. 2012;2012:652569. pmid:24748889; PubMed Central PMCID: PMC3989935.
  9. Zhou X, Stephens M. Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nature methods. 2014;11(4):407–9. pmid:24531419; PubMed Central PMCID: PMC4211878.
  10. O'Brien PC. Procedures for comparing samples with multiple endpoints. Biometrics. 1984;40(4):1079–87. pmid:6534410.
  11. van der Sluis S, Posthuma D, Dolan CV. TATES: efficient multivariate genotype-phenotype analysis for genome-wide association studies. PLoS genetics. 2013;9(1):e1003235. pmid:23359524; PubMed Central PMCID: PMC3554627.
  12. Kim J, Bai Y, Pan W. An Adaptive Association Test for Multiple Phenotypes with GWAS Summary Statistics. Genetic epidemiology. 2015;39(8):651–63. pmid:26493956.
  13. Korte A, Vilhjalmsson BJ, Segura V, Platt A, Long Q, Nordborg M. A mixed-model approach for genome-wide association studies of correlated traits in structured populations. Nature genetics. 2012;44(9):1066–71. pmid:22902788; PubMed Central PMCID: PMC3432668.
  14. Casale FP, Rakitsch B, Lippert C, Stegle O. Efficient set tests for the genetic analysis of correlated traits. Nature methods. 2015;12(8):755–8. pmid:26076425.
  15. Zeger SL, Liang KY. Longitudinal data analysis for discrete and continuous outcomes. Biometrics. 1986;42(1):121–30. pmid:3719049.
  16. Zhang Y, Xu Z, Shen X, Pan W, Alzheimer's Disease Neuroimaging I. Testing for association with multiple traits in generalized estimation equations, with application to neuroimaging data. Neuroimage. 2014;96:309–25. pmid:24704269; PubMed Central PMCID: PMC4043944.
  17. Yan T, Li Q, Li Y, Li Z, Zheng G. Genetic association with multiple traits in the presence of population stratification. Genetic epidemiology. 2013;37(6):571–80. pmid:23740720.
  18. Tang CS, Ferreira MA. A gene-based test of association using canonical correlation analysis. Bioinformatics. 2012;28(6):845–50. pmid:22296789.
  19. Aschard H, Vilhjalmsson BJ, Greliche N, Morange PE, Tregouet DA, Kraft P. Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies. American journal of human genetics. 2014;94(5):662–76. pmid:24746957; PubMed Central PMCID: PMC4067564.
  20. Ott J, Rabinowitz D. A principal-components approach based on heritability for combining phenotype information. Human heredity. 1999;49(2):106–11. 22854. pmid:10077732.
  21. Lange C, van Steen K, Andrew T, Lyon H, DeMeo DL, Raby B, et al. A family-based association test for repeatedly measured quantitative traits adjusting for unknown environmental and/or polygenic effects. Statistical applications in genetics and molecular biology. 2004;3:Article17. pmid:16646795.
  22. Klei L, Luca D, Devlin B, Roeder K. Pleiotropy and principal components of heritability combine to increase power for association analysis. Genetic epidemiology. 2008;32(1):9–19. pmid:17922480.
  23. Zhou JJ, Cho MH, Lange C, Lutz S, Silverman EK, Laird NM. Integrating multiple correlated phenotypes for genetic association analysis by maximizing heritability. Hum Hered. 2015;79(2):93–104. pmid:26111731; PubMed Central PMCID: PMC4508328.
  24. Feng T, Zhang S, Sha Q. A method dealing with a large number of correlated traits in a linkage genome scan. BMC proceedings. 2007;1 Suppl 1:S84. pmid:18466587; PubMed Central PMCID: PMC2367490.
  25. Falconer DS, Mackay TFC. Introduction to quantitative genetics. 4th ed. Essex, England: Longman; 1996. xiii, 464 p.
  26. Ge Y, Dudoit S, Speed TP. Resampling-based multiple testing for microarray data analysis. Test. 2003;12(1):1–77.
  27. Li B, Leal SM. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet. 2008;83(3):311–21. pmid:18691683; PubMed Central PMCID: PMC2842185.
  28. Bodmer W, Bonilla C. Common and rare variants in multifactorial susceptibility to common diseases. Nat Genet. 2008;40(6):695–701. pmid:18509313; PubMed Central PMCID: PMC2527050.
  29. Kang HM, Sul JH, Service SK, Zaitlen NA, Kong SY, Freimer NB, et al. Variance component model to account for sample structure in genome-wide association studies. Nature genetics. 2010;42(4):348–54. pmid:20208533; PubMed Central PMCID: PMC3092069.
  30. Pritchard JK. Are rare variants responsible for susceptibility to complex diseases? Am J Hum Genet. 2001;69(1):124–37. pmid:11404818; PubMed Central PMCID: PMC1226027.
  31. Pritchard JK, Cox NJ. The allelic architecture of human disease genes: common disease-common variant…or not? Hum Mol Genet. 2002;11(20):2417–23. pmid:12351577.
  32. Stratton MR, Rahman N. The emerging landscape of breast cancer susceptibility. Nat Genet. 2008;40(1):17–22. pmid:18163131.
  33. Teer JK, Mullikin JC. Exome sequencing: the sweet spot before whole genomes. Hum Mol Genet. 2010;19(R2):R145–51. pmid:20705737; PubMed Central PMCID: PMC2953745.
  34. Walsh T, King MC. Ten genes for inherited breast cancer. Cancer Cell. 2007;11(2):103–5. pmid:17292821.
  35. Morgenthaler S, Thilly WG. A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: a cohort allelic sums test (CAST). Mutat Res. 2007;615(1–2):28–56. pmid:17101154.
  36. Madsen BE, Browning SR. A groupwise association test for rare mutations using a weighted sum statistic. PLoS genetics. 2009;5(2):e1000384. pmid:19214210; PubMed Central PMCID: PMC2633048.
  37. Price AL, Kryukov GV, de Bakker PI, Purcell SM, Staples J, Wei LJ, et al. Pooled association tests for rare variants in exon-resequencing studies. American journal of human genetics. 2010;86(6):832–8. pmid:20471002; PubMed Central PMCID: PMC3032073.
  38. Zawistowski M, Gopalakrishnan S, Ding J, Li Y, Grimm S, Zollner S. Extending rare-variant testing strategies: analysis of noncoding sequence and imputed genotypes. American journal of human genetics. 2010;87(5):604–17. pmid:21070896; PubMed Central PMCID: PMC2978957.
  39. Neale BM, Rivas MA, Voight BF, Altshuler D, Devlin B, Orho-Melander M, et al. Testing for an unusual distribution of rare variants. PLoS genetics. 2011;7(3):e1001322. pmid:21408211; PubMed Central PMCID: PMC3048375.
  40. Sha Q, Wang X, Wang X, Zhang S. Detecting association of rare and common variants by testing an optimally weighted combination of variants. Genetic epidemiology. 2012;36(6):561–71. pmid:22714994.
  41. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-variant association testing for sequencing data with the sequence kernel association test. American journal of human genetics. 2011;89(1):82–93. pmid:21737059; PubMed Central PMCID: PMC3135811.
  42. Sha Q, Zhang S. A rare variant association test based on combinations of single-variant tests. Genetic epidemiology. 2014;38(6):494–501. pmid:25065727; PubMed Central PMCID: PMC4127117.
  43. Derkach A, Lawless JF, Sun L. Robust and powerful tests for rare variants using Fisher's method to combine evidence of association from two or more complementary tests. Genetic epidemiology. 2013;37(1):110–21. pmid:23032573.
  44. Lee S, Emond MJ, Bamshad MJ, Barnes KC, Rieder MJ, Nickerson DA, et al. Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. American journal of human genetics. 2012;91(2):224–37. pmid:22863193; PubMed Central PMCID: PMC3415556.
  45. Wang Y, Liu A, Mills JL, Boehnke M, Wilson AF, Bailey-Wilson JE, et al. Pleiotropy analysis of quantitative traits at gene level by multivariate functional linear models. Genetic epidemiology. 2015;39(4):259–75. pmid:25809955; PubMed Central PMCID: PMC4443751.
  46. Knowler WC, Williams RC, Pettitt DJ, Steinberg AG. Gm3;5,13,14 and type 2 diabetes mellitus: an association in American Indians with genetic admixture. American journal of human genetics. 1988;43(4):520–6. pmid:3177389; PubMed Central PMCID: PMC1715499.
  47. Lander ES, Schork NJ. Genetic dissection of complex traits. Science. 1994;265(5181):2037–48. pmid:8091226.
  48. Bauchet M, McEvoy B, Pearson LN, Quillen EE, Sarkisian T, Hovhannesyan K, et al. Measuring European population stratification with microarray genotype data. American journal of human genetics. 2007;80(5):948–56. pmid:17436249; PubMed Central PMCID: PMC1852743.
  49. Chen HS, Zhu X, Zhao H, Zhang S. Qualitative semi-parametric test for genetic associations in case-control designs under structured populations. Annals of human genetics. 2003;67(Pt 3):250–64. pmid:12914577.
  50. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nature genetics. 2006;38(8):904–9. pmid:16862161.
  51. Zhang S, Zhu X, Zhao H. On a semiparametric test to detect associations between quantitative traits and candidate genes using unrelated individuals. Genetic epidemiology. 2003;24(1):44–56. pmid:12508255.
  52. Zhu X, Zhang S, Zhao H, Cooper RS. Association mapping, using a mixture model for complex traits. Genetic epidemiology. 2002;23(2):181–96. pmid:12214310.
  53. Devlin B, Roeder K. Genomic control for association studies. Biometrics. 1999;55(4):997–1004. pmid:11315092.
  54. Devlin B, Roeder K, Wasserman L. Genomic control, a new approach to genetic-based association studies. Theoretical population biology. 2001;60(3):155–66. pmid:11855950.
  55. Reich DE, Goldstein DB. Detecting association in a case-control study while correcting for population stratification. Genetic epidemiology. 2001;20(1):4–16. pmid:11119293.
  56. Zhang Z, Ersoz E, Lai CQ, Todhunter RJ, Tiwari HK, Gore MA, et al. Mixed linear model approach adapted for genome-wide association studies. Nature genetics. 2010;42(4):355–60. pmid:20208535; PubMed Central PMCID: PMC2931336.