  • Technical Report

Mixed linear model approach adapted for genome-wide association studies

Abstract

Mixed linear model (MLM) methods have proven useful in controlling for population structure and relatedness within genome-wide association studies. However, MLM-based methods can be computationally challenging for large datasets. We report a compression approach, called 'compressed MLM', that decreases the effective sample size of such datasets by clustering individuals into groups. We also present a complementary approach, 'population parameters previously determined' (P3D), that eliminates the need to re-compute variance components. We applied these two methods, both independently and in combination, to selected genetic association datasets from human, dog and maize. The joint implementation of these two methods markedly reduced computing time and either maintained or improved statistical power. We used simulations to demonstrate their usefulness in controlling for substructure in genetic association datasets for a range of species and genetic architectures. We have made these methods available within an implementation of the software program TASSEL.
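The compression step described above (clustering individuals into groups so the random genetic effect is indexed by group rather than by individual) can be illustrated with a small pure-Python sketch. It is not the TASSEL implementation. It assumes average-linkage clustering on the distance 1 - kinship and defines group kinship as the mean kinship over all member pairs; the paper may use other clustering algorithms or group-kinship definitions. The function names `average_linkage_groups`, `group_kinship` and `incidence_matrix`, and the toy kinship values, are hypothetical.

```python
from itertools import product


def average_linkage_groups(kinship, n_groups):
    """Merge individuals into n_groups clusters by average linkage.

    Assumption: the distance between two individuals is 1 - kinship.
    """
    clusters = [[i] for i in range(len(kinship))]  # start from singletons

    def avg_dist(a, b):
        # Mean pairwise distance between members of clusters a and b.
        total = sum(1.0 - kinship[i][j] for i, j in product(a, b))
        return total / (len(a) * len(b))

    while len(clusters) > n_groups:
        # Find the closest pair of clusters and merge them.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg_dist(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


def group_kinship(kinship, clusters):
    """Group-level kinship, assumed here to be the mean over member pairs."""
    g = len(clusters)
    kg = [[0.0] * g for _ in range(g)]
    for a in range(g):
        for b in range(g):
            pairs = list(product(clusters[a], clusters[b]))
            kg[a][b] = sum(kinship[i][j] for i, j in pairs) / len(pairs)
    return kg


def incidence_matrix(clusters, n):
    """n x g 0/1 matrix Z that assigns each individual to its group."""
    z = [[0] * len(clusters) for _ in range(n)]
    for g, members in enumerate(clusters):
        for i in members:
            z[i][g] = 1
    return z


# Toy data: two unrelated pairs of relatives (hypothetical values).
K = [[1.0, 0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.5],
     [0.0, 0.0, 0.5, 1.0]]
groups = average_linkage_groups(K, 2)
print(groups)                    # [[0, 1], [2, 3]]
print(group_kinship(K, groups))  # [[0.75, 0.0], [0.0, 0.75]]
```

In this sketch, replacing n individuals by g groups shrinks the random-effect vector from length n to length g, with Z linking each phenotype to its group effect and the group kinship matrix standing in for the individual kinship matrix. The number of groups is the compression level that Figures 2 and 3 vary.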

Figure 1: The forms of MLM classified by the random effect size and types of kinship.
Figure 2: Quantile-quantile plots of type I error (false positive) rates of association tests using the compressed MLM under different compression levels.
Figure 3: The performance of the compressed MLM under different compression levels (horizontal axis).
Figure 4: The P values and statistical power of association tests obtained by using the one-step MLM with the full optimization (full OPT) for all unknown parameters compared to P3D on a maize phenotype simulated with different epistatic effects (E).
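The contrast in Figure 4 between full optimization and P3D can be illustrated with a pure-Python sketch of the P3D idea: estimate the variance parameters once under the null model, then reuse them, and the matching covariance factorization, for every marker. Several assumptions are made here and are not taken from the paper. The covariance is parameterized as sigma2 * (h*K + (1 - h)*I) with a single ratio h. That ratio is chosen by a grid search on the profiled maximum-likelihood function of an intercept-only null model. Each marker is tested with a two-sided Wald statistic under a normal approximation. All function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) x = b by forward and then back substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def covariance_factor(K, h):
    """Cholesky factor of H = h*K + (1 - h)*I (assumed parameterization)."""
    n = len(K)
    return cholesky([[h * K[i][j] + (1.0 - h) * (i == j) for j in range(n)]
                     for i in range(n)])


def gls(L, y, cols):
    """GLS fit of y on `cols` with Var(y) = sigma2 * H, where H = L L^T.

    Returns (beta, columns of (X'H^-1X)^-1, sigma2, profiled ML log-likelihood).
    """
    n, p = len(y), len(cols)
    hi_cols = [chol_solve(L, c) for c in cols]          # H^-1 x_k
    hi_y = chol_solve(L, y)                             # H^-1 y
    la = cholesky([[dot(cols[r], hi_cols[c]) for c in range(p)] for r in range(p)])
    beta = chol_solve(la, [dot(c, hi_y) for c in cols])
    a_inv = [chol_solve(la, [float(r == c) for r in range(p)]) for c in range(p)]
    resid = [y[i] - sum(beta[k] * cols[k][i] for k in range(p)) for i in range(n)]
    hi_resid = [hi_y[i] - sum(beta[k] * hi_cols[k][i] for k in range(p)) for i in range(n)]
    sigma2 = dot(resid, hi_resid) / n                   # ML estimate of the scale
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    loglik = -0.5 * (n * math.log(sigma2) + logdet)     # constants dropped
    return beta, a_inv, sigma2, loglik


def estimate_h(K, y, grid):
    """Step 1: fit the intercept-only null model once, by grid search over h."""
    ones = [1.0] * len(y)
    return max(grid, key=lambda h: gls(covariance_factor(K, h), y, [ones])[3])


def p3d_scan(K, y, markers, h):
    """Step 2: reuse the null-model h (and its factorization) for every marker."""
    L = covariance_factor(K, h)   # factored once, not once per marker
    ones = [1.0] * len(y)
    pvals = []
    for m in markers:
        beta, a_inv, sigma2, _ = gls(L, y, [ones, m])
        z = beta[1] / math.sqrt(sigma2 * a_inv[1][1])
        pvals.append(math.erfc(abs(z) / math.sqrt(2.0)))  # two-sided, normal approx.
    return pvals
```

Calling `estimate_h` once and then `p3d_scan` over all markers mirrors the two-step structure, so each marker costs a single GLS fit against a precomputed factor. A full-optimization analogue would instead re-estimate h with each marker included as a fixed effect, repeating the grid search and factorization for every marker.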

Acknowledgements

This study was supported by the US National Science Foundation (NSF)–Plant Genome Program (DBI-0321467, 0703908 and 0820619), NSF–Plant Genome Comparative Sequencing Program (DBI-06638566), US National Institutes of Health (1R21AR055228-01A1), National Heart, Lung, and Blood Institute (U 01 HL72524, HL54776 and 5U01HL072524-06), US Department of Agriculture Research Service (53-K06–5-10 and 58–1950-9–001), USDA–Cooperative State Research, Education and Extension Service National Research Initiative (2006-35300-17155), Morris Animal Foundation (D04CA-135), WALTHAM Centre for Pet Nutrition, Cornell Advanced Technology in Biotechnology and the Collaborative Research Program in the Cornell Veterinary College. The authors would like to thank K. Zhao for providing the source code to compute kinship and L. Rigamer Lirette, A.L. Ingham and S. Myles for editing of the manuscript.

Author information


Contributions

Z.Z. conceptualized the study, performed the data analyses and wrote the manuscript. E.E., M.A.G. and J.Y. participated in the data analyses and wrote the manuscript. P.J.B. implemented the two new methods in the TASSEL software package. C.L., H.K.T., D.K.A. and J.M.O. provided the human data and supervised its analyses. R.J.T. provided the dog data and supervised its analyses. E.S.B. designed and supervised the project. All authors edited the manuscript.

Corresponding author

Correspondence to Zhiwu Zhang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–5 and Supplementary Note (PDF 1425 kb)

About this article

Cite this article

Zhang, Z., Ersoz, E., Lai, CQ. et al. Mixed linear model approach adapted for genome-wide association studies. Nat Genet 42, 355–360 (2010). https://doi.org/10.1038/ng.546

  • DOI: https://doi.org/10.1038/ng.546
