Genomic Control to the extreme

Devlin, B; Bacanu, Silviu-Alin; Roeder, Kathryn

doi:10.1038/ng1104-1129

Correspondence
Published: 01 November 2004

B Devlin¹,
Silviu-Alin Bacanu¹ &
Kathryn Roeder²

Nature Genetics volume 36, pages 1129–1130 (2004)

To the editor:

In the study of complex disease, separating causal from confounded factors is a challenge for genetic epidemiologists. One tool useful for separating these factors is Genomic Control (GC). In this communication we clarify how and when to use GC. We also describe a refined approach to GC, which should be used when GC is applied to extreme settings.

Population-based studies, such as case-control studies, are common designs used to determine the genetic and environmental bases of disease. To avoid false positive associations, design and analysis of population-based studies should account for population stratification, which can inflate association test statistics. One analytic method used to control the false positive rate is GC. In our original paper¹, we investigated two scenarios with two corresponding analytic methods. GC is the version similar to the typical approach to hypothesis testing, and GCB is the version that uses Bayesian inference. GC is suitable when a modest number of candidate genes are assessed and L supplementary loci are included for control. The supplementary loci, called null loci, are used to correct any inflation, λ, in association test statistic(s) by estimating λ from the null test statistics. GC produces average rejection rates close to the targeted 0.05 significance level²,³,⁴.
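The correction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes 1-d.f. chi-square association statistics, estimates λ as the median of the null statistics divided by the median of a χ²₁ distribution (approximately 0.4549), and does not let the estimate fall below 1. The function names `gc_lambda` and `gc_pvalue` are hypothetical.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 d.f. (approximate value).
CHI2_1_MEDIAN = 0.4549


def chi2_1_sf(x):
    """Upper-tail probability P(X > x) for X ~ chi-square with 1 d.f."""
    return math.erfc(math.sqrt(x / 2.0))


def gc_lambda(null_stats):
    """Estimate the inflation factor from the null-locus statistics.

    Assumption: median-based estimator, truncated below at 1 so that
    statistics are never inflated by the correction.
    """
    return max(1.0, median(null_stats) / CHI2_1_MEDIAN)


def gc_pvalue(candidate_stat, null_stats):
    """GC-adjusted P value: deflate the candidate statistic by the estimated
    lambda, then refer it to chi-square with 1 d.f. (lambda is treated as a
    known constant here)."""
    lam = gc_lambda(null_stats)
    return chi2_1_sf(candidate_stat / lam)
```

For example, if the null statistics suggest λ ≈ 2, a candidate statistic of 7.68 is deflated to about 3.84, which sits near the 0.05 threshold rather than far beyond it.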

We also considered population-based studies when large numbers of markers are tested¹. GCB is designed for this scenario. Rather than preselecting null loci, GCB delineates loci associated with disease as 'outliers' relative to most of the loci tested.

In our original papers¹,², we argued that a population-based study should attempt to remove the effect of stratification by experimental design and analysis, such as by matching cases and controls for ethnicity and environmental covariates. GC then adjusts for the residual effects of stratification. Careful study design and implementation pay off in statistical power⁵,⁶; even small stratification can have considerable consequences for large samples¹,²,³,⁴,⁵.

Marchini et al.⁷ explored the efficacy of GC for population-based studies under less ideal conditions, using subjects that originate from different populations and including environmental effects that induce geographically distinct prevalences; both of these possibilities were ignored in the design and analysis. Because they genotyped a large number of loci, they required an extremely small significance level (α) for P values. They found that GC could be anticonservative when α is small. Their results are sensible because GC treats λ as a known constant⁸. For small values of α, variability in the estimate of λ matters. The population-based studies explored by Marchini et al.⁷ can produce highly inflated test statistics (Fig. 1), and, because these population-based studies involve a large number of candidate loci, they are more appropriately analyzed by using GCB rather than GC.

**Figure 1: Performance of GC as a function of the targeted significant P value (α), the effect of stratification (λ) and the number of null loci included (L).**

Because λ is determined by sample size, stratification and differential prevalence¹, we can generate and compactly represent the general findings of Marchini et al.⁷ (Fig. 1). Four features stand out from our results: (i) GC works well in situations for which it was originally intended, namely larger values of α and/or smaller values of λ (refs. 1–4); (ii) GC becomes increasingly anticonservative as α decreases or as λ increases; (iii) bias is also a function of L (refs. 1,2); and (iv) even minor stratification can have a substantial impact on population-based studies with large sample sizes¹,⁷,⁹. Of these results, the second feature is new to Marchini et al.⁷; the fourth was shown mathematically¹ before it was demonstrated empirically⁷,⁹.

Is there a way to adjust the procedure if a researcher wishes to apply the logic of GC and use an extremely small α value? The bias in GC can be corrected by a simple modification of the test statistic, which we call GCF. For GCF, estimate λ using the mean (λ_m) of the null test statistics and account for the variability of λ_m by using an F test to determine the P values. Notably, GCF is accurate throughout the parameter space, even with only 30 null loci (Fig. 2).

**Figure 2: Performance of GCF as a function of α, λ_m and L.**
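A minimal sketch of the GCF calculation follows, under assumptions that the letter does not spell out. If each of the L null statistics is independently distributed as λχ²₁, then L·λ_m/λ is χ² with L d.f., so a candidate statistic divided by λ_m can be referred to an F distribution with (1, L) degrees of freedom. The (1, L) choice and the independence of null loci are assumptions of this sketch, and `f_1_nu_sf` and `gcf_pvalue` are hypothetical names. The F tail is computed from the equivalent Student t tail by Simpson's rule so that the example needs only the standard library.

```python
import math


def f_1_nu_sf(f, nu, steps=2000):
    """P(F > f) for F ~ F(1, nu), computed as P(|t_nu| > sqrt(f)).

    With t = sqrt(nu) * tan(theta), the t-density tail becomes an integral
    of cos(theta)**(nu - 1) over [theta0, pi/2], evaluated here with
    Simpson's rule (steps must be even).
    """
    if f <= 0:
        return 1.0
    theta0 = math.atan(math.sqrt(f / nu))
    # log of Gamma((nu+1)/2) / (sqrt(pi) * Gamma(nu/2))
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(math.pi))
    h = (math.pi / 2 - theta0) / steps
    total = 0.0
    for i in range(steps + 1):
        theta = theta0 + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        # Guard against tiny negative cosines from rounding at pi/2.
        total += weight * max(math.cos(theta), 0.0) ** (nu - 1)
    return 2.0 * math.exp(log_c) * total * h / 3.0


def gcf_pvalue(candidate_stat, null_stats):
    """GCF-style P value: scale by the mean of the null statistics and use
    an F(1, L) reference, where L is the number of null loci (assumed)."""
    n_null = len(null_stats)
    lam_m = sum(null_stats) / n_null
    return f_1_nu_sf(candidate_stat / lam_m, n_null)
```

With 30 null loci and λ_m = 1, a candidate statistic of 20 gives a noticeably larger P value under the F(1, 30) reference than under χ²₁, which is the direction of correction needed when GC is anticonservative at small α.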

Our means of validating the results of Marchini et al.⁷ and our own results use a shortcut method that Marchini et al.⁷ did not use. Our results are also supported by using the simulation methods of Marchini et al.⁷. When we used their methods and analyzed the data using GCF, we again found that GCF yielded an excellent approximation for small values of α (Table 1), even when λ is inflated substantially by large sample size or geographically distinct prevalences.

Table 1 Targeted significance levels of GCF compared with the realized values produced by simulations using a beta-binomial model

In summary, when a large number of candidate loci are genotyped or when α is small, application of GC produces misleading results (Fig. 1), as Marchini et al.⁷ show. Because GCF corrects this bias for small values of α, and does so in a range of settings (Fig. 2 and Table 1), we conclude that the bias is largely due to the uncertainty in λ. GCF accounts for this uncertainty in its degrees of freedom. Thus, GCF provides a simple alternative to recently suggested methods based on the confidence interval for λ (ref. 9).

As we have pointed out before¹,¹⁰, for experiments involving a large number of tests of genetic markers, one should analyze the entire distribution of test statistics. In this setting different statistical paradigms should be considered, such as methods based on the false discovery rate principle¹¹, which has great promise for this setting (refs. 10,12 and S.-A.B., B.D., K.R. and L. Wasserman, unpublished data).
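As one concrete instance of the false discovery rate principle, the step-up procedure of Benjamini and Hochberg (ref. 11) can be sketched as follows. This is an illustrative sketch only: the function name `benjamini_hochberg` and the target rate `q` are hypothetical, and the letter does not say that this exact procedure is the one the authors have in mind for genetic marker data.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up rule: sort the m P values, find the largest rank k such that
    p_(k) <= q * k / m, and reject the k smallest P values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= q * rank / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

Applied to the P values 0.001, 0.02, 0.04 and 0.5 with q = 0.05, the thresholds are 0.0125, 0.025, 0.0375 and 0.05, so only the first two hypotheses are rejected.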

Note: Supplementary information is available on the Nature Genetics website.

References

1. Devlin, B. & Roeder, K. Biometrics 55, 997–1004 (1999).
2. Bacanu, S.-A., Devlin, B. & Roeder, K. Am. J. Hum. Genet. 66, 1933–1944 (2000).
3. Devlin, B., Roeder, K. & Bacanu, S.-A. Genet. Epidemiol. 21, 273–284 (2001).
4. Pritchard, J.K. & Donnelly, P. Theor. Pop. Biol. 60, 227–237 (2001).
5. Lee, W.-C. Genet. Epidemiol. 27, 1–13 (2004).
6. Hinds, D.A. et al. Am. J. Hum. Genet. 74, 317–325 (2004).
7. Marchini, J., Cardon, L.R., Phillips, M.S. & Donnelly, P. Nat. Genet. 36, 512–517 (2004).
8. Reich, D.E. & Goldstein, D.B. Genet. Epidemiol. 20, 4–16 (2001).
9. Freedman, M.L. et al. Nat. Genet. 36, 388–393 (2004).
10. Tzeng, J.-Y., Byerley, W., Devlin, B., Roeder, K. & Wasserman, L. J. Amer. Statist. Assoc. 98, 236–247 (2003).
11. Benjamini, Y. & Hochberg, Y. J. R. Statist. Soc. B 57, 289–300 (1995).
12. Devlin, B., Roeder, K. & Wasserman, L. Genet. Epidemiol. 25, 36–47 (2003).

Author information

Authors and Affiliations

Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, 15213, Pennsylvania, USA
B Devlin & Silviu-Alin Bacanu
Department of Statistics, Carnegie Mellon University, Pittsburgh, 15213, Pennsylvania, USA
Kathryn Roeder

Corresponding author

Correspondence to B Devlin.

Supplementary information

Supplementary Note (PDF 23 kb)

About this article

Cite this article

Devlin, B., Bacanu, SA. & Roeder, K. Genomic Control to the extreme. Nat Genet 36, 1129–1130 (2004). https://doi.org/10.1038/ng1104-1129
