To the editor:

In the study of complex disease, separating causal from confounded factors is a challenge for genetic epidemiologists. One tool useful for separating these factors is Genomic Control (GC). In this communication we clarify how and when to use GC. We also describe a refined approach to GC, which should be used when GC is applied in extreme settings, for example when extremely small significance levels are required.

Population-based studies, such as case-control studies, are common designs used to determine the genetic and environmental bases of disease. To avoid false positive associations, design and analysis of population-based studies should account for population stratification, which can inflate association test statistics. One analytic method used to control the false positive rate is GC. In our original paper1, we investigated two scenarios with two corresponding analytic methods. GC parallels the standard frequentist approach to hypothesis testing, whereas GCB uses Bayesian inference. GC is suitable when a modest number of candidate genes are assessed and L supplementary loci are included for control. The supplementary loci, called null loci, are used to correct any inflation, λ, in the association test statistic(s) by estimating λ from the null test statistics. GC produces average rejection rates close to the targeted 0.05 significance level2,3,4.
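As a concrete illustration of this correction, the sketch below estimates λ from null loci and rescales a candidate 1-d.f. χ² statistic before computing its P value. It is a minimal sketch only: the median-based estimator and the χ²(1) reference distribution are standard choices for 1-d.f. trend tests, and all function and variable names (e.g. gc_corrected_pvalues) are ours for illustration, not from the original papers.

```python
import numpy as np
from scipy import stats

def gc_corrected_pvalues(candidate_stats, null_stats):
    """Genomic Control: estimate the inflation factor lambda from null loci and
    rescale candidate 1-d.f. chi-square statistics before computing P values."""
    null_stats = np.asarray(null_stats, dtype=float)
    # Robust (median-based) estimate of lambda: median of the null statistics
    # divided by the median of a chi-square(1) variable (about 0.456).
    lam_hat = np.median(null_stats) / stats.chi2.ppf(0.5, df=1)
    lam_hat = max(lam_hat, 1.0)  # by convention, do not deflate when the estimate falls below 1
    corrected = np.asarray(candidate_stats, dtype=float) / lam_hat
    return lam_hat, stats.chi2.sf(corrected, df=1)

# Hypothetical example: 50 null loci inflated by lambda = 1.5 and one candidate statistic of 12.0
rng = np.random.default_rng(0)
null = 1.5 * rng.chisquare(df=1, size=50)
lam_hat, pvals = gc_corrected_pvalues([12.0], null)
```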

We also considered population-based studies when large numbers of markers are tested1. GCB is designed for this scenario. Rather than preselecting null loci, GCB delineates loci associated with disease as 'outliers' relative to most of the loci tested.

In our original papers1,2, we argued that a population-based study should attempt to remove the effect of stratification by experimental design and analysis, such as by matching cases and controls for ethnicity and environmental covariates. GC then adjusts for the residual effects of stratification. Careful study design and implementation pay off in statistical power5,6; even small stratification can have considerable consequences for large samples1,2,3,4,5.

Marchini et al.7 explored the efficacy of GC for population-based studies under less ideal conditions, using subjects originating from different populations and including environmental effects that induce geographically distinct prevalences; both possibilities were ignored in the design and analysis. Because they considered a large number of loci, they required an extremely small significance level (α) for the P values. They found that GC could be anticonservative when α is small. Their results are sensible because GC treats λ as a known constant8; for small values of α, variability in the estimate of λ matters. The population-based studies explored by Marchini et al.7 can produce highly inflated test statistics (Fig. 1), and, because these studies involve a large number of candidate loci, they are more appropriately analyzed by using GCB rather than GC.

Figure 1: Performance of GC as a function of the targeted significance level (α), the inflation due to stratification (λ) and the number of null loci included (L).

For the solid line, α = 10⁻²; at λ = 10, α decreases by an order of magnitude for each consecutive line thereafter. Note the different scales in the top panels versus the bottom panels. Marchini et al.7 generated their data by using a beta-binomial model. We avoided generating individual loci by working with a summary statistic for the values, thereby obtaining a good approximation to their simulations. The tests are distributed as a scaled 1-d.f. χ² statistic, λχ²₁. A sketch of our procedure, for a single choice of λ, α and L, requires several steps: generate L copies of x, each x distributed as χ²₁, and multiply each x by λ; use GC to compute λ̂; draw another random realization x from a χ²₁ distribution and compute the GC test statistic as y = λx/λ̂. Carrying out these steps many times produces pm, the expected fraction of times the P value falls below α for a given λ and L. Then log10(pm/α) is calculated. Carrying out this procedure for a large number of settings of λ, α and L produces these results, which capture the essence of the results that Marchini et al.7 obtained by using their simulation techniques. For n = 1,000, models A1, A2 and B1–B5 of Marchini et al.7 are inflated by λ ≈ 18.8, 11.0, 1.1, 1.2, 1.7, 1.6 and 4.1, respectively. See Supplementary Note online for more information.
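The following is a minimal sketch of that procedure under our reading of the legend (scaled 1-d.f. χ² statistics and the median-based GC estimate of λ); the replicate count, seed and all names are our own illustrative choices, and far more replicates than shown would be needed for the smallest α values.

```python
import numpy as np
from scipy import stats

def gc_rejection_bias(lam, alpha, L, n_reps=200_000, seed=1):
    """Approximate log10(pm / alpha) for GC, where pm is the fraction of replicates in
    which the GC-corrected P value of a null (but inflated) test falls below alpha."""
    rng = np.random.default_rng(seed)
    # L null loci per replicate, each distributed as lam * chi-square(1)
    null = lam * rng.chisquare(df=1, size=(n_reps, L))
    # GC's robust (median-based) estimate of lambda from the null loci
    lam_hat = np.median(null, axis=1) / stats.chi2.ppf(0.5, df=1)
    # One further null test statistic, also inflated by lam, then GC-corrected
    y = lam * rng.chisquare(df=1, size=n_reps) / lam_hat
    pm = np.mean(stats.chi2.sf(y, df=1) < alpha)
    return np.log10(pm / alpha)

# e.g. strong inflation and a small significance threshold
bias = gc_rejection_bias(lam=10.0, alpha=1e-4, L=50)
```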

Because λ is determined by sample size, stratification and differential prevalence1, we can generate and compactly represent the general findings of Marchini et al.7 (Fig. 1). Four features stand out from our results: (i) GC works well in situations for which it was originally intended, namely larger values of α and/or smaller values of λ (refs. 1–4); (ii) GC becomes increasingly anticonservative as α decreases or as λ increases; (iii) bias is also a function of L (refs. 1,2); and (iv) even minor stratification can have a substantial impact on population-based studies with large sample sizes1,7,9. Of these results, the second feature is new to Marchini et al.7; the fourth was shown mathematically1 before it was demonstrated empirically7,9.

Is there a way to adjust the procedure if a researcher wishes to apply the logic of GC and use an extremely small α value? The bias in GC can be corrected by a simple modification of the test statistic (GCF). For GCF, estimate λ using the mean (λm) of the null test statistics and account for the variability of λm by using an F test to determine the P values: under the null hypothesis, the ratio of an inflated 1-d.f. χ² test statistic to the mean of L independent, equally inflated null statistics follows an F(1,L) distribution, because the common inflation factor cancels. Notably, GCF is accurate throughout the parameter space, even with as few as 30 null loci (Fig. 2).
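A minimal sketch of the GCF calculation as we understand it (mean-based estimate of the inflation factor and an F(1, L) reference distribution; the function name is ours):

```python
import numpy as np
from scipy import stats

def gcf_pvalues(candidate_stats, null_stats):
    """GCF: divide each candidate 1-d.f. chi-square statistic by the mean of the L
    null statistics and refer the ratio to an F(1, L) distribution."""
    null_stats = np.asarray(null_stats, dtype=float)
    L = null_stats.size
    lam_m = null_stats.mean()              # mean-based estimate of the inflation factor
    y = np.asarray(candidate_stats, dtype=float) / lam_m
    return stats.f.sf(y, dfn=1, dfd=L)     # P values reflect the uncertainty in lam_m
```

Relative to GC, the only changes are the mean in place of the median-based estimate of λ and the F(1, L) reference in place of χ²(1).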

Figure 2: Performance of GCF as a function of α, λm and L.

Simulations were done as described for Figure 1, with two exceptions. First, instead of using the robust estimate of λ, λ̂, we used the mean λm. Second, instead of determining the P value of y from a χ² distribution, y was assumed to be distributed as F(1,L), and the P values were calculated from that distribution. Note the compressed vertical scale (relative to Fig. 1), reflecting the minuscule error for all settings. The greatest error was observed for L = 30 and α = 10⁻⁷. See Supplementary Note online for more information.
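For completeness, a sketch of how the Figure 1 simulation changes under these two substitutions (again with our own illustrative names, seed and replicate count):

```python
import numpy as np
from scipy import stats

def gcf_rejection_bias(lam, alpha, L, n_reps=200_000, seed=2):
    """Approximate log10(pm / alpha) for GCF: mean-based lambda estimate and F(1, L) P values."""
    rng = np.random.default_rng(seed)
    null = lam * rng.chisquare(df=1, size=(n_reps, L))
    lam_m = null.mean(axis=1)                           # mean, rather than the robust estimate
    y = lam * rng.chisquare(df=1, size=n_reps) / lam_m
    pm = np.mean(stats.f.sf(y, dfn=1, dfd=L) < alpha)   # F(1, L) replaces the chi-square(1) reference
    return np.log10(pm / alpha)
```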

Our validation of the results of Marchini et al.7, and of our own results, uses a shortcut method that Marchini et al.7 did not use. Our results are also supported by simulations that follow the methods of Marchini et al.7: when we generated data by their methods and analyzed them using GCF, we again found that GCF yielded an excellent approximation to the targeted significance level for small values of α (Table 1), even when λ is inflated substantially by large sample size or geographically distinct prevalences.

Table 1 Targeted significance levels of GCF compared with the realized values produced by simulations using a beta-binomial model

In summary, when a large number of candidate loci are genotyped or when α is small, application of GC produces misleading results (Fig. 1), as Marchini et al.7 show. Because GCF corrects this bias for small values of α, and does so in a range of settings (Fig. 2 and Table 1), we conclude that the bias is largely due to the uncertainty in λ; GCF accounts for this uncertainty through the degrees of freedom of its F distribution. Thus, GCF provides a simple alternative to recently suggested methods based on the confidence interval for λ (ref. 9).

As we have pointed out before1,10, for experiments involving a large number of tests of genetic markers, one should analyze the entire distribution of test statistics. In this setting, different statistical paradigms should be considered, such as methods based on the false discovery rate principle11, which holds great promise here (refs. 10,12 and S.-A.B., B.D., K.R. and L. Wasserman, unpublished data).

Note: Supplementary information is available on the Nature Genetics website.