Introduction

Case–control design is widely used in genetic association studies. In a well-designed case–control study, cases and controls differ only in disease status and, if the marker is linked to disease, in the genotypes under investigation; they share no other differential structure. However, differential structure is a common issue in population-based case–control studies. The best-known example is population structure, correction for which has been studied extensively in the literature.1, 2, 3, 4, 5, 6, 7, 8, 9 Differential genotype error between cases and controls is another possible source of bias in association analysis, especially in large-scale genetic studies in which cases and controls may be genotyped under different conditions.10, 11

Genotype errors arise from inaccurate genotyping measurements or laboratory mistakes. Genotypes of single-nucleotide polymorphisms have been reported to have misclassification rates in the range of 0.1–5%.12, 13 Most studies have assumed that the genotype misclassification rates are the same (non-differential) for the case and control groups, in which case association tests retain the correct type I error but may lose power. Different methods have been proposed to increase power in the presence of non-differential genotype error.14, 15, 16, 17, 18

However, genotype measurement errors may vary systematically between cases and controls when the two groups are genotyped under different experimental conditions.19 Genotype error may depend on factors such as the quality of blood samples, the working condition of the genotyping instrument and the expertise of laboratory personnel, among others. For example, the DNA samples from cases and controls in a large-scale association study may be collected at different phases of the study or from different centers using different genotyping instruments. As another example, if the person genotyping the samples is not blinded to disease status, the quality of genotype measurements may differ systematically between cases and controls and the genotype errors may be differential.

The importance of addressing differential errors in association analysis was reported by Clayton et al.10 In their study, the DNA samples were obtained with the same extraction protocol and were of the same quality, yet inflation of the single-locus type I error was observed, much of which was caused by systematically differential call rates between samples. Other heterogeneity factors, such as population substructure, contributed only a minor proportion of the type I error inflation. Moskvina et al.11 studied the effect of differential genotyping error rates on the type I error in case–control studies. Their results show that differential genotyping error can substantially inflate the type I error of association tests even at low misclassification rates. In fact, any factor (observed or unobserved) that is differential between cases and controls leads to bias as well as variance inflation in association tests.

The genomic control (GC) method1 was proposed to correct variance inflation caused by population substructure. It assumes that cases and controls come from different sub-populations with different cryptic relatedness structures. The GC method corrects for the effect of differential cryptic relatedness by using information from unlinked null loci. Devlin and Roeder1 and Devlin et al.5 argued that, in the presence of many unknown sub-populations with cryptic relatedness, there is no bias on average and variance inflation contributes most of the inflation of the type I error. On the other hand, Gorroochurn et al.20 proposed a δ-centralization method to correct for the bias of the test statistic caused by population stratification.

In this paper, similar to the GC method, we assume that null markers (with similar allele frequencies) that are unlinked to the disease are available and that these markers are subject to the same or similar misclassification models as the candidate marker. We use these null markers to estimate quantities related to the null distribution of the test statistic for the candidate marker. This study shows that, in the presence of differential errors, the variance distortion is minor and it is the bias caused by the differential errors that inflates the type I error. We therefore centralize the test statistic using information estimated from the null markers. Note that with random misclassifications, perfect matching of allele frequencies is not feasible. Our strategy is to match allele frequencies as far as possible and then apply a regression-based procedure to correct for the bias. Simulation results show that our method is quite robust to variability in the allele frequencies of the null markers and to the mixing of a small proportion of linked loci into the selected ‘null’ markers.

Materials and methods

We focus on a bi-allelic candidate marker with alleles A and a. Let p=P(A) be the population allele frequency. Denote the three genotypes aa, Aa and AA by G′=0, 1, 2, respectively. In a case–control design, denote the disease status by Y=1 for cases and Y=0 for controls, and let the true genotype frequencies be pij=P(G′=j|Y=i), j=0, 1, 2; i=0, 1. The null hypothesis of no association is H0: p0j=p1j=pj, j=0, 1, 2. Let G be the observed genotype. The definitions of the observed genotype counts are given in Table 1.

Table 1 Genotype counts for case–control design

Owing to genotyping error, the true genotypes may be misclassified. Denote the probability of misclassifying true genotype k as j in group i by

θjk(i) = P(G=j | G′=k, Y=i), j, k=0, 1, 2,

where the superscript (i) denotes the group. Non-differential error means θjk(0)=θjk(1) for all j, k, or P(G=j|G′=k, Y=i)=P(G=j|G′=k). We assume that the misclassification rates are differential for the case and control groups, that is, θjk(1)≠θjk(0) for at least some k and j=0, 1, 2. Then the frequencies of the observed genotypes for group i (i=0, 1) are given as

qij = P(G=j | Y=i) = Σk θjk(i) pik, j=0, 1, 2.

Note that when the error rates are non-differential, that is, θjk(0)=θjk(1) for all k, j, we have q0j=q1j, j=0, 1, 2 under the null H0: p1j=p0j, j=0, 1, 2. This means that the association tests are valid with the correct type I error. In this paper, we consider an error model with two parameters: ν1(i), the probability of misclassifying a homozygote as the heterozygote in group i, and ν2(i), the probability of misclassifying the heterozygote as a homozygote in group i, for i=0, 1. This model has been considered by other authors.16, 22 The error rates (or misclassification rates) under this model are presented in Table 2. We define the misclassification rate ratio (MRR) as MRR=ν1(1)/ν1(0)=ν2(1)/ν2(0), the common ratio of the case-group to the control-group misclassification rates, and use it as a measure of the degree of differential genotype error. Obviously, with differential errors, the frequencies of the observed genotypes for the case group (q1j) and the control group (q0j) may not be identical under the null.

Table 2 Conditional probability of observed genotype given the true genotype in group i (i=0, 1)
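
As a concrete illustration of the error model, the following minimal sketch (Python, illustrative only) builds a misclassification matrix of the θjk(i) and the observed genotype frequencies qij. It assumes a particular form for Table 2: a homozygote is miscalled as the heterozygote with probability ν1(i), the heterozygote is miscalled as a homozygote with total probability ν2(i) split equally between the two homozygotes, and a homozygote is never miscalled as the opposite homozygote. This equal-split form is an assumption made here for illustration; the exact entries should be read from Table 2.

```python
import numpy as np

def error_matrix(nu1, nu2):
    """Sketch of the two-parameter misclassification model for one group.

    nu1: P(homozygote miscalled as the heterozygote)
    nu2: P(heterozygote miscalled as a homozygote), split equally between
         the two homozygotes (an assumption standing in for Table 2).
    Row k = true genotype G' (0=aa, 1=Aa, 2=AA); column j = observed genotype G.
    """
    return np.array([
        [1 - nu1, nu1,      0.0     ],   # true aa
        [nu2 / 2, 1 - nu2,  nu2 / 2 ],   # true Aa
        [0.0,     nu1,      1 - nu1 ],   # true AA
    ])

def observed_frequencies(true_freqs, nu1, nu2):
    """q_ij = sum_k theta_jk(i) * p_ik for one group."""
    return np.asarray(true_freqs) @ error_matrix(nu1, nu2)

# Example: control group with nu1 = 0.005, nu2 = 0.003, MAF p = 0.1 under HWE
p = 0.1
true_freqs = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
print(observed_frequencies(true_freqs, 0.005, 0.003))
```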

Cochran–Armitage’s trend test and Pearson’s chi-square test

The Cochran–Armitage trend (CAT) test is widely used in genetic association analysis.23, 24 The CAT test assigns scores 0, x, 1 to the three genotypes aa, Aa, AA, respectively. The choice of the score x for the heterozygote Aa depends on the underlying mode of inheritance. Specifically, scores x=0, 0.5, 1 correspond, respectively, to the recessive, additive and dominant models.

Let q̂ij=nij/ni and q̂j=mj/n be the estimates of qij and of the pooled frequency qj=n1q1j/n+n0q0j/n. For a specific choice of the score x for the heterozygote genotype Aa, the CAT test indexed by x is given as

Zx = Δx/sx, with Δx = q̂12 + xq̂11 − q̂02 − xq̂01 and sx = {(1/n1+1/n0)[q̂2 + x²q̂1 − (q̂2 + xq̂1)²]}^(1/2).

This test is used for testing the null hypothesis H0: p1j=p0j=pj, j=0, 1, 2 for the two groups with the observed data. Let μx=E(Δx) and σx²=Var(Zx). If there is no genotype error, then μx=0 and Zx is asymptotically distributed as a standard normal distribution N(0, 1).
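
For concreteness, the following sketch computes Zx from the genotype counts of Table 1 using the scores (0, x, 1) and the pooled-variance form given above. The counts in the example are hypothetical and only illustrate the calculation.

```python
import numpy as np

def cat_test(counts_case, counts_control, x):
    """Cochran-Armitage trend statistic Z_x with genotype scores (0, x, 1).

    counts_case, counts_control: genotype counts (n_i0, n_i1, n_i2) for
    aa, Aa, AA in cases (i = 1) and controls (i = 0).
    """
    n1g = np.asarray(counts_case, dtype=float)
    n0g = np.asarray(counts_control, dtype=float)
    n1, n0 = n1g.sum(), n0g.sum()
    q1 = n1g / n1                      # observed genotype frequencies, cases
    q0 = n0g / n0                      # observed genotype frequencies, controls
    qbar = (n1g + n0g) / (n1 + n0)     # pooled genotype frequencies
    s = np.array([0.0, x, 1.0])
    numer = s @ (q1 - q0)                                       # Delta_x
    var = (1.0 / n1 + 1.0 / n0) * (s**2 @ qbar - (s @ qbar) ** 2)
    return numer / np.sqrt(var)

# Hypothetical counts, additive scores (x = 0.5)
print(cat_test(counts_case=[700, 260, 40], counts_control=[720, 255, 25], x=0.5))
```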

If there exists differential error between cases and controls, then μx≠0 and σx² deviates from 1. We call λx=σx² the variance inflation factor (VIF). Figure 1 illustrates the bias μx and the variance inflation factor λx as functions of the allele frequency p and of the MRR, which measures how differential the misclassification rates are between cases and controls. For a given MRR, both μx and λx decrease with the allele frequency p, indicating that the CAT is more severely biased when the allele A is rare. On the other hand, both increase with the MRR (bottom panels of Figure 1), indicating that the CAT is more biased when cases and controls are more differential in genotype misclassification. As Figure 1 shows, the variance inflation factor is very close to 1, which extensive simulation and theoretical investigation also confirm. We therefore focus on correcting the bias term in this study. In Appendix A, we show that μx is well approximated by a quadratic function of the allele frequency p.

Figure 1

Variance inflation factor (VIF) and bias of the Cochran–Armitage trend (CAT) test with score x, as functions of allele frequency p and misclassification rate ratio (MRR) (n1=n0=500; control-group misclassification rate 0.005).

We therefore apply a quadratic regression method to correct for the biases caused by differential error.

Pearson's chi-square test (PCT) is given by

T = Σi Σj (nij − eij)²/eij, where eij = nimj/n,

and it has several different representations. In our study, we choose to use the following representation,25, 26 which clearly links the PCT with the CAT tests for x=0 and x=1:

T = (Z0² − 2ρZ0Z1 + Z1²)/(1 − ρ²),

where ρ is the null correlation between Z0 and Z1, estimated from the pooled genotype frequencies. With this representation, we can correct the PCT through correcting the trend tests Z0 and Z1.27
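
The following sketch illustrates this representation numerically: it computes Z0 and Z1, estimates the null correlation ρ from the pooled genotype frequencies (the explicit expression used below is derived here, not quoted from the paper), and checks that the resulting quadratic form agrees with the directly computed Pearson statistic. The counts are hypothetical.

```python
import numpy as np

def trend_z(counts_case, counts_control, x):
    """CAT statistic Z_x with genotype scores (0, x, 1), as sketched above."""
    n1g, n0g = np.asarray(counts_case, float), np.asarray(counts_control, float)
    n1, n0 = n1g.sum(), n0g.sum()
    q1, q0, qbar = n1g / n1, n0g / n0, (n1g + n0g) / (n1 + n0)
    s = np.array([0.0, x, 1.0])
    var = (1 / n1 + 1 / n0) * (s**2 @ qbar - (s @ qbar) ** 2)
    return s @ (q1 - q0) / np.sqrt(var)

def pearson_via_trend(counts_case, counts_control):
    """Pearson statistic written through Z_0 and Z_1 and their null correlation."""
    n1g, n0g = np.asarray(counts_case, float), np.asarray(counts_control, float)
    qbar = (n1g + n0g) / (n1g.sum() + n0g.sum())
    # Null correlation of Z_0 and Z_1 from the pooled genotype frequencies
    rho = np.sqrt(qbar[0] * qbar[2] / ((1 - qbar[0]) * (1 - qbar[2])))
    z0 = trend_z(counts_case, counts_control, 0.0)
    z1 = trend_z(counts_case, counts_control, 1.0)
    return (z0**2 - 2 * rho * z0 * z1 + z1**2) / (1 - rho**2)

def pearson_direct(counts_case, counts_control):
    """Standard form: sum over i, j of (n_ij - e_ij)^2 / e_ij with e_ij = n_i m_j / n."""
    obs = np.array([counts_control, counts_case], dtype=float)
    e = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / obs.sum()
    return ((obs - e) ** 2 / e).sum()

case, control = [700, 260, 40], [720, 255, 25]
print(pearson_via_trend(case, control), pearson_direct(case, control))  # the two agree
```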

Centralize the tests by using null markers

Centralizing the CAT test is possible if the error probabilities are known or estimable from additional sources of information. Otherwise, the CAT has inflated type I error and cannot be used as a test of association when the misclassification rates are differential between the case and control groups. The GC method uses null markers that are unlinked to the disease to correct the bias and/or variance distortion caused by population structure. If the null markers satisfy the additional assumption that they are subject to the same (or roughly the same) misclassification distribution as the candidate marker, then these markers can also be used to correct the CAT in a similar way to the GC method. As shown above, the variance distortion is not problematic when genotype errors are small. We therefore consider only the correction of the bias using information from the null markers.

Suppose there are K null markers available. We can assume that all K markers are in Hardy–Weinberg equilibrium (HWE), because in a large-scale genome-wide association study there are enough candidate null markers that it is easy to pick out K markers in HWE. Suppose that the numerators of the CAT test statistics for the null markers are Δx(1), Δx(2), …, Δx(K); then we can estimate μx from these statistics. A basic requirement for the null markers to be able to correct the bias of the test at the candidate marker is that they are subject to the same (or roughly the same) misclassifications as the candidate. We consider two situations: (i) the null markers are matched with the candidate marker in minor allele frequency (MAF). In this situation, the null markers have the same or nearly the same MAFs as the candidate marker, so it is reasonable to assume that they have the same or nearly the same misclassification rates, and the homogeneity assumption on the misclassification rates is unproblematic; (ii) the null markers are not matched with the candidate. In this situation, we do not require the null markers to have the same MAFs as the candidate, which allows much more flexibility in selecting the null markers.

For situation (i), one can simply estimate the bias parameter by the sample mean of the Δx(k)'s, namely μ̂x=(1/K)Σk Δx(k). This case is essentially the same as what is done in the original GC method.1 However, we do not explore this approach further, since matching MAFs for the null markers is not reliable in the presence of genotyping error and matching also reduces the number of available null markers. For situation (ii), the null markers are not matched with the candidate marker in MAF, so we need to adjust for the variability of the MAFs. Let the MAF of the k-th null marker be pk, which can be estimated by counting alleles from the genotypes in the control group (the estimate is denoted by p̂k). Because the genotypes of the null markers also contain errors, the MAF estimates may be biased; however, Appendix B shows that these estimates are asymptotically unbiased when the misclassification rates are small. Since the misclassification rates are presumably small (say, between 0.1% and 0.5%), p̂k is not severely biased and can be used in the regression. We propose estimating the bias parameter by the quadratic formula

μ̂x = γ0 + γ1p̂ + γ2p̂²,

where γ0, γ1 and γ2 are the regression coefficients estimated by least squares from the linear regression model

Δx(k) = γ0 + γ1p̂k + γ2p̂k² + ek, k=1, 2,…, K,

where p̂ is the estimated MAF of the candidate marker, ek is the regression error, and (Δx(k), p̂k), k=1, 2,…, K, are the numerators of the CAT test statistics and the estimated MAFs computed from the null markers. With either estimate of μx, from situation (i) or (ii), one can centralize the CAT test statistic by

Zx* = (Δx − μ̂x)/sx,

and correct the PCT by

T* = (Z0*² − 2ρZ0*Z1* + Z1*²)/(1 − ρ²).
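
A minimal sketch of the regression-based centralization described above is given below, under the reconstruction used in this section: the numerators Δx(k) of the null-marker trend statistics are regressed on their control-group MAFs and squared MAFs, the fitted quadratic is evaluated at the candidate marker's MAF to obtain μ̂x, and the candidate statistic is then centralized. Function names and the illustrative counts are not from the paper; the corrected Z0* and Z1* obtained in this way can be combined through the PCT representation given earlier.

```python
import numpy as np

def trend_parts(counts_case, counts_control, x):
    """Numerator Delta_x and standard error s_x of the CAT statistic with scores (0, x, 1)."""
    n1g, n0g = np.asarray(counts_case, float), np.asarray(counts_control, float)
    n1, n0 = n1g.sum(), n0g.sum()
    q1, q0, qbar = n1g / n1, n0g / n0, (n1g + n0g) / (n1 + n0)
    s = np.array([0.0, x, 1.0])
    numer = s @ (q1 - q0)
    se = np.sqrt((1 / n1 + 1 / n0) * (s**2 @ qbar - (s @ qbar) ** 2))
    return numer, se

def control_maf(counts_control):
    """Frequency of allele A estimated by allele counting in controls."""
    n0g = np.asarray(counts_control, float)
    return (n0g[1] + 2 * n0g[2]) / (2 * n0g.sum())

def corrected_cat(candidate, null_markers, x):
    """Centralized CAT: Z*_x = (Delta_x - mu_hat) / s_x, with mu_hat from a
    quadratic regression of the null-marker numerators on their control MAFs."""
    deltas, mafs = [], []
    for case_counts, control_counts in null_markers:
        d, _ = trend_parts(case_counts, control_counts, x)
        deltas.append(d)
        mafs.append(control_maf(control_counts))
    mafs = np.asarray(mafs)
    X = np.column_stack([np.ones_like(mafs), mafs, mafs**2])
    gamma, *_ = np.linalg.lstsq(X, np.asarray(deltas), rcond=None)
    p_hat = control_maf(candidate[1])
    mu_hat = gamma @ np.array([1.0, p_hat, p_hat**2])
    delta, se = trend_parts(candidate[0], candidate[1], x)
    return (delta - mu_hat) / se

# Illustrative data: 100 null markers with MAFs drawn from U(0.05, 0.5).
# They are generated here without genotyping error, so the estimated bias
# is close to zero and the corrected statistic stays near the uncorrected one.
rng = np.random.default_rng(1)
nulls = []
for _ in range(100):
    q = rng.uniform(0.05, 0.5)
    freqs = [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]      # HWE genotype frequencies
    nulls.append((rng.multinomial(1000, freqs), rng.multinomial(1000, freqs)))
candidate = ([700, 260, 40], [720, 255, 25])             # (case counts, control counts)
print(corrected_cat(candidate, nulls, x=0.5))
```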

Simulation

Let the risk allele be A with population frequency p=P(A). The genotypes AA, Aa and aa are denoted by G′=2, 1, 0 (the number of copies of allele A), and G denotes the observed genotype.

We assume the following penetrance model,

P(Y=1 | G′=k) = exp(α + βzk)/[1 + exp(α + βzk)], k=0, 1, 2,

where Y is the indicator of disease and zk is the score of genotype k. We take z0≡0, z2≡1 and use z1=1 for the dominant model, z1=0.5 for the additive model and z1=0 for the recessive model, respectively. Note that eβ is the genotype relative risk and α is the baseline log odds. In our simulation studies, we investigate the GC method under the recessive (REC), additive (ADD) and dominant (DOM) models; the corresponding tests are the CAT with x=0, 0.5 and 1, respectively. Assuming Hardy–Weinberg equilibrium in the population, the population genotype frequencies are g0=(1−p)², g1=2p(1−p) and g2=p², where p=P(A) is the MAF of allele A in the population. Writing fk=P(Y=1|G′=k), the true genotype frequencies in the two groups are p1k=fkgk/Σjfjgj and p0k=(1−fk)gk/Σj(1−fj)gj, and the observed genotype frequencies for the two groups are

qij = Σk θjk(i) pik, j=0, 1, 2; i=0, 1,

where θjk(i) is specified by the given error model (for example, the (j, k) entry of Table 2 for group i).

Genotype data for the null markers are generated in a similar way, except that these markers are not linked to the disease (that is, β=0 in the penetrance model above). In our simulation study, we do not require the null markers to match the candidate marker in MAF; their MAFs are randomly generated from the uniform distribution U(0, 0.5). The genotype misclassification rates for the null markers are taken to be close to those of the candidate marker. Specifically, for each null marker we generate the control-group error probabilities from uniform distributions centred at the corresponding control-group error rates of the candidate marker (with half-width ɛ), and the case-group error probabilities are then determined by these values and the given value of the MRR.
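
The data-generating scheme of this section can be sketched as follows, assuming the logistic penetrance model above, HWE genotype frequencies and the same equal-split form of the two-parameter error model assumed earlier. Parameter values follow the simulation settings, but the code is an illustration rather than the exact simulation program used in the paper.

```python
import numpy as np

def group_genotype_freqs(p, alpha, beta, scores=(0.0, 0.5, 1.0)):
    """True genotype frequencies P(G' = k | Y = i) under HWE and the logistic
    penetrance P(Y = 1 | G' = k) = expit(alpha + beta * z_k)."""
    g = np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])            # P(G' = k)
    f = 1.0 / (1.0 + np.exp(-(alpha + beta * np.asarray(scores))))   # penetrances
    case = g * f / (g * f).sum()                    # Bayes: P(G' = k | Y = 1)
    control = g * (1 - f) / (g * (1 - f)).sum()     # Bayes: P(G' = k | Y = 0)
    return case, control

def error_matrix(nu1, nu2):
    """Two-parameter misclassification model (heterozygote error split equally
    between the homozygotes -- an assumption standing in for Table 2)."""
    return np.array([[1 - nu1, nu1, 0.0],
                     [nu2 / 2, 1 - nu2, nu2 / 2],
                     [0.0, nu1, 1 - nu1]])

def simulate_counts(rng, n_case, n_control, p, alpha, beta, nu_control, mrr, scores):
    """Observed genotype counts for one marker under differential errors:
    the case-group error rates are MRR times the control-group rates."""
    case_f, control_f = group_genotype_freqs(p, alpha, beta, scores)
    obs_case = case_f @ error_matrix(nu_control[0] * mrr, nu_control[1] * mrr)
    obs_control = control_f @ error_matrix(*nu_control)
    return rng.multinomial(n_case, obs_case), rng.multinomial(n_control, obs_control)

rng = np.random.default_rng(7)
case_counts, control_counts = simulate_counts(
    rng, 1000, 1000, p=0.1, alpha=np.log(0.1 / 0.9), beta=0.0,   # beta = 0: null marker
    nu_control=(0.005, 0.003), mrr=5, scores=(0.0, 0.5, 1.0))
print(case_counts, control_counts)
```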

Results

For the candidate marker, the MAF p is taken to be 0.1 and 0.3, respectively. The misclassification rates for the control group are ν1(0)=0.005 and ν2(0)=0.003 in our simulation. The differential index, MRR, is taken to be 1, 3, 5, 7 and 9; note that when MRR=1, the errors are non-differential. We use the error model shown in Table 2, so the misclassification rates for the case group are calculated from the control-group rates and the MRR. Null markers are generated independently of the candidate marker, with MAFs generated from the uniform distribution U(0, 0.5). The null markers are in principle unlinked to the disease, but we allow a proportion, r, of them to be linked with the disease; when r=0, the null markers are all unlinked to the disease. The misclassification rates for the null markers are taken to be close to those of the candidate marker, allowing slight deviations (ɛ=0.002); specifically, the control-group error rates for the null markers are generated from U(0.003, 0.007) and U(0.001, 0.005). The number of null markers is K=100. The sample size is n=2000 (1000 cases and 1000 controls). We investigate the three common modes of inheritance, the recessive, additive and dominant models, indexed by x=0, 0.5 and 1, respectively. All simulations are run with 2000 replicates.
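
For illustration, a compact Monte Carlo sketch of the type I error of the uncorrected CAT at a null marker is given below (additive scores, MRR=5). Because the error-model details are assumed here rather than taken from Table 2, the rejection rate it produces will not match the table entries; it only illustrates that differential errors push the empirical type I error above the nominal level.

```python
import numpy as np

def cat_z(case, control, x):
    """CAT statistic with scores (0, x, 1)."""
    case, control = np.asarray(case, float), np.asarray(control, float)
    n1, n0 = case.sum(), control.sum()
    q1, q0, qbar = case / n1, control / n0, (case + control) / (n1 + n0)
    s = np.array([0.0, x, 1.0])
    var = (1 / n1 + 1 / n0) * (s**2 @ qbar - (s @ qbar) ** 2)
    return s @ (q1 - q0) / np.sqrt(var)

def null_obs_freqs(p, nu1, nu2):
    """Observed genotype frequencies of an unassociated HWE marker under the
    assumed two-parameter error model (heterozygote error split equally)."""
    g = np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])
    theta = np.array([[1 - nu1, nu1, 0.0],
                      [nu2 / 2, 1 - nu2, nu2 / 2],
                      [0.0, nu1, 1 - nu1]])
    return g @ theta

rng = np.random.default_rng(11)
p, mrr, x, reps = 0.1, 5, 0.5, 2000                   # additive scores
f_control = null_obs_freqs(p, 0.005, 0.003)
f_case = null_obs_freqs(p, 0.005 * mrr, 0.003 * mrr)  # case rates = MRR * control rates
hits = 0
for _ in range(reps):
    case = rng.multinomial(1000, f_case)
    control = rng.multinomial(1000, f_control)
    hits += abs(cat_z(case, control, x)) > 1.96
print("empirical type I error:", hits / reps)         # exceeds the nominal 0.05
```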

Tables 3 and 4 show the simulation results for nominal significance levels α=0.05 and 0.01, respectively. From Tables 3 and 4, the type I errors of the CAT are considerably inflated for large MRRs. For example, when the disease model is recessive, MAF=0.1 and MRR=5, the type I error is 0.194 for the CAT and 0.155 for the PCT at the nominal significance level 0.05. In general, the type I error of the CAT or PCT becomes increasingly inflated as the MRR increases or the MAF decreases. The inflation is most evident for rare alleles and is negligible when the MAF is close to 0.5. In addition, the inflation is largest for the recessive model and smallest for the dominant model. The same phenomenon is observed in all the scenarios we investigated with different error models. Tables 5 and 6 give the type I errors for K=50 and K=200. The type I error can be well controlled even with a small number of null markers, for example, K=50. As the number of null markers becomes larger, the type I error is better controlled, but there is little further improvement when the number exceeds K=100. Thus, we suggest that K=100 is sufficient in practice. Tables 7 and 8 report the type I errors when the misclassification rates are as large as 2%. From these two tables, we conclude that the type I errors of the corrected CAT and PCT are much closer to the nominal level than those of the uncorrected tests, even with such large misclassification rates.

Table 3 Type I error of the corrected and uncorrected trend tests and the PCT (α=0.05)
Table 4 Type I error of the corrected and uncorrected trend tests and the PCT
Table 5 Type I error of the corrected and uncorrected trend tests and the Pearson’s chi-square test
Table 6 Type I error of the corrected and uncorrected trend tests and the Pearson’s chi-square test
Table 7 Type I error of the corrected and uncorrected trend tests and the PCT
Table 8 Type I error of the corrected and uncorrected trend tests and the PCT

Note that when MRR=1, the genotype errors are non-differential between cases and controls. The type I errors of the CAT and PCT are then generally close to the nominal values except when the mode of inheritance is recessive; when the sample size becomes larger, the type I errors of the CAT and PCT are all close to the nominal values for MRR=1 (results not shown). We observed that the corrected CAT (denoted by CAT* in the tables) has slightly inflated type I error compared with the original CAT in this case; this is intuitively reasonable, since the use of null markers introduces additional variability. When the MRR is greater than 1, that is, when differential errors are present, the original CAT has substantially inflated type I error, whereas the corrected CAT has type I error reasonably close to the nominal values in all the situations we considered.

Figure 2 illustrates the uncorrected and corrected type I errors under another parameter setting (control-group misclassification rate 0.0025, ɛ=0.0015). Figure 3 illustrates the power of the CAT and of the quadratic-regression GC-corrected CAT; the upper row is for MRR=1 and the lower row for MRR=4. The upper row shows that when MRR=1, the use of null markers does not affect the power of the trend test under any of the three genetic models. In the lower row, the apparent power of the uncorrected CAT is inflated because of its inflated type I error. Our method correctly controls the type I error of the CAT and consequently reports the correct power and avoids spurious association.

Figure 2

Type I error of the Cochran–Armitage trend (CAT) tests and the corrected trend tests (CAT*); (n1=n0=1000, α=0.05, control-group misclassification rate 0.0025, ɛ=0.0015). MRR, misclassification rate ratio.

Figure 3

Power of the Cochran–Armitage trend (CAT) tests and the corrected trend tests (CAT*); (n1=n0=1000, nominal level α=0.05, number of null markers K=100, control-group misclassification rate 0.0025, ɛ=0.0015). MRR, misclassification rate ratio.

Discussion

In this study, we proposed a regression method using null markers to correct for the bias introduced by differential errors in association analysis. The bias of the association tests can be effectively captured by null markers that are unlinked to the disease. We do not require the null markers to match the candidate marker in MAF; instead, we estimate the bias with a quadratic regression that adjusts for the variability in the MAFs of the null markers, fitting the bias as a quadratic function of the MAFs. We have shown that the approach works well even though the MAF estimates are biased in the presence of genotyping errors, because the bias of the MAF estimates is small and does not play a significant role in the quadratic regression.

The misclassification rates for the null markers (whose allele frequencies are close to that of the candidate marker) are assumed to be the same as those of the candidate. In reality, however, different loci may have different misclassification rates, which may depend on the specific allele or genotype frequencies at each locus. We have recognized this and allowed the misclassification rates of the null markers to deviate slightly from those of the candidate marker. Simulation studies show that, with slight fluctuations in the misclassification rates of the null markers, the proposed method performs reasonably well.