Abstract
Correlated multiple testing is widely performed in genetic research, particularly in multilocus analyses of complex diseases. Failure to control appropriately for the effect of multiple testing will either result in a flood of false-positive claims or in true hits being overlooked. Cheverud proposed the idea of adjusting correlated tests as if they were independent, according to an ‘effective number’ (Meff) of independent tests. However, our experience has indicated that Cheverud's estimate of the Meff is overly large and will lead to excessively conservative results. We propose a more accurate estimate of the Meff, and design Meff-based procedures to control the experiment-wise significant level and the false discovery rate. In an evaluation, based on both real and simulated data, the Meff-based procedures were able to control the error rate accurately and consequently resulted in a power increase, especially in multilocus analyses. The results confirm that the Meff is a useful concept in the error-rate control of correlated tests. With its efficiency and accuracy, the Meff method provides an alternative to computationally intensive methods such as the permutation test.
Similar content being viewed by others
Introduction
Complex diseases such as diabetes, schizophrenia and cancer lay a heavy burden on public health, with frequencies over 1%. As they are caused by interactions among genes, an increasing number of studies (Hoh et al, 2001; Nelson et al, 2001; Ritchie et al, 2001; Cordell and Clayton, 2002) have employed multilocus methods to detect the causative genes. Despite the abundance of such studies, only 16–30% of the reports of significant associations have been consistently replicated (Page et al, 2003). One reason for the low replication rate is the failure in controlling for the effect of multiple testing (Lander and Kruglyak, 1995).
Multilocus methods search loci in combinations, which inevitably tests a great number of hypotheses simultaneously: a k-locus analysis of n loci would have Cnk locus-combinations to inspect. In addition to Cnk being a large and cumbersome number, the tests are correlated with each other. For example, locus-combination AB is obviously correlated with combination BC for they share a mutual locus B. Even when loci are completely independent, their combinations are still correlated. These characteristics complicate the control of the error rate in multilocus analyses. Appropriate adjustment for multiple testing is very important to prevent a flood of false-positive claims and to prevent true associations being overlooked.
A traditional criterion for multiple testing is the experiment-wise significant level (EWSL, αe), which is the probability that one or more hypotheses would be falsely rejected. To control the EWSL, the corresponding point-wise significant level (PWSL, αp), which is the probability that an individual hypothesis would be falsely rejected, should be calculated appropriately. For independent tests, the Bonferroni (or Sidak) correction provides a simple and accurate control: αp=1−(1−αe)1/M≈αe/M, where M is the number of tests. However, this adjustment is overly conservative for correlated tests. Lander and Botstein (1989) and Lander and Kruglyak (1995) offered analytical adjustments for correlated tests in single-locus analyses, but it is hard to apply them to multilocus analyses.
In practice, most researchers have therefore employed resampling-based methods to control the EWSL of correlated tests. Hoh et al (2001) and Ritchie et al (2001) used the permutation test; Nelson et al (2001) used the cross-validation test. Although resampling-based methods are accurate, they demand intensive computation, which constrains them to cases with a limited number of tests. Taking the permutation test as an example, Churchill and Doerge (1994) recommended at least 1000 shuffles to estimate a 0.05 EWSL and at least 10 000 shuffles to estimate a 0.01 EWSL.
An alternative criterion for multiple testing is the false discovery rate (FDR) (Benjamini and Hochberg, 1995), which is the expected proportion of falsely rejected hypotheses among all those rejected. When all hypotheses are true, the FDR is equal to the EWSL, but otherwise it is smaller. For independent tests, Benjamini and Hochberg (1995) and Benjamini and Liu (1999) proposed a step-up procedure and a step-down procedure to control the FDR. For correlated tests, Benjamini and Yekutieli (2001) proposed a simple but highly conservative procedure. Storey (2002) and Storey and Tibshirani (2003) proposed a variation of the FDR: the positive false discovery rate (pFDR), which is the conditional FDR given that at least one hypothesis is rejected. These authors gave direct estimates of the pFDR for independent tests, and suggested resampling-based methods such as the permutation test for correlated tests. Correlation among tests is still a problem for FDR methods.
In 2001, Cheverud proposed a new idea to adjust correlated tests as though they were independent according to an effective number (Meff) of independent tests. He also gave an estimation of the Meff for single-locus analyses. In Nyholt (2004)'s evaluation, the Meff method approximated the permutation test accurately, and was more than 100 times faster. It is highly desirable to apply this method to control the EWSL and FDR in multilocus analyses. However, our experience suggests that Cheverud's (2001) estimate of Meff is overly large, especially when there are a substantial number of moderately correlated tests. This situation is, unfortunately, typical of multilocus analyses, leading to overly conservative results.
In this study, we propose a more accurate estimate of Meff, and design Meff-based procedures to control the EWSL and FDR. An assessment based on both real and simulated data shows that the procedures can control the error rate accurately in multilocus analyses.
Approach
The Meff method
When loci are in linkage disequilibrium (LD), hypothesis tests in single-locus analyses are not independent. Cheverud (2001) proposed the method below to adjust the correlated tests:
Step 1: Calculate the correlation matrix for the loci.
Step2: Estimate the effective number (Meff) of independent tests from the eigenvalues of the correlation matrix with Equation (1), where M is the number of tests (equal to the number of loci in single-locus analyses) and λi (i=1,…, M) are the eigenvalues
Step 3: Adjust the test criteria as though there were Meff independent tests with the Sidak (1967) correction below
Step 4: Test the association between genotypes and phenotypes locus by locus. If the p-value of any test is lower than αp, the nonassociation hypothesis is rejected.
A new Meff equation
The key step of the Meff method is Step 2, estimating the Meff from the eigenvalues. As Cheverud (2001) explained, Equation (1) considers two extreme situations. When the tests are completely independent, the eigenvalues are all equal to 1 and Equation (1) gives Meff=M. When the tests are completely identical, the first eigenvalue is M, while all the others are zero, and Equation (1) gives Meff=1. Although Equation (1) gives correct estimates at the two extremes, it overestimates the Meff in other situations. If the M tests are composed of c (1⩽c⩽M) copies of M/c independent tests, the eigenvalues are Sequence (3)
According to Equation (1), Meff=M+1–c, but actually there are only M/c independent tests. The ratio r is
In multilocus analyses, M is large and c is moderate, so r tends to be considerably larger than 1.
Intuitively, c identical tests will result in an eigenvalue λi=c, while a test partially correlated with others will result in an eigenvalue 0<λi<1. Thus, an eigenvalue can be decomposed into two parts: the integral part and the nonintegral part. The integral part represents identical tests that should be counted as one in the Meff. The nonintegral part represents a partially correlated test that should be counted as a fractional number between 0 and 1. From this intuitive consideration, we propose Equation (5) to estimate the Meff:
where I(x⩾1) is the indicator function, which gives 1 when x⩾1 and gives 0 otherwise, and is the floor function, which gives the largest integer less than or equal to x. Thus, completely correlated tests will be counted as I(x⩾1) and partially correlated tests will be counted as . Equation (5) gives the right Meff for integral eigenvalue Sequence (6), where t is the true number of independent tests and c1, c2,…, ct are integers
Multilocus analyses
By regarding locus-combinations as ‘generalized loci’ (GLs), we can apply the Meff method to multilocus analyses. For example, if two loci have g1 and g2 genotypes, respectively, their combination can be regarded as a new ‘locus’ with g1g2 genotypes. Consequently, the calculation of the correlation coefficients in Step 1 needs adaptation. As GLs are multinomial variables, neither the Pearson correlation (used by Cheverud, 2001) nor the LD measure (used by Nyholt, 2004) is suitable.
Similar to the Pearson correlation for numerical variables, the correlation for multinomial variables can be defined as in Equation (7a), but the variance (Var) and the covariance (Cov) should be measured differently. Let X1 and X2 denote two GLs, that is, two multinomial variables. Their variance and covariance can be measured with the entropy and mutual information (Cover and Thomas, 1991), respectively, as in Equation (7b) and (7c).
where P(Xi) is the probability function of Xi and E[x] is the expectation of a random variable x.
Besides its compatibility with both single- and multilocus analysis, this correlation coefficient has a property:
where A, B and C are independent loci or GLs.
An Meff-based FDR procedure
Let R denote the number of rejected hypotheses and V the number of those falsely rejected. Benjamini and Hochberg (1995) defined the FDR as
where I(R>0) is the indicator function, which gives 1 when R>0 and gives 0 otherwise. The FDR is equal to the EWSL when all null hypotheses are true and is smaller otherwise. They also proposed the following step-up procedure to control the FDR for independent tests. Let p(1)⩽⋯⩽p(M) be the ordered p-values of M independent hypotheses H(1),…, H(M). To control the FDR at level q (usually 5%), hypotheses H(1),…, H(k) should be rejected, where
The ordered p-values are compared with an arithmetic sequence from q/M to q. The start of the sequence, q/M, ensures that the FDR is equal to the EWSL when all the hypotheses are true. The end of the sequence, q, ensures that the FDR is controlled at q when all the hypotheses are rejected. To involve the effect of correlation, we can replace the start q/M with q/Meff, keep the end as q, and consequently replace the step q/M with (q−q/Meff)/(M−1). The Meff-based step-up procedure is to reject H(1),…, H(k), where
Performance
Single-locus analyses
The Meff method has previously been applied to two single-locus analyses. Cheverud (2001) applied it to the LG/J and SM/J mouse data (Cheverud et al, 2001), and Nyholt (2004) applied it to the angiotensin-I-converting enzyme (ACE) gene data (Keavney et al, 1998). In these analyses, Cheverud (2001) and Nyholt (2004) derived two eigenvalue sequences from the two data sets, respectively. In this study, we calculated different Meff's with both Equations (1) and (5) from the two eigenvalue sequences, and compared the results. As shown in Table 1, the two equations gave similar Meff's, so they should have similar performance on the two data sets. Setting αe=5% for the Keavney data set, Nyholt (2004) obtained αp=0.015 from a permutation test with 50 000 shuffles, and obtained αp=0.011 with the Meff=4.59 from Equation (1). Using Meff=4.01 from Equation (5), we obtained αp=0.013, a value closer to the permutation test's 0.015 than the value of 0.011 obtained from Equation (1).
Multilocus analyses
To compare the performance of Equations (1) and (5) in multilocus analyses, we used four sub-data sets extracted from a schizophrenia data set (Xu et al, 2004). The schizophrenia data set concerns 12 genes of the dopamine pathway, COMT (7), ALDH3 (11), ADH (14), TYR (1), TPO (2), MAO (3), DAO (2), AOX1 (2), DDC (4), HDG (1), DBH (2) and GSTZ1 (1), with 83 cases and 108 controls. (The integers in the parentheses are the numbers of SNPs genotyped for each gene.) To investigate the performance on correlated loci, we extracted the SNPs of gene COMT, ALDH3 and ADH as three sub-data sets COMT, ALDH3 and ADH, respectively. The three genes were reported as potential effective SNPs by Xu et al (2004), and they have enough SNPs (more than five) to ensure the multiplicity of tests in the analyses. To investigate the performance on independent loci, we selected one SNP per gene as the fourth sub-data set and called it DOP (short for dopamine). For each data set, we performed single-, two- and three-locus analyses.
We used the permutation test as the golden standard to evaluate the performance. In each permutation shuffle, we applied the Meff method to control the EWSL (αe). As a result, we obtained a decision of whether or not to reject the nonassociation hypothesis. After repeating this procedure 3000 times, we counted the proportion of rejection (Pr). Ideally, Pr should be equal to the preset αe. The association between a GL and the phenotype was tested with a combination of the χ2 test and the Fisher's exact test (When Cochran (1954)'s condition was satisfied, the association was tested with the χ2 test. When the condition was not satisfied, it was tested with the Fisher's exact test. Cochran's condition: in a contingency table, every cell has an expected value no less than 1, and 80% of the cells have expected values of no less than 5. This algorithm was implemented in ACM algorithm 643 (Clarkson et al, 1993).)
Figure 1 shows the comparison between Pr and αe. In single-locus analyses, both Equations (1) and (5) controlled Pr at αe. In two- and three-locus analyses, the Meff's obtained from Equation (1) were much greater than those obtained from the permutation test (see Table 2), and consequently the Pr of Equation (1) was much lower than αe. In contrast, the Pr of Equation (5) approximated αe fairly well in most cases.
Effects on FDR
We performed two-locus analyses on simulated case–control data to evaluate the effects of the Meff on the FDR. The simulated data contains 10 genomic regions with five observed biallelic SNPs in each region. All SNPs are in the Hardy–Weinberg equilibrium and the minor alleles have the same frequency. SNPs in different regions are in linkage equilibrium, and SNPs in the same region are in LD with r2=0.8 (Hill and Robertson, 1968) between two successive SNPs. The disease is caused by the epistatic interaction of two unobserved loci, which hide in the first region and the second region, respectively. The LD (r2) between a causative loci and its nearest marker SNP is 0.8. As neither of the causative loci has a marginal effect, a two-locus analysis is necessary to detect the epistatic interaction. Consequently, there are C502=1225 SNP pairs to inspect, and only the 25 pairs composed of the loci from the first and second regions are associated with the disease. The data were simulated from the six epistasis models in Table 3 (Ritchie et al, 2003). For each model, 10 000 data sets were generated with 200 cases and 200 controls in each of them.
Both Benjamini and Hochberg (1995)'s direct step-up procedure and the new Meff-based step-up procedure were applied to control the FDR. Although Benjamini and Yekutieli (2001) designed an FDR procedure for correlated multiple testing, we did not employ it because it is much more conservative than the direct step-up procedure. The Meff was estimated with Equation (5). As shown in Figure 2, the Meff-based step-up procedure controlled the FDR more accurately than the direct method, and was considerably more powerful.
Discussion and conclusion
In 2001, Cheverud proposed the idea of adjusting correlated tests as if they were independent according to the effective number (Meff) of independent tests. Piepho (2001) has also proposed a quick method for computing approximate threshold levels that control the genome-wise type I error rate of tests for quantitative trait locus detection in interval mapping and composite interval mapping. In the multilocus analyses presented in this paper, our extended concept of Meff succeeded in controlling both the EWSL and the FDR. This result supports the use of Meff in error-rate control of correlated multiple testing. As a data-driven algorithm based on a correlation matrix, the Meff method can be applied to a wide range of problems. Nowadays, dense and linked markers are widely used in genetic study, and it is hard to model their genetic behavior precisely, especially in multilocus analyses. The diversity of populations also prohibits models obtained from one population from being applied to another. Avoiding the error of an imprecise model, the data-driven Meff method is suitable for different situations. In addition to the data-driven property, the Meff can be calculated for various types of data by generalizing the definition of correlation coefficient. With Equation (7), it can be applied to categorical data. With the Pearson correlation, it can be applied to numerical data such as microarray data. With King and Chinchilli (2001)'s generalized correlation coefficient, it can be applied to a mixture of numerical and categorical data.
Cheverud (2001) proposed Equation (1) to estimate the Meff. As he explained, Equation (1) considers two extreme situations: hypothesis tests are completely independent or they are completely identical. However, Equation (4) shows that Equation (1) tends to overestimate the Meff when there are a large number of moderately correlated tests. Unfortunately, in multilocus analyses, this situation is usually encountered, and Equation (1) will lead to conservative reports. In this paper, we propose Equation (5) to estimate the Meff more accurately. It considers a more general situation than the two extremes Equation (1) concerns, where tests are composed of copies of independent tests. Therefore, it outperforms Equation (1) in multilocus analyses. Figure 1 clearly shows that both Equations (1) and (5) perform excellently in single-locus analyses, but only the latter is effective in multilocus analyses. The higher performance of Equation (5) over Equation (1) is not because it better fits the correlation coefficient defined by Equation (7). As Equation (4) has shown, no matter what correlation coefficient is employed to calculate the correlation matrix, Equation (1) will overestimate the Meff for eigenvalue Sequence (3), while Equation (5) will give an appropriate estimate. We also used the Pearson correlation in single-locus analyses, but as shown in Table 2, its performance was similar to that of Equation (7).
If n tests need adjustment, the computation complexity of a permutation test is O(stn), where s is the number of shuffles and t is the time needed to test a hypothesis. The computation cost for the Meff method concentrates on eigenvalues' calculation whose complexity is O(mn3), where m is the time needed to multiply two real numbers. Theoretically, when n is large, the permutation test is more efficient than the Meff method. However, because m is tiny in comparison with st, the Meff method is much faster than the permutation test until n is very large. The largest n we encountered in this study was 1225 in Section Effects on FDR. It took about 10 000 s to repeat the calculation of the 1225 p-values 10 000 times, and took about 3 s to solve the eigenvalues with the numeric package Lapack. (The computation time was measured on a 2.8 GHz Pentium with Linux 2.6.10.) When there are about 1000 tests, the Meff method is at least 1000 times faster than the permutation test. This would meet the need of a genome scan with about 1000 SNPs per chromosome. As SNPs in different chromosomes are in linkage equilibrium, the correlation matrix of a single-locus analysis is diagonally blocked. We can calculate the Meff for each chromosome individually and then sum them up as the Meff for the scan.
Freed from the heavy computation burden, statisticians can realize many sophisticated multilocus methods. For example, we can split a classification tree's nodes according to the EWSL or the FDR rather than the PWSL. In a tree's growth procedure, a node splits when the p-value of the optimal variable exceeds a threshold. The more the variables are, the easier it is for the optimal variable to exceed the threshold, and the more easily the tree overfits. Zhang and Bonney (2000) considered this problem in splitting, but they arbitrarily set the PWSL<0.001 as the threshold. With the Meff method, the threshold can be estimated efficiently. If the threshold is calculated with the permutation test, the computation will be very demanding because every node needs thousands of shuffles.
Multiple testing is not only important in data analysis but also in experiment design. Neglecting the issue would lead to an underpowered design, while overemphasizing it would waste resources. Owing to the expense of genetic experiments, it is highly desirable to apply the Meff method to experiment designs. By calculating the Meff from a pilot study, we can estimate the appropriate PWSL and the sample size for an experiment involving multiple testing.
References
Benjamini Y, Hochberg Y (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc Ser B (Methodological) 57: 289–300.
Benjamini Y, Liu W (1999). A step-down multiple hypotheses testing procedure that controls the false discovery rate under independence. J Stat Plan Infer 82: 163.
Benjamini Y, Yekutieli D (2001). The control of the false discovery rate in multiple testing under dependency. Ann Stat 29: 1165–1188.
Cheverud JM (2001). A simple correction for multiple comparisons in interval mapping genome scans. Heredity 87: 52–58.
Cheverud JM, Vaughn TT, Pletscher LS, Peripato AC, Adams ES, Erikson CF et al (2001). Genetic architecture of adiposity in the cross of LG/J and SM/J inbred mice. Mamm Genome 12: 3–12.
Churchill GA, Doerge RW (1994). Empirical threshold values for quantitative trait mapping. Genetics 138: 963–971.
Clarkson DB, Fan Y-a, Joe H (1993). A remark on algorithm 643: FEXACT: an algorithm for performing Fisher's exact test in r × c contingency tables. ACM Trans Math Software (TOMS) 19: 484–488.
Cochran W (1954). Some methods of strengthening the common χ2 tests. Biometrics 10: 417–451.
Cordell HJ, Clayton DG (2002). A unified stepwise regression procedure for evaluating the relative effects of polymorphisms within a gene using case/control or family data: application to HLA in type 1 diabetes. Am J Hum Genet 70: 124–141.
Cover TM, Thomas JA (1991). Elements of Information Theory. Wiley-Interscience: New York.
Hill WG, Robertson A (1968). Linkage disequilibrium in finite populations. Theoret Appl Genet 38: 226–231.
Hoh J, Ott J (2003). Mathematical multi-locus approaches to localizing complex human trait genes. Nat Rev Genet 4: 701–709.
Hoh J, Wille A, Ott J (2001). Trimming, weighting, and grouping SNPs in human case–control association studies. Genome Res 11: 2115–2119.
Keavney B, McKenzie CA, Connell JM, Julier C, Ratcliffe PJ, Sobel E et al (1998). Measured haplotype analysis of the angiotensin-I converting enzyme gene. Hum Mol Genet 7: 1745–1751.
King TS, Chinchilli VM (2001). A generalized concordance correlation coefficient for continuous and categorical data. Stat Med 20: 2131–2147.
Lander E, Kruglyak L (1995). Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat Genet 11: 241–247.
Lander ES, Botstein D (1989). Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.
Nelson MR, Kardia SL, Ferrell RE, Sing CF (2001). A combinatorial partitioning method to identify multilocus genotypic partitions that predict quantitative trait variation. Genome Res 11: 458–470.
Nyholt DR (2004). A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. Am J Hum Genet 74: 765–769.
Page GP, George V, Go RC, Page PZ, Allison DB (2003). ‘Are we there yet’? Deciding when one has demonstrated specific genetic causation in complex diseases and quantitative traits. Am J Hum Genet 73: 711–719.
Piepho H-P (2001). A quick method for computing approximate thresholds for quantitative trait loci detection. Genetics 157: 425–432.
Ritchie MD, Hahn LW, Moore JH (2003). Power of multifactor dimensionality reduction for detecting gene-gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity. Genet Epidemiol 24: 150–157.
Ritchie MD, Hahn LW, Roodi N, Bailey LR, Dupont WD, Parl FF et al (2001). Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet 69: 138–147.
Sidak Z (1967). Rectangular confidence regions for the means of multivariate normal distributions. J Am Stat Assoc 62: 626–633.
Storey JD (2002). A direct approach to false discovery rates. J Roy Stat Soc Ser B 64: 479–498.
Storey JD, Tibshirani R (2003). Statistical significance for genomewide studies. Proc Natl Acad Sci USA 100: 9440–9445.
Xu Q, Jia YB, Zhang BY, Zou K, Tao YB, Wang YP et al (2004). Association study of an SNP combination pattern in the dopaminergic pathway in paranoid schizophrenia: a novel strategy for complex disorders. Mol Psychiatry 9: 510–521.
Zhang H, Bonney G (2000). Use of classification trees for association studies. Genet Epidemiol 19: 323–332.
Acknowledgements
We are grateful to Qi Xu et al for their generous sharing of the published schizophrenia data. We thank Ying Huang, Yulong Zhang and the journal reviewers for their helpful comments. This work was supported by the Ministry of Science and Technology of People's Republic of China under Contract No. 2003CB517106.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Li, J., Ji, L. Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity 95, 221–227 (2005). https://doi.org/10.1038/sj.hdy.6800717
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/sj.hdy.6800717