An Adaptive Fisher’s Combination Method for Joint Analysis of Multiple Phenotypes in Association Studies

Currently, the analyses of most genome-wide association studies (GWAS) have been performed on a single phenotype. There is increasing evidence showing that pleiotropy is a widespread phenomenon in complex diseases. Therefore, using only one single phenotype may lose statistical power to identify the underlying genetic mechanism. There is an increasing need to develop and apply powerful statistical tests to detect association between multiple phenotypes and a genetic variant. In this paper, we develop an Adaptive Fisher’s Combination (AFC) method for joint analysis of multiple phenotypes in association studies. The AFC method combines p-values obtained in standard univariate GWAS by using the optimal number of p-values which is determined by the data. We perform extensive simulations to evaluate the performance of the AFC method and compare the power of our method with the powers of TATES, Tippett’s method, Fisher’s combination test, MANOVA, MultiPhen, and SUMSCORE. Our simulation studies show that the proposed method has correct type I error rates and is either the most powerful test or comparable with the most powerful test. Finally, we illustrate our proposed methodology by analyzing whole-genome genotyping data from a lung function study.

Scientific RepoRts | 6:34323 | DOI: 10.1038/srep34323 to a linear combination of phenotypes that has the highest heritability among all linear combinations of the phenotypes. Based on PCH, several advanced methods have been proposed such as penalized PCH applicable to high-dimensional data 22,23 and principle components of heritability with coefficients maximizing the quantitative phenotype locus heritability (PCQH) 11,24,25 . The third group, combining test statistics from univariate tests, is to conduct univariate analysis on each phenotype, then combine the univariate test statistics 26 . The Trait-based Association Test that uses Extended Simes procedure (TATES) 12 belongs to this group. TATES combines p-values obtained in standard univariate GWAS while correcting for the correlation between p-values.
Motivated by TATES, in this article, we propose an Adaptive Fisher's Combination (AFC) method for joint analysis of multiple phenotypes in genetic association studies. We first test the association between each of the phenotypes and a genetic variant of interest using standard GWAS software. Then, AFC uses the optimal number of p-values which is determined by the data to test the association. Using extensive simulation studies, we evaluate the performance of the proposed method and compare the power of the proposed method with the powers of TATES, Tippett's method 27 , Fisher's Combination test (FC) 28 , Multivariate Analysis of Variance (MANOVA) 29 , joint model of Multiple Phenotypes (MultiPhen) 8 , and Sum Score method (SUMSCORE) 12 . Our simulation studies show that the proposed method has correct type I error rates and is either the most powerful test or comparable with the most powerful tests. Finally, we illustrate our proposed methodology by analyzing whole-genome genotyping data from a lung function study.

Method
Consider a sample of n unrelated individuals. Each individual has K phenotypes. Denote Y k = (y 1k , … , y nk ) T as the k th phenotype of n individuals. Denote X = (x 1 , … , x n ) T as the genotypic score of n individuals at a genetic variant of interest, where x i ∈ {0, 1, 2} is the number of minor alleles that the i th individual carries at the genetic variant. We propose a new method to test the null hypothesis H 0 : none of the K phenotypes are associated with the genetic variant.
, and p T k denote the p-value of T k . The statistic of AFC to test the association between the K phenotypes and the genetic variant is given by We use the following permutation procedure to evaluate the p-values of T k and T all .
Suppose that we perform B. times of permutations.
Then, the p-vae of T all is given by As shown in Appendix, the null distributions of p 1 , p 2 , … , p K and thus of T all do not depend on the genetic variant being tested. Thus, the permutation procedure described above to generate an empirical null distribution of T all needs to be done only once for a GWAS.  8 , and Sum Score method (SUMSCORE) 12 . Here we briefly introduce each of those methods using the notations in the method section. MANOVA. Consider a multivariate multiple linear regression model: Y = Xβ T + ε, where Y denotes the n × K matrix of phenotypes, β T = (β 1 , … , β K ) is a vector of coefficients corresponding to the K phenotypes, and ε. is the n × K matrix of random errors with each row of ε to be i.i.d. MVN (0, Σ), where Σ is the covariance matrix of ε.
To test H 0 : β = 0, the likelihood ratio test is equivalent to the Wilk's Lambda test statistic of MANOVA 37 , that is, Here Λ denote the ratio of the likelihood function under H 0 to the likelihood function under H 1 , l(β, Σ) is the log-likelihood function, 1 is the maximum likelihood estimator (MLE) of β, and |·| denotes the determinant of a matrix. Then the test statistic has an asymptotic χ K 2 distribution 38 .
MultiPhen. By performing ordinaregression, MultiPhen develops a reversed analysis for joint analysis of multiple phenotypes by considering a genetic variant of interest X = (x 1 , … , x n ) T as a response variable, and the correlated phenotypes Y k = (y 1k , … , y nk ) T as predictors.
SUMSCORE. Let T score k denote the score test statistic to test the association between the k th phenotype and the genetic variant. The test statistic of SUMSCORE is given by The p-value of T SUMSCORE is estimated using a permutation procedure.
Tippett. The test statistic of Tippett is given by = T p min Tippett k k . The p-value of T Tippett is estimated using a permutation procedure.

FC. The Fisher's combination test statistic is defined as
Yang et al. 28 adopted three different approaches to calculate the p-value for correlated phenotypes. In this article, we calculate the p-value using a permutation procedure.
AFC, FC, and Tippett are closely related. Intuitively, when only one phenotype or very few phenotypes are associated with the variant, Tippett is more powerful than FC because in this case FC contains a lot of noises. When all phenotypes or a large proportion of the phenotypes are associated with the variant, FC is more powerful than Tippett because in this case Tippett only uses the minimum p-value and loses information. AFC can be adaptive to the number of phenotypes associated with the variant.
Simulation. We generate genotype data at a genetic variant according to a minor allele frequency (MAF) under Hardy-Weinberg equilibrium. Phenotypes are generated similarly to that of van der Sluis et al. 12 . The phenotypic correlation structures mimic that of UK10K 39 , that is, the phenotypes are divided into several groups (factors) and the within-group correlation is larger than the between-group correlation. Denote Y k = (y 1k , … , y nk ) T as the k th phenotype of n individuals and X = (x 1 , … , x n ) T as the genotypic score of the n individuals at the genetic variant.
Scenario 1. considering one factor model with genetic variant effect on the factor. We first generate a common factor, f = βX, where f is the n by 1 common factor and β is the effect size. Then we simulate K phenotypes by where a is a factor loading, ε k = (ε 1k , … , ε nk ) T ~ MVN(0, I n ), and I n is the identity matrix.
Scenario 2. considering 4 factor model with the genetic variant effect on the fourth factor, each factor has (K)/4 (K is a multiple of 4) phenotypes. We generate 4 correlated factors using (f 1 , A is a matrix with elements of 1, I is the identity matrix, and ρ fa is the correlation between any two factors. Then, we transform the fourth factor f 4 to ⁎ f 4 by β = + ⁎ f f X 4 4 and simulate K phenotypes using 1, , where a is a factor loading, ε j = (ε 1j , … , ε nj ) T ~ MVN(0, I n ) for j = 1, … , 4, and β is the effect size.
Scenario 3. considering two factor model with the genetic variant effect on the second factor, each factor has (K)/(2) (K is a multiple of 2) phenotypes. We generate two correlated factors using (f 1 , A is a matrix with elements of 1, I is the identity matrix, and ρ fa is the correlation between any two factors. Then, we transform the second factor f 2 to ⁎ f 2 by and simulate K phenotypes using where a is a factor loading, ε j = (ε 1j , … , ε nj ) T ~ MVN(0, I n ) for j = 1, 2, and β is the effect size.
Scenario 4. considering 4 factor model with genetic variant effect specific to the K th phenotype, each factor has (K)/4 (K is a multiple of 4) phenotypes. By using the original factors where a is a factor loading, ε j = (ε 1j , … , ε nj ) T ~ MVN(0, I n ) for j = 1, … , 4, and β is the effect size.
Scenario 5. considering one factor model with the genetic variant effect specific to the K th phenotype. We simulate K phenotypes by where f~MVN(0, I n ), a is a factor loading, ε k = (ε 1k , … , ε nk ) T ~ MVN(0, I n ), and β is the effect size.
Scenario 6. considering a network model, where the K phenotypes are correlated and the correlation structure is not due to one or multiple underlying common factors. We generate K phenotypes independent of genotypes for each individual by using A is a matrix with elements of 1, I is the identity matrix, and ρ phe is the correlation between any two phenotypes. After generat- . In scenarios 2-5, the within-factor correlation is a 2 and between-factor correlaiton is a 2 ρ fa . To evaluate type I error of the proposed method, we generate phenotypic values independent of genotypes by assigning β = 0. To evaluate power, we generate phenotypic values according to the six scenarios described above.
Simulation results. We use two sets of simulations to evaluate the type I error rates of the proposed method.
The first set of simulations is normal simulation studies and includes 10,000 replicated samples for each sample size under each scenario. The p-values are estimated using 10,000 permutations. For 10,000 replicated samples, the 95% confidence intervals (CIs) for type I error rates at the nominal levels 0.01 and 0.001 are (0.008, 0.012) and (0.0004, 0.0016), respectively. The estimated type I error rates of the proposed test (AFC) are summarized in Table 1. From Table 1, we can see that most of the estimated type I error rates are within the 95% CIs and those type I error rates not within the 95% CIs are very close to the bound of the corresponding 95% CI, which indicates that the proposed method is valid.
The second set of simulations mimics GWAS. To be comparable with the real data analysis, we generate 6,000 unrelated individuals with 8 phenotypes at 10 6 variant sites under each scenario. The phenotypes are independent of genotypes. The MAF at each variant is a random number between 0.05 and 0.5. The null distributions of T 1 , … , T K and T all are generated by 10 6 permutations using the genotypes at the first variant. We consider genotypes at 10 6 variants as 10 6 replicated samples. For 10 6 replicated samples, the 95% confidence intervals (CIs) for the type I error rates at the nominal levels 10 −3 , 10 −4 , and 10 −5 are (0.94× 10 −3 , 1.06× 10 −3 ), (0.80 × 10 −4 , 1.20 × 10 −4 ) and (0.38 × 10 −5 , 1.62 × 10 −5 ), respectively. The estimated type I error rates of the proposed test (AFC) are summarized in Table 2. From Table 2, we can see that all of the estimated type I error rates are within the 95% CIs, which indicates that the proposed method is valid.
For power comparisons, we consider (1)   involved. For Figs 1 and 2, the p-values of AFC, FC, Tippett, and SUMSCORE are estimated using 10,000 permutations and the p-values of TATES, MultiPhen, and MANOVA are estimated using asymptotic distributions. The powers of all tests are evaluated using 1,000 replicated samples at 0.1% significance level. For Fig. 3, the p-values of AFC, FC, Tippett, and SUMSCORE are estimated using 10 7 permutations. The powers of all tests are evaluated using 1,000 replicated samples at 10 −6 significance level. Figure 1 gives  38 . We also give power comparisons of the 7 tests using a significance level of 10 −6 with 10 7 permutations and 1,000 replicates for the power as a function of effect size (β ) under scenario 2 (Fig. 3). Figure 3 shows that the patterns of the power comparisons using significance level 10 −6 are similar to that using a significance level of 0.1% in Fig. 1 (scenario 2).
In summary, the proposed method has correct type I error rates and is either the most powerful test or comparable with the most powerful tests. No other methods have consistently good performance under the six simulation scenarios.
Application to the COPDGene. Chronic obstructive pulmonary disease (COPD) is one of the most common lung diseases characterized by long term poor airflow and is a major public health problem 40 . The COPDGene Study 41 (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id= phs000179.v1.p1) is a multi-center genetic and epidemiologic investigation to study COPD. This study is sufficiently large and appropriately designed for genome-wide association analysis of COPD. In this study, a total of more than 10,000 subjects have been recruited including non-Hispanic Whites (NHW) and African-Americans (AA). The participants in this study have completed a detailed protocol, including questionnaires, pre-and post-bronchodilator spirometry, high-resolution CT scanning of the chest, exercise capacity (assessed by six-minute walk distance), and blood samples for genotyping. The participants were genotyped using the Illumina OmniExpress platform. The genotype data have gone through standard quality-control procedures for genome-wide association analysis detailed at http://www.copdgene.org/sites/default/files/GWAS_QC_Methodology_20121115.pdf. Variants with MAF < 1% were excluded in the data set.
To evaluate the performance of our proposed method on a real data set, we applied all of the 7 methods to the COPDGene of NHW population to carry out GWAS of COPD-related phenotypes. Based on the literature studies of COPD 42,43 , we selected 7 key quantitative COPD-related phenotypes, including FEV1 (% predicted FEV1), Emphysema (Emph), Emphysema Distribution (EmphDist), Gas Trapping (GasTrap), Airway Wall Area (Pi10), Exacerbation frequency (ExacerFreq), Six-minute walk distance (6MWD), and 4 covariates, including BMI, Age, Pack-Years (PackYear) and Sex. EmphDist is the ratio of emphysema at − 950 HU in the upper 1/3 of lung fields compared to the lower 1/3 of lung fields. Followed by Chu et al. 42 , we did a log transformation on EmphDist in the following analysis. The correlation structure of the 7 COPD-related phenotypes is given in Fig. 4. In the analysis, participants with missing data in any of the 11 variables were excluded. Therefore, a complete set of 5,430 individuals across 630,860 SNPs were used in the following analyses. In the analysis, we first adjusted    each of the 7 phenotypes for the 4 covariates using linear models. Then, we performed the analysis based on the adjusted phenotypes.

Discussion
GWAS have identified many variants with each variant affecting multiple phenotypes, which suggests that pleiotropic effects on human complex phenotypes may be widespread. Therefore, statistical methods that can jointly analyze multiple phenotypes in GWAS may have advantages over analyzing each phenotype individually. In this article, we developed a new method AFC to jointly analyze multivariate phenotypes in genetic association studies. We used extensive simulation studies as well as real data application to compare the performance of AFC with TATES, Tippett, FC, MANOVA, MultiPhen, and SUMSCORE. Our simulation results showed that AFC has correct type I error rates. With respect to power, AFC is either the most powerful test or has similar power with the most powerful test under a variety of simulation scenarios. Additionally, the real data analysis results demonstrated that the proposed method has great potential in GWAS on complex diseases with multiple phenotypes such as COPD.
AFC has several important advantages. First, it allows researchers to test genetic associations using standard GWAS software. Second, phenotypes of different types (e.g., dichotomous, ordinal, continuous) can be easily  analyzed simultaneously. Third, since AFC is based on p-values obtained from standard univariate GWAS, it can not only test the association between multiple phenotypes and one genetic variant of interest, but also can test the association between multiple phenotypes and multiple genetic variants. For common variants, multiple-variant AFC can be applied based on p-values obtained in standard univariate GWAS for each variant and each phenotype. For rare variants, we can first combine genotypes of rare variants by giving different weights, hoping that we give big weights to the variants having strong associations with the phenotypes. Then, we can apply AFC to test the association between the combined genotypes and multiple phenotypes. In conclusion, we showed that our proposed method provides a useful framework for joint analysis of multiple phenotypes in association studies.
It is well known that the effect sizes of identified variants are often small and that a large sample size is necessary to ensure sufficient power to detect such variants. A common strategy to increase sample size is to perform a meta-analysis by combining summary statistics from a series of studies. The proposed AFC method can be applied to meta-analysis. Suppose that there are L independent studies containing the variant of interest and each study has K phenotypes. Let T 1l , … , T Kl denote the summary statistics from the l th study. Suppose that T l = (T 1l , …, T Kl ) T~N (0,Σ l ) under the null hypothesis, where Σ l can be estimated from the values of T l for all independent SNPs in the GWAS from the l th study 58  , where P l = (p 1l , … , p Kl ) T . The AFC test statistic is based on the p-values P. In the permutation procedure, we can generate T according to the distribution N(0,Σ) and then we can calculate the p-values P in each permutation.