FunSPU: A versatile and adaptive multiple functional annotation-based association test of whole-genome sequencing data

Despite ongoing large-scale population-based whole-genome sequencing (WGS) projects such as the NIH NHLBI TOPMed program and the NHGRI Genome Sequencing Program, WGS-based association analysis of complex traits remains a tremendous challenge due to the large number of rare variants, many of which are non-trait-associated neutral variants. External biological knowledge, such as functional annotations based on the ENCODE, Epigenomics Roadmap and GTEx projects, may be helpful in distinguishing causal rare variants from neutral ones; however, each functional annotation can only provide certain aspects of the biological functions. Our knowledge for selecting informative annotations a priori is limited, and incorporating non-informative annotations will introduce noise and lose power. We propose FunSPU, a versatile and adaptive test that incorporates multiple biological annotations and is adaptive at both the annotation and variant levels and thus maintains high power even in the presence of noninformative annotations. In addition to extensive simulations, we illustrate our proposed test using the TWINSUK cohort (n = 1,752) of UK10K WGS data based on six functional annotations: CADD, RegulomeDB, FunSeq, Funseq2, GERP++, and GenoSkyline. We identified genome-wide significant genetic loci on chromosome 19 near gene TOMM40 and APOC4-APOC2 associated with low-density lipoprotein (LDL), which are replicated in the UK10K ALSPAC cohort (n = 1,497). These replicated LDL-associated loci were missed by existing rare variant association tests that either ignore external biological information or rely on a single source of biological knowledge. We have implemented the proposed test in an R package “FunSPU”.


Introduction
In recent years, large-scale whole-exome sequencing and whole-genome sequencing (WGS) data have been generated, such as those in the Exome Sequencing Project 1 , the UK10K project 2 and the ongoing NIH NHLBI Trans-Omics for Precision Medicine (TOPMed) WGS program 3 , providing unprecedented opportunities to investigate low-frequency variants (minor allele frequency [MAF] between 1% and 5%) and rare variants (RVs; MAF < 1%) in association with complex diseases and traits. However, WGS-based association analysis of complex traits remains a tremendous challenge due to the large number of RVs, many of which are non-traitassociated neutral variants. External biological knowledge, such as functional annotations, might be informative to distinguish causal RVs from neutral ones. Some recent large-scale functional genomic studies, such as ENCODE 4 , NIH Epigenomics Roadmap 5 and GTEx 6 projects, provide rich resources to use in characterizing the functional consequences of single nucleotide variants (SNVs), especially those in non-coding regions. Many approaches have been developed for functional annotations by integrating these data, e.g., CADD 7 , GenoSkyline 8 and Eigen 9 ; see Liu et al for a recent comparative review 10 . In WGS analysis, investigators may filter a subset of SNVs by annotations 2; 11 , or use a single source of functional scores as weights in association tests to boost the statistical power [12][13][14] ; however, each functional annotation can only provide a certain aspect of the biological functions, e.g., sequence conservation across species or biochemical activity of non-coding regions in a tissue. Our a priori knowledge to select the informative annotation(s) regarding a phenotype and genomic regions of interest is limited, and incorporating noninformative annotations will introduce noise and lose power.
To address this analytical challenge, we propose a family of versatile and powerful tests called "FunSPU" that allow for incorporating multiple functional annotations simultaneously in the adaptive sum of powered score (aSPU) test framework 15 . The fundamental idea of aSPU is to construct a general class of association tests, each of which is the most powerful under varying, yet unknown, local genetic architecture, then data-adaptively select the most significant test. Since each functional annotation system contains limited biological knowledge, multiple sources of functional annotations may provide complementary information. Therefore, a test that integrates multiple functional annotations simultaneously is potentially powerful. The proposed test is adaptive at both the annotation and variant levels and thus maintains high power even in the presence of noninformative annotations and a large number of neutral RVs. We also propose minimum p-value (minP) and Fisher's meta-analysis-like approaches to combine the p-values with respect to multiple annotations. Moreover, to further increase the statistical power, we propose to incorporate a trait-specific global weight for each annotation based on partitioning the heritability.
Using extensive simulations and application to the UK10K WGS data 2 , we compared our proposed FunSPU tests with the corresponding annotation-ignorant aSPU test as well as some existing RV association tests, such as the T5 burden test and SKAT 16 . We also compared our method with a recently published multiple functional annotation-based association test called functional score test (FST) 17

Notations
For the purpose of exposition, we introduce our proposed tests in the linear model framework with a quantitative trait and no covariates, though the methods can be similarly extended to binary or survival traits, and adjusted for covariates. Suppose that for subject i = 1, …, n, Y = (Y1, … , Yn ) is the vector of a quantitative trait, and Xi = (Xi1, … , Xik )' is the vector of the genotype scores of k RVs, for example, from a gene or some genomic region. Here, we use additive coding for each RV; that is, Xij is the count of the minor allele at RV j for subject i. For simplicity, we ignore other covariates in our model. We consider a linear regression model: We test the null hypothesis H0: β = (β1, … , βk )' = 0, that is, there is no association between any of the RVs and the trait under H0. Our proposed tests are based on the score vector U = (U1, … , Uk )' for β and its covariance matrix V, where ̅ and ̅ are the sample means of the Yi's and Xi's, respectively.

Review of the data-adaptive aSPU test
Pan et al. 15

New test: FunSPU -a data-adaptive test incorporating multiple annotations
Our proposed test is in the data-adaptive aSPU test framework. Importantly, the proposed test is adaptive at both the annotation and SNV levels. Suppose that we have the score vector U = The intuition to use γ as the powers of the weighted SPU is similar to that for γ. In general, a smaller γ , e.g., γ = 1, is more effective when there are more informative annotations, each of which is roughly equally discriminative regarding the deleteriousness of the RVs for the trait of interest. In contrast, a larger γ is preferred if there is only one or fewer informative annotations that can well distinguish causal variants from neutral ones for the trait.
As γ → ∞, only the most significant weighted SPU is considered. for the rest of the study. We retained γ = ∞ as an approximation to the minP test and ignored some higher values of γ since the results tend to be similar to γ = 6. Specifically, we first permute the original set of trait Y to obtain a new set of Then, we calculate the null score vector U (b) and the corresponding test statistic In the FunSPU test above, we ignored the possibly different variances of the score function component , for example, due to varying MAF of the RVs. On the other hand, previous research has shown that it may be beneficial to account for the heterogeneity of variances in the SPU framework 22 . Therefore, we further propose an inverse-variance weighted version of FunSPU: where Vjj is the jth diagonal element of = Cov( | 0 ) as given before.

Alternative approaches to incorporating multiple functional annotations: aSPU_minP and aSPU_Fisher
We considered alternative approaches to incorporate multiple functional annotations into the aSPU test. In contrast to the two-level FunSPU approach, we can obtain modified aSPU tests via the score vector U weighted by each functional annotation, i.e., Another common method for combining p-values is Fisher's meta-analysis approach, i.e., . If the m p-values were independent, _ ℎ would follow a chi-squared distribution with 2m degrees of freedom. However, our ( ) tests are correlated via the score vector U. Hence, we also use resampling approaches to calculate the final p-value. We can similarly apply the inverse-variance weighted method to aSPU_minP and aSPU_Fisher tests, respectively denoted as aSPUw_minP and aSPUs_Fisher.

wtFunSPU: extension of FunSPU to allow for global weighting of multiple annotations
In our proposed FunSPU test, we treated all m functional annotations equally a priori and completely relied on the data to adaptively combine multiple annotations in each test unit, for Since we assume no a priori knowledge regarding the informativeness of a functional annotation for a given trait, we propose to estimate based on some global correlation measure between the annotation weights, genotypes and phenotype. A promising approach is based on partitioning the heritability ℎ 2 by functional annotations 24 : a functional annotation is more informative for the trait of interest if SNVs with higher functional scores contribute to more heritability on average. Specifically, given an annotation, we first partition the genome-wide RVs based on Q discrete functional categories or percentiles of continuous functional scores; we then estimate the heritability ℎ 2 for all SNVs in functional category q = 1,…, Q, using the GCTA tool 25 . We next compute the average per-SNV heritability ℎ 2 #SNV ( ) ⁄ for each annotation category q and regress ℎ 2 #SNV ⁄ on q to estimate the slope: where is used as the global weight for the corresponding functional annotation in the wtFunSPU test. Prior to this calculation, we transform the functional annotation to positive integers q = 1,…, Q such that larger q corresponds to a more likely functional category. If a functional category has a very small number of SNVs or ℎ 2 close to zero, this category is combined with a nearby category; see Supplemental Figures S1 to S4 and Table S1 for details.

Simulation setups
We conducted extensive simulations to evaluate and compare the performance of our proposed functional annotation-based tests with existing association tests for RVs. To make the simulation study representative of real RV data, we randomly selected 200 RVs from chr16:56.8M~57.1M of the UK10K TWINSUK genotype data of 1,718 unrelated individuals.
MAFs of the selected RVs were no larger than 1%.
To evaluate power, we generated the simulated phenotypes as follows. First, we simulated 3 sets of informative annotations ( 1 , 2 , 3 ) and 3 sets of random annotations Throughout the simulations, we fixed the significance level at α = 0.05. The empirical power was calculated based on 1,000 replications for each scenario, and the empirical type I error rate was calculated based on 5,000 replications. For permutation-based tests, 1,000 resamplings were conducted for each replication. Table 1, all the tests under comparison could control the type I error rate satisfactorily around 0.05. Regarding power, we first considered scenario A (Figure 1), which was an advantageous scenario for our proposed tests since all three informative annotations together with three random annotations and one dummy annotation were used in the tests. The dummy annotation (constant 1) was supposed to retain the unweighted SPU in the adaptive tests, as in aSPU. Although the simulated annotations for causal and neutral RVs had modest differences, i.e., from U(0.4, 1) and U(0,0.6), respectively, the tests incorporating functional annotations, such as FunSPU, wtFunSPU and aSPU_minP, always had higher power than tests that ignored functional annotations, such as aSPU, SKAT and T1. The FunSPU test appeared to be less powerful than aSPU_minP, suggesting a lack of efficiency in the former's complete dataadaptive strategy to combine multiple annotations. On the other hand, wtFunSPU and wtFunSPUw outperformed aSPU_minP, supporting the effectiveness of the global weighting scheme. We also observed that the inverse-variance weighted tests always outperformed the original tests, e.g., wtFunSPUw versus wtFunSPU, and this advantage became more obvious with a higher proportion of neutral RVs. Lastly, the power of the aSPU_Fisher test was similar to that of the aSPU_minP test (results not shown).

As shown in
Next, we considered a weaker scenario for our proposed tests. In scenario B (Figure 2), we used only one informative annotation, but all three random annotations and one dummy annotation in the tests. In this case, we had a higher proportion of "noisy" annotations in our tests. We observed that the FunSPU test was marginally more powerful than aSPU, SKAT and T1, but was less powerful than the aSPU_minP test by a large margin. In fact, scenario B was an advantageous scenario for the latter test, which only considered the most informative annotation.
Finally, the globally weighted wtFunSPU and wtFunSPUw, especially the latter, were more powerful than the aSPU_minP test, again suggesting the benefit of globally down-weighting noninformative annotations.

Application to the UK10K WGS data
To further evaluate the performance of our proposed tests on real data, we applied library, and GenoSkyline (blood) annotation was generated from the region-based GenoSkyline library 8 . We re-scaled all annotations to numerical weights within the interval (0, 1), with larger weights corresponding to a greater likelihood of being functional (Supplemental Figure S5).   To confirm our findings in the TWINSUK cohort, we performed replication analysis of the genome-wide significant sliding windows in the ALSPAC cohort. As shown in Table 2, four sliding windows were replicated for the corresponding phenotypes and association tests with a replication p-value < 0.05/24 = 2e-3 based on the Bonferroni correction for 24 sliding windows: three by functional annotation-based wtFunSPU, FunSPU, aSPU_minP and aSPUw_minP tests and one by the aSPU test. In contrast, none of the 6 sliding windows identified by the FST test in the discovery cohort was replicated; neither did SKAT nor T5 replicate any sliding window.
Three of the four replicated sliding windows were close to each other on chromosome 19 around TOMM40 and APOC4-APOC2 genes. These loci have been previously identified and replicated to be associated with LDL by large-scale meta-analysis of GWAS common variants [28][29][30] ; however, this was the first time they were identified to harbor LDL-associated RVs with fewer than a couple of thousand samples, suggesting that the power of the FunSPU test was boosted by incorporating external biological knowledge.
We also looked into the effects of multiple annotations on the FunSPU tests. Although some high scores were observed around the TOMM40 and APOC4-APOC2 gene regions for Funseq2, Funseq, RegulomeDB and GenoSkyline (Figure 3), they did not appear to be obviously different from those scores outside these two loci. Figure 4 shows the association signals of  Table S1 (B)), which further boosted the p-value of the wtFunSPU test to the genome-wide significance level. As for TOMM40, Funseq2, CADD and GERP++ (Supplemental Figure S11-E,F,D) positively contributed to the genome-wide significance of FunSPU and aSPU_minP (Supplemental Figure S11-A and Table 2); whereas wtFunSPU missed this locus due to its low global weighting of these three annotations (Supplemental Table S1 (B)). This suggests that wtFunSPU, FunSPU and aSPU_minP may complement each other and may be used together in association analysis of WGS data.

Discussion
We have proposed a versatile and adaptive association test, FunSPU, to exploit multiple sources of biological knowledge in the analysis of WGS data. It is adaptive at both the annotation and variant levels, and thus maintains high statistical power, even in the presence of noninformative annotations and a larger number of neutral variants. We have further proposed a globally weighted wtFunSPU test to more effectively down-weight less informative functional annotations in a trait-specific manner. Using the UK10K WGS data, we demonstrated that our proposed FunSPU test and its extensions, including the wtFunSPU and aSPU_minP tests, are more powerful tools to identify genome-wide significant loci than existing RV association tests and DeepSEA 33 .
To further de-noise noninformative annotations, we proposed a novel trait-specific measure based on partitioning the heritability and used it as a global weight for each annotation in the wtFunSPU test. Interestingly, our proposal is along the line of estimating group-specific weights in the context of weighted hypothesis testing 34; 35 , though the latter is based on the mixture model, in contrast to the mixed model-based heritability partition here. Our proposed measure also has the potential to be used to compare the discriminative performance of wholegenome annotations for a complex trait of interest, for which known deleterious and neutral variants are rarely available (see Supplemental Table S1). This warrants further investigation.