Abstract
Genome-wide association studies (GWAS) have identified thousands of loci associated with disease, yet the causal genes within these loci remain largely unknown. Identifying these causal genes would enable a deeper understanding of disease and assist genetics-based drug development. Exome-wide association studies (ExWAS) are more expensive but can pinpoint causal genes, offering high-yield drug targets, yet they suffer from a high false-negative rate. Several algorithms have been developed to prioritize genes at GWAS loci, such as the Effector Index (Ei), Locus-2-Gene (L2G), Polygenic Prioritization score (PoPs), and Activity-by-Contact score (ABC), but it is not known whether these algorithms can predict ExWAS findings from GWAS data. If they can, thousands of associated GWAS loci could potentially be resolved to causal genes. Here, we quantified the performance of these algorithms by evaluating their ability to identify ExWAS significant genes for nine traits. We found that Ei, L2G, and PoPs can identify ExWAS significant genes with high areas under the precision recall curve (Ei: 0.52, L2G: 0.37, PoPs: 0.18, ABC: 0.14). Furthermore, we found that every unit increase in the normalized scores was associated with a 1.3–4.6-fold increase in the odds of a gene reaching exome-wide significance (Ei: 4.6, L2G: 2.5, PoPs: 2.1, ABC: 1.3). Overall, we found that Ei, L2G, and PoPs can anticipate ExWAS findings from widely available GWAS results. These techniques are therefore promising when well-powered ExWAS data are not readily available: they can be used to anticipate ExWAS findings and to prioritize genes at GWAS loci.
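The two metrics reported above can be illustrated with a minimal sketch. It assumes a per-gene prioritization score and a binary label for exome-wide significance, and it is not the authors' code: the function names, the toy scores, and the toy labels are hypothetical, the area under the precision recall curve is approximated by step-wise average precision, and the odds ratio comes from a single-predictor logistic regression fitted by Newton-Raphson. How the scores were normalized in the study is not shown here.

```python
import math

def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve (average precision)."""
    # Rank genes from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at each recovered positive
    return total / n_pos

def odds_ratio_per_unit(scores, labels, iters=25):
    """Fit logit(p) = b0 + b1 * score by Newton-Raphson and return exp(b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient w.r.t. intercept
            g1 += (y - p) * x    # gradient w.r.t. slope
            h00 += w             # entries of X^T W X
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return math.exp(b1)

# Hypothetical toy data: one score per gene, 1 = gene is ExWAS significant.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

print("average precision:", average_precision(toy_scores, toy_labels))
print("odds ratio per unit score:", odds_ratio_per_unit(toy_scores, toy_labels))
```

The odds ratio is expressed per unit of whatever scale the scores are on, so values are only comparable across algorithms once the scores share a common normalization.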
Data availability
Source code can be accessed through GitHub upon publication.
Funding
The Richards research group is supported by the Canadian Institutes of Health Research (CIHR: 365825, 409511, 100558, 169303), the McGill Interdisciplinary Initiative in Infection and Immunity (MI4), the Lady Davis Institute of the Jewish General Hospital, the Jewish General Hospital Foundation, the Canadian Foundation for Innovation, the NIH Foundation, Cancer Research UK [grant number C18281/A29019], Genome Québec, the Public Health Agency of Canada, McGill University, and the Fonds de Recherche Québec Santé (FRQS). JBR is supported by a FRQS Mérite Clinical Research Scholarship. Support from Calcul Québec and Compute Canada is acknowledged. TwinsUK is funded by the Wellcome Trust, Medical Research Council, European Union, the National Institute for Health Research (NIHR)-funded BioResource, Clinical Research Facility and Biomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust in partnership with King’s College London. These funding agencies had no role in the design, implementation or interpretation of this study.
Author information
Contributions
Conception and design: KL and JBR. Data analyses: KL, YF, and VF. Manuscript writing: KL, YF, YC, SY, and JBR. Supervision: JBR. Interpretation of data: all authors. All authors were involved in the preparation and revision of the manuscript.
Ethics declarations
Conflict of interest
JBR’s institution has received investigator-initiated grant funding from Eli Lilly, GlaxoSmithKline and Biogen for projects unrelated to this research. JBR is the CEO of 5 Prime Sciences (www.5primesciences.com), which provides research services for biotech, pharma and venture capital companies for projects unrelated to this research. VF, YF, and TL are employees of 5 Prime Sciences. Authors KYHL, YC, and SY declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
This article does not contain any studies with human participants.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liang, K.Y.H., Farjoun, Y., Forgetta, V. et al. Predicting ExWAS findings from GWAS data: a shorter path to causal genes. Hum Genet 142, 749–758 (2023). https://doi.org/10.1007/s00439-023-02548-y
DOI: https://doi.org/10.1007/s00439-023-02548-y