A genomic mutational constraint map using variation in 76,156 human genomes

Chen, Siwei; Francioli, Laurent C.; Goodrich, Julia K.; Collins, Ryan L.; Kanai, Masahiro; Wang, Qingbo; Alföldi, Jessica; Watts, Nicholas A.; Vittal, Christopher; Gauthier, Laura D.; Poterba, Timothy; Wilson, Michael W.; Tarasova, Yekaterina; Phu, William; Grant, Riley; Yohannes, Mary T.; Koenig, Zan; Farjoun, Yossi; Banks, Eric; Donnelly, Stacey; Gabriel, Stacey; Gupta, Namrata; Ferriera, Steven; Tolonen, Charlotte; Novod, Sam; Bergelson, Louis; Roazen, David; Ruano-Rubio, Valentin; Covarrubias, Miguel; Llanwarne, Christopher; Petrillo, Nikelle; Wade, Gordon; Jeandet, Thibault; Munshi, Ruchi; Tibbetts, Kathleen; O’Donnell-Luria, Anne; Solomonson, Matthew; Seed, Cotton; Martin, Alicia R.; Talkowski, Michael E.; Rehm, Heidi L.; Daly, Mark J.; Tiao, Grace; Neale, Benjamin M.; MacArthur, Daniel G.; Karczewski, Konrad J.

doi:10.1038/s41586-023-06045-0

Article
Published: 06 December 2023

Siwei Chen^1,2^na1,
Laurent C. Francioli^1,2^na1,
Julia K. Goodrich¹,
Ryan L. Collins^1,3,4,
Masahiro Kanai ORCID: orcid.org/0000-0001-5165-4408^1,2,
Qingbo Wang ORCID: orcid.org/0000-0002-9110-5830^1,5,
Jessica Alföldi ORCID: orcid.org/0000-0001-9713-6200^1,2,
Nicholas A. Watts^1,2,
Christopher Vittal^1,2,
Laura D. Gauthier⁶,
Timothy Poterba^1,2,7,
Michael W. Wilson^1,2,
Yekaterina Tarasova¹,
William Phu ORCID: orcid.org/0000-0002-5569-1000^1,8,
Riley Grant¹,
Mary T. Yohannes¹,
Zan Koenig^2,7,
Yossi Farjoun ORCID: orcid.org/0000-0002-7002-2868⁹,
Eric Banks⁶,
Stacey Donnelly¹⁰,
Stacey Gabriel¹¹,
Namrata Gupta^1,11,
Steven Ferriera¹¹,
Charlotte Tolonen⁶,
Sam Novod⁶,
Louis Bergelson⁶,
David Roazen⁶,
Valentin Ruano-Rubio⁶,
Miguel Covarrubias⁶,
Christopher Llanwarne⁶,
Nikelle Petrillo⁶,
Gordon Wade⁶,
Thibault Jeandet⁶,
Ruchi Munshi⁶,
Kathleen Tibbetts⁶,
Genome Aggregation Database Consortium,
Anne O’Donnell-Luria ORCID: orcid.org/0000-0001-6418-9592^1,3,8,
Matthew Solomonson^1,2,
Cotton Seed^2,7,
Alicia R. Martin ORCID: orcid.org/0000-0003-0241-3522^1,2,7,
Michael E. Talkowski^1,3,7,
Heidi L. Rehm ORCID: orcid.org/0000-0002-6025-0015^1,3,
Mark J. Daly ORCID: orcid.org/0000-0002-0949-8752^1,2,15,
Grace Tiao^1,2,
Benjamin M. Neale ORCID: orcid.org/0000-0003-1513-6077^1,2^na1,
Daniel G. MacArthur^1,16,17^na1 &
Konrad J. Karczewski ORCID: orcid.org/0000-0003-2878-4671^1,2,7

Nature volume 625, pages 92–100 (2024)

An Author Correction to this article was published on 15 January 2024

This article has been updated

Abstract

The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders^1,2,3,4, but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD)—the largest public open-access human genome allele frequency reference dataset—and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.
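
The idea of detecting a depletion of variation can be illustrated with a minimal sketch that compares an observed variant count with the count expected under a mutational model. The scoring formula, the function name `depletion_z` and the window counts below are illustrative assumptions and are not taken from this page; the authors' actual implementation is in the repository listed under Code availability.

```python
import math


def depletion_z(observed: int, expected: float) -> float:
    """Signed z-like score; positive when fewer variants are seen than expected.

    Assumption for illustration only: a simple Poisson-style scaling,
    (expected - observed) / sqrt(expected).
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    return (expected - observed) / math.sqrt(expected)


# Hypothetical genomic windows: (label, observed variants, expected variants).
# In practice the expected count would come from a mutational model that
# accounts for sequence context and regional features.
windows = [
    ("window_a", 80, 120.0),   # strong depletion
    ("window_b", 118, 120.0),  # close to expectation
    ("window_c", 150, 120.0),  # excess of variation
]

scores = {label: depletion_z(obs, exp) for label, obs, exp in windows}

for label, z in scores.items():
    print(f"{label}: z = {z:.2f}")
```

In this sketch, a higher positive score marks a window with fewer variants than expected, which is the pattern the abstract describes as constraint.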

**Fig. 1: Distribution of Gnocchi scores across the genome.**

**Fig. 2: Correlation between Gnocchi and functional non-coding annotations.**

**Fig. 3: Performance of Gnocchi and other predictive metrics in prioritizing non-coding variants.**

**Fig. 4: Contribution of non-coding constraint in evaluating CNVs.**

**Fig. 5: Correlation of constraint between non-coding regulatory elements and protein-coding genes.**

Data availability

The aggregated allele frequency dataset is available in a browser at https://gnomad.broadinstitute.org, with bulk downloads for VCF files and Hail tables, as well as all constraint statistics described in this manuscript. Additionally, we provide a subset of the dataset that includes individual-level data for the HGDP⁸⁵ and 1000 Genomes projects⁸⁶; the generation and use of this dataset are described in a companion manuscript⁷⁵. There are no restrictions on the aggregate data released. External datasets used in this study are available in the following public resources: ENCODE cCREs, https://screen-v2.wenglab.org/; super enhancers, http://www.licpathway.net/sedb/download.php; FANTOM5 enhancers, https://fantom.gsc.riken.jp/5/datafiles/reprocessed/hg38_latest/extra/enhancer/; miRNA, https://genome.ucsc.edu/cgi-bin/hgTables (All GENCODE V32 track); FANTOM5 lncRNA, https://fantom.gsc.riken.jp/cat/v1/#/genes; GWAS Catalog, https://genome.ucsc.edu/cgi-bin/hgTables (GWAS Catalog track); GWAS fine-mapping, https://www.finucanelab.org/data; CNV morbidity map of DD, https://genome.ucsc.edu/cgi-bin/hgTables (Development Delay track); ClinVar, https://genome.ucsc.edu/cgi-bin/hgTables (ClinVar Variants track); TOPMed, https://bravo.sph.umich.edu/freeze8/hg38/downloads; ClinGen, https://genome.ucsc.edu/cgi-bin/hgTables (ClinGen track); MGI, https://www.informatics.jax.org/; OMIM, https://www.omim.org/; Roadmap Epigenomics Enhancer-Gene Linking, https://ernstlab.biolchem.ucla.edu/roadmaplinking/; GTEx, https://gtexportal.org/home/datasets.

Code availability

All code to perform quality control of the resource is publicly available at https://github.com/broadinstitute/gnomad_qc, and many of the functions are documented in a Python package (gnomad) at https://broadinstitute.github.io/gnomad_methods/index.html. The code to compute the constraint statistics is available at https://github.com/atgu/gnomad_nc_constraint.

Change history

15 January 2024
A Correction to this paper has been published: https://doi.org/10.1038/s41586-024-07050-7

References

Short, P. J. et al. De novo mutations in regulatory elements in neurodevelopmental disorders. Nature 555, 611–616 (2018).
Satterstrom, F. K. et al. Large-scale exome sequencing study implicates both developmental and functional changes in the neurobiology of autism. Cell 180, 568–584.e523 (2020).
Singh, T. et al. The contribution of rare variants to risk of schizophrenia in individuals with and without intellectual disability. Nat. Genet. 49, 1167–1173 (2017).
Ganna, A. et al. Quantifying the impact of rare and ultra-rare coding variation across the phenotypic spectrum. Am. J. Hum. Genet. 102, 1204–1211 (2018).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Petrovski, S., Wang, Q., Heinzen, E. L., Allen, A. S. & Goldstein, D. B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 9, e1003709 (2013).
Samocha, K. E. et al. A framework for the interpretation of de novo mutation in human disease. Nat. Genet. 46, 944–950 (2014).
Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA 106, 9362–9367 (2009).
Lanyi, J. K. Photochromism of halorhodopsin. cis/trans isomerization of the retinal around the 13–14 double bond. J. Biol. Chem. 261, 14025–14030 (1986).
Mathelier, A., Shi, W. & Wasserman, W. W. Identification of altered cis-regulatory elements in human disease. Trends Genet. 31, 67–76 (2015).
Spielmann, M. & Mundlos, S. Looking beyond the genes: the role of non-coding variants in human disease. Hum. Mol. Genet. 25, R157–R165 (2016).
Zhang, F. & Lupski, J. R. Non-coding genetic variants in human disease. Hum. Mol. Genet. 24, R102–R110 (2015).
Seplyarskiy, V. B. & Sunyaev, S. The origin of human mutation in light of genomic data. Nat. Rev. Genet. 22, 672–686 (2021).
Seplyarskiy, V. B. et al. Population sequencing data reveal a compendium of mutational processes in the human germ line. Science 373, 1030–1035 (2021).
Gussow, A. B. et al. Orion: Detecting regions of the human non-coding genome that are intolerant to variation using population genetics. PLoS ONE 12, e0181604 (2017).
di Iulio, J. et al. The human noncoding genome defined by genetic diversity. Nat. Genet. 50, 333–337 (2018).
Halldorsson, B. V. et al. The sequences of 150,119 genomes in the UK Biobank. Nature 607, 732–740 (2022).
Ritchie, G. et al. Functional annotation of noncoding sequence variants. Nat. Methods 11, 294–296 (2014).
Vitsios, D., Dhindsa, R. S., Middleton, L., Gussow, A. B. & Petrovski, S. Prioritizing non-coding regions based on human genomic constraint and sequence context with deep learning. Nat. Commun. 12, 1504 (2021).
Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005).
Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R. & Siepel, A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121 (2010).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Halldorsson, B. V. et al. Characterizing mutagenic effects of recombination through a sequence-level genetic map. Science 363, eaau1043 (2019).
An, J. Y. et al. Genome-wide de novo risk score implicates promoter variation in autism spectrum disorder. Science 362, eaat6576 (2018).
Collins, R. L. et al. A structural variation reference for medical and population genetics. Nature 581, 444–451 (2020).
The ENCODE Project Consortium. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
Jiang, Y. et al. SEdb: a comprehensive human super-enhancer database. Nucleic Acids Res. 47, D235–D243 (2019).
Pott, S. & Lieb, J. D. What are super-enhancers? Nat. Genet. 47, 8–12 (2015).
Bartel, D. P. Metazoan microRNAs. Cell 173, 20–51 (2018).
Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014).
Kanai, M. et al. Insights from complex trait fine-mapping across diverse populations. Preprint at medRxiv https://doi.org/10.1101/2021.09.03.21262975 (2021).
Jung, R. G. et al. Association between plasminogen activator inhibitor-1 and cardiovascular events: a systematic review and meta-analysis. Thromb. J. 16, 12 (2018).
Song, C., Burgess, S., Eicher, J. D., O’Donnell, C. J. & Johnson, A. D. Causal effect of plasminogen activator inhibitor type 1 on coronary heart disease. J. Am. Heart Assoc. 6, e004918 (2017).
Schaefer, A. S. et al. Genetic evidence for PLASMINOGEN as a shared genetic risk factor of coronary artery disease and periodontitis. Circ. Cardiovasc. Genet. 8, 159–167 (2015).
Li, Y. Y. Plasminogen activator inhibitor-1 4G/5G gene polymorphism and coronary artery disease in the Chinese Han population: a meta-analysis. PLoS ONE 7, e33511 (2012).
Drinane, M. C., Sherman, J. A., Hall, A. E., Simons, M. & Mulligan-Kehoe, M. J. Plasminogen and plasmin activity in patients with coronary artery disease. J. Thromb. Haemost. 4, 1288–1295 (2006).
Lowe, G. D. et al. Tissue plasminogen activator antigen and coronary heart disease. Prospective study and meta-analysis. Eur. Heart J. 25, 252–259 (2004).
Wang, Q. S. et al. Leveraging supervised learning for functionally informed fine-mapping of cis-eQTLs identifies an additional 20,913 putative causal eQTLs. Nat. Commun. 12, 3394 (2021).
Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018).
Stenson, P. D. et al. Human Gene Mutation Database (HGMD): 2003 update. Hum. Mutat. 21, 577–581 (2003).
Davydov, E. V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol. 6, e1001025 (2010).
Greenway, S. C. et al. De novo copy number variants identify new genes and loci in isolated sporadic tetralogy of Fallot. Nat. Genet. 41, 931–935 (2009).
Mefford, H. C. et al. Recurrent reciprocal genomic rearrangements of 17q12 are associated with renal disease, diabetes, and epilepsy. Am. J. Hum. Genet. 81, 1057–1069 (2007).
Sebat, J. et al. Strong association of de novo copy number mutations with autism. Science 316, 445–449 (2007).
Stefansson, H. et al. Large recurrent microdeletions associated with schizophrenia. Nature 455, 232–236 (2008).
Walsh, T. et al. Rare structural variants disrupt multiple genes in neurodevelopmental pathways in schizophrenia. Science 320, 539–543 (2008).
Wright, C. F. et al. Genetic diagnosis of developmental disorders in the DDD study: a scalable analysis of genome-wide research data. Lancet 385, 1305–1314 (2015).
Spielmann, M., Lupianez, D. G. & Mundlos, S. Structural variation in the 3D genome. Nat. Rev. Genet. 19, 453–467 (2018).
Spielmann, M. & Mundlos, S. Structural variations, the regulatory landscape of the genome and their alteration in human disease. Bioessays 35, 533–543 (2013).
Coe, B. P. et al. Refining analyses of copy number variation identifies specific genes associated with developmental delay. Nat. Genet. 46, 1063–1071 (2014).
Cooper, G. M. et al. A copy number variation morbidity map of developmental delay. Nat. Genet. 43, 838–846 (2011).
Klopocki, E. et al. Copy-number variations involving the IHH locus are associated with syndactyly and craniosynostosis. Am. J. Hum. Genet. 88, 70–75 (2011).
Barroso, E. et al. Identification of the fourth duplication of upstream IHH regulatory elements, in a family with craniosynostosis Philadelphia type, helps to define the phenotypic characterization of these regulatory elements. Am. J. Med. Genet. A 167A, 902–906 (2015).
Will, A. J. et al. Composition and dosage of a multipartite enhancer cluster control developmental expression of Ihh (Indian hedgehog). Nat. Genet. 49, 1539–1545 (2017).
Roadmap Epigenomics Consortium. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Rehm, H. L. et al. ClinGen—the Clinical Genome Resource. N. Engl. J. Med. 372, 2235–2242 (2015).
Blake, J. A. et al. The Mouse Genome Database (MGD): premier model organism resource for mammalian genomics and genetics. Nucleic Acids Res. 39, D842–D848 (2011).
McKusick, V. A. Mendelian Inheritance in Man and its online version, OMIM. Am. J. Hum. Genet. 80, 588–604 (2007).
GTEx Consortium. The Genotype–Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
Xu, H. et al. Elevated ASCL2 expression in breast cancer is associated with the poor prognosis of patients. Am. J. Cancer Res. 7, 955–961 (2017).
Jubb, A. M. et al. Achaete-scute like 2 (ascl2) is a target of Wnt signalling and is upregulated in intestinal neoplasia. Oncogene 25, 3445–3457 (2006).
Tian, Y. et al. MicroRNA-200 (miR-200) cluster regulation by achaete scute-like 2 (Ascl2): impact on the epithelial-mesenchymal transition in colon cancer cells. J. Biol. Chem. 289, 36101–36115 (2014).
Guo, M. H. et al. Inferring compound heterozygosity from large-scale exome sequencing data. Nat. Genet. https://doi.org/10.1038/s41588-023-01608-3 (2023).
Zhu, P. et al. Single-cell DNA methylome sequencing of human preimplantation embryos. Nat. Genet. 50, 12–19 (2018).
Tang, W. W. et al. A unique gene regulatory network resets the human germline epigenome for development. Cell 161, 1453–1467 (2015).
Ross, D. A., Lim, J., Lin, R.-S. & Yang, M.-H. Incremental learning for robust visual tracking. Int. J. Comput. Vision 77, 125–141 (2008).
Karolchik, D. et al. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 32, D493–D496 (2004).
Li, H. Toward better understanding of artifacts in variant calling from high-coverage samples. Bioinformatics 30, 2843–2851 (2014).
Davis, C. A. et al. The Encyclopedia of DNA elements (ENCODE): data portal update. Nucleic Acids Res. 46, D794–D801 (2018).
Goldmann, J. M. et al. Germline de novo mutation clusters arise during oocyte aging in genomic regions with high double-strand-break incidence. Nat. Genet. 50, 487–492 (2018).
Zhao, H. et al. CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics 30, 1006–1007 (2014).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Kent, W. J., Zweig, A. S., Barber, G., Hinrichs, A. S. & Karolchik, D. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics 26, 2204–2207 (2010).
Koenig, Z. et al. A harmonized public resource of deeply sequenced diverse human genomes. Preprint at bioRxiv https://doi.org/10.1101/2023.01.23.525248 (2023).
Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012).
Hon, C. C. et al. An atlas of human long non-coding RNAs with accurate 5′ ends. Nature 543, 199–204 (2017).
Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine-mapping. J. R. Stat. Soc. B 82, 1273–1300 (2020).
Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).
Budescu, D. V. Dominance analysis: a new approach to the problem of relative importance of predictors in multiple regression. Psych. Bull. 114, 542 (1993).
Azen, R. & Budescu, D. V. The dominance analysis approach for comparing predictors in multiple regression. Psych. Methods 8, 129 (2003).
Ernst, J. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49 (2011).
Liu, Y., Sarkar, A., Kheradpour, P., Ernst, J. & Kellis, M. Evidence of reduced recombination rate in human regulatory domains. Genome Biol. 18, 193 (2017).
Robin, X. et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12, 1–8 (2011).
Bergstrom, A. et al. Insights into human genetic variation and population history from 929 diverse genomes. Science 367, eaay5012 (2020).
The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).

Acknowledgements

The authors thank the individuals whose data is in gnomAD for their contributions to research. Development of the Genome Aggregation Database was supported by NIDDK U54DK105566 and the NHGRI of the National Institutes of Health under award number U24HG011450. Additional funding for Genome Aggregation Database Consortium members is listed in the Supplementary Information. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

These authors contributed equally: Siwei Chen, Laurent C. Francioli, Benjamin M. Neale, Daniel G. MacArthur

Authors and Affiliations

Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Siwei Chen, Laurent C. Francioli, Julia K. Goodrich, Ryan L. Collins, Masahiro Kanai, Qingbo Wang, Jessica Alföldi, Nicholas A. Watts, Christopher Vittal, Timothy Poterba, Michael W. Wilson, Yekaterina Tarasova, William Phu, Riley Grant, Mary T. Yohannes, Namrata Gupta, Irina M. Armean, Samantha M. Baxter, Sarah E. Calvo, Katherine R. Chao, Sinéad Chapman, Beryl B. Cummings, Phil Darnowsky, Patrick T. Ellinor, Eleina England, Tõnu Esko, Emily Evangelista, Jack Fu, Sanna Gudmundsson, Daniel King, Kristen M. Laricchia, Emily Lipscomb, Wenhan Lu, Steven A. Lubitz, James B. Meigs, Eric V. Minikel, Vamsi K. Mootha, Dan Rhodes, Andrea Saltzman, Kaitlin E. Samocha, Jeremiah Scharf, Molly Schleicher, Eleanor G. Seaby, Moriel Singer-Berk, Rachel G. Son, Christine Stevens, Lily Wang, Arcturus Wang, James S. Ware, Nicola Whiffin, Anne O’Donnell-Luria, Matthew Solomonson, Alicia R. Martin, Michael E. Talkowski, Heidi L. Rehm, Mark J. Daly, Grace Tiao, Benjamin M. Neale, Daniel G. MacArthur & Konrad J. Karczewski
Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
Siwei Chen, Laurent C. Francioli, Masahiro Kanai, Jessica Alföldi, Nicholas A. Watts, Christopher Vittal, Timothy Poterba, Michael W. Wilson, Zan Koenig, Irina M. Armean, Sam Bryant, Katherine R. Chao, Sinéad Chapman, Sanna Gudmundsson, Kristen M. Laricchia, Aarno Palotie, Christine Stevens, Arcturus Wang, Matthew Solomonson, Cotton Seed, Alicia R. Martin, Mark J. Daly, Grace Tiao, Benjamin M. Neale & Konrad J. Karczewski
Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
Ryan L. Collins, Harrison Brand, Sarah E. Calvo, Jack Fu, Sekar Kathiresan, Kaitlin E. Samocha, Alba Sanchis-Juan, Jeremiah Scharf, Anne O’Donnell-Luria, Michael E. Talkowski & Heidi L. Rehm
Division of Medical Sciences, Harvard Medical School, Boston, MA, USA
Ryan L. Collins & Beryl B. Cummings
Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
Qingbo Wang & Yukinori Okada
Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Laura D. Gauthier, Eric Banks, Charlotte Tolonen, Sam Novod, Louis Bergelson, David Roazen, Valentin Ruano-Rubio, Miguel Covarrubias, Christopher Llanwarne, Nikelle Petrillo, Gordon Wade, Thibault Jeandet, Ruchi Munshi, Kathleen Tibbetts, David Benjamin, Ted Brookings, Kristian Cibulskis, James Emery, Kiran Garimella, Jeff Gentry, Andrea Haessly, Diane Kaplan, Trevyn Langsford, Nareh Sahakian, Megan Shand, Ted Sharpe, Jonathan T. Smith, Jose Soto & Ben Weisburd
Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Timothy Poterba, Zan Koenig, Sinéad Chapman, Steven McCarroll, Aarno Palotie, Jeremiah Scharf, Christine Stevens, Arcturus Wang, Cotton Seed, Alicia R. Martin, Michael E. Talkowski & Konrad J. Karczewski
Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
William Phu, Sanna Gudmundsson & Anne O’Donnell-Luria
Richards Lab, Lady Davis Institute, Montreal, Quebec, Canada
Yossi Farjoun
Broad Institute of MIT and Harvard, Cambridge, MA, USA
Stacey Donnelly & Samuli Ripatti
Broad Genomics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Stacey Gabriel, Namrata Gupta & Steven Ferriera
University of Miami Miller School of Medicine, Gastroenterology, Miami, FL, USA
Maria Abreu
Unidad de Investigacion de Enfermedades Metabolicas, Instituto Nacional de Ciencias Medicas y Nutricion, Mexico City, Mexico
Carlos A. Aguilar Salinas
Peninsula College of Medicine and Dentistry, Exeter, UK
Tariq Ahmad
Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland
Mark J. Daly
Centre for Population Genomics, Garvan Institute of Medical Research and UNSW Sydney, Sydney, New South Wales, Australia
Daniel G. MacArthur
Centre for Population Genomics, Murdoch Children’s Research Institute, Melbourne, Victoria, Australia
Daniel G. MacArthur
Division of Preventive Medicine, Brigham and Women’s Hospital, Boston, MA, USA
Christine M. Albert & Daniel I. Chasman
Division of Cardiovascular Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
Christine M. Albert
Department of Cardiology, University Hospital, Parma, Italy
Diego Ardissino
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
Elizabeth G. Atkinson
Stanley Center for Psychiatric Research, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
Elizabeth G. Atkinson & Sam Bryant
Department of Biology, Faculty of Natural Sciences, University of Haifa, Haifa, Israel
Gil Atzmon
Departments of Medicine and Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
Gil Atzmon
Department of Quantitative Health Sciences, Lerner Research Institute Cleveland Clinic, Cleveland, OH, USA
John Barnard
Gastroenterology Department, Saint Antoine Hospital, Sorbonne Université, APHP, Paris, France
Laurent Beaugerie
Framingham Heart Study, NHLBI and Boston University, Framingham, MA, USA
Emelia J. Benjamin
Department of Medicine, Boston University Chobanian and Avedisian School of Medicine, Boston, MA, USA
Emelia J. Benjamin
Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
Emelia J. Benjamin
Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
Michael Boehnke
National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
Lori L. Bonnycastle
The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Erwin P. Bottinger, Judy Cho & Ruth J. F. Loos
Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
Donald W. Bowden & Nicholette D. Palmer
Center for Genomics and Personalized Medicine Research, Wake Forest School of Medicine, Winston-Salem, NC, USA
Donald W. Bowden
Center for Diabetes Research, Wake Forest School of Medicine, Winston-Salem, NC, USA
Donald W. Bowden
Department of Cardiovascular Sciences and NIHR Leicester Biomedical Research Centre, University of Leicester, Leicester, UK
Matthew J. Bown
NIHR Leicester Biomedical Research Centre, Glenfield Hospital, Leicester, UK
Matthew J. Bown & Nilesh J. Samani
Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
Harrison Brand & Jack Fu
Department of Medicine, Rutgers Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, New Brunswick, NJ, USA
Steven Brant
Department of Genetics and the Human Genetics Institute of New Jersey, School of Arts and Sciences, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
Steven Brant
Meyerhoff Inflammatory Bowel Disease Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
Steven Brant
Fulcrum Genomics, Boulder, CO, USA
Ted Brookings
Harvard School of Public Health, Boston, MA, USA
Hannia Campos
Central American Population Center, San Pedro, Costa Rica
Hannia Campos
Department of Epidemiology and Biostatistics, Imperial College London, London, UK
John C. Chambers
Department of Cardiology, Ealing Hospital, NHS Trust, Southall, UK
John C. Chambers & Jaspal Kooner
Imperial College, Healthcare NHS Trust Imperial College London, London, UK
John C. Chambers
Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Hong Kong, China
Juliana C. Chan & Ronald C. W. Ma
Department of Medicine, Harvard Medical School, Boston, MA, USA
Daniel I. Chasman, Jose Florez, Gad Getz, Sekar Kathiresan, James B. Meigs & Dost Ongur
Northwestern University, Evanston, IL, USA
Rex Chisholm
University of Cambridge, Cambridge, UK
Rajiv Chowdhury & John Danesh
Department of Cardiovascular Medicine, Cleveland Clinic, Cleveland, OH, USA
Mina K. Chung
Department of Pediatrics, Columbia University Irving Medical Center, New York, NY, USA
Wendy K. Chung
Herbert Irving Comprehensive Cancer Center, Columbia University Medical Center, New York, NY, USA
Wendy K. Chung
Department of Medicine, Columbia University Medical Center, New York, NY, USA
Wendy K. Chung
McLean Hospital, Belmont, MA, USA
Bruce Cohen & Dost Ongur
Department of Psychiatry, Harvard Medical School, Boston, MA, USA
Bruce Cohen
Genomics Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Kristen M. Connolly
Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
Adolfo Correa
Department of Epidemiology, Colorado School of Public Health, Aurora, CO, USA
Dana Dabelea
Department of Medicine and Pharmacology, University of Illinois at Chicago, Chicago, IL, USA
Dawood Darbar
Vanderbilt University Medical Center, Nashville, TN, USA
Joshua Denny
Department of Life Sciences, College of Arts and Sciences, Texas A&M University–San Antonio, San Antonio, TX, USA
Ravindranath Duggirala
Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
Josée Dupuis
Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada
Josée Dupuis
Cardiac Arrhythmia Service and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
Patrick T. Ellinor & Steven A. Lubitz
Cardiovascular Epidemiology and Genetics, Hospital del Mar Medical Research Institute (IMIM), Barcelona, Spain
Roberto Elosua
Centro de Investigación Biomédica en Red Enfermedades Cardiovasculares (CIBERCV), Madrid, Spain
Roberto Elosua
Department of Medicine, Faculty of Medicine, University of Vic–Central University of Catalonia, Vic, Spain
Roberto Elosua
Clalit Genomics Center, Ramat-Gan, Israel
Eleina England
Institute for Cardiogenetics, University of Lübeck, Lübeck, Germany
Jeanette Erdmann
German Research Centre for Cardiovascular Research Hamburg/Lübeck/Kiel, Lübeck, Germany
Jeanette Erdmann
University Heart Center Lübeck, Lübeck, Germany
Jeanette Erdmann
Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
Tõnu Esko & Andres Metspalu
Victor Chang Cardiac Research Institute, Darlinghurst, New South Wales, Australia
Diane Fatkin
Faculty of Medicine, UNSW Sydney, Kensington, New South Wales, Australia
Diane Fatkin
Cardiology Department, St Vincent’s Hospital, Darlinghurst, New South Wales, Australia
Diane Fatkin
Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
Jose Florez
Programs in Metabolism and Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Jose Florez
Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
Andre Franke
University Hospital Schleswig-Holstein, Kiel, Germany
Andre Franke
Clinic of Gastroenterology, Helsinki University and Helsinki University Hospital, Helsinki, Finland
Martti Färkkilä
Helsinki University and Helsinki University Hospital, Helsinki, Finland
Martti Färkkilä
Abdominal Center, Helsinki, Finland
Martti Färkkilä
Bioinformatics Program, MGH Cancer Center and Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
Gad Getz
Cancer Genome Computational Analysis, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Gad Getz
Department of Psychiatry and Behavioral Sciences, Boston Children’s Hospital and Harvard Medical School, Boston, MA, USA
David C. Glahn
Harvard Medical School Teaching Hospital, Boston, MA, USA
David C. Glahn
Department of Endocrinology and Metabolism, Hadassah Medical Center and Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
Benjamin Glaser
Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, NY, USA
Stephen J. Glatt
Institute for Genomic Medicine, Columbia University Medical Center, Hammer Health Sciences, New York, NY, USA
David Goldstein
Department of Genetics and Development, Columbia University Medical Center, Hammer Health Sciences, New York, NY, USA
David Goldstein
Centro de Investigacion en Salud Poblacional, Instituto Nacional de Salud Publica, Cuernavaca, Mexico
Clicerio Gonzalez
Lund University, Lund, Sweden
Leif Groop
Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
Leif Groop, Jaakko Kaprio, Aarno Palotie, Samuli Ripatti, Tiinamaija Tuomi & Maija Wessman
Center for Genetic Epidemiology, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA, USA
Christopher Haiman
Washington University School of Medicine, St Louis, MO, USA
Ira Hall
Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX, USA
Craig L. Hanis
Department of Neurology, Columbia University, New York, NY, USA
Matthew Harms
Institute of Genomic Medicine, Columbia University, New York, NY, USA
Matthew Harms
Institute of Biomedicine, University of Eastern Finland, Kuopio, Finland
Mikko Hiltunen
Department of Psychiatry, Helsinki University Central Hospital Lapinlahdentie, Helsinki, Finland
Matti M. Holi
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
Christina M. Hultman & Patrick F. Sullivan
Icahn School of Medicine at Mount Sinai, New York, NY, USA
Christina M. Hultman
Bonei Olam, Center for Rare Jewish Genetic Diseases, Brooklyn, NY, USA
Chaim Jalas
Department of Neurology, Helsinki University Central Hospital, Helsinki, Finland
Mikko Kallela
Cardiovascular Disease Initiative and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Sekar Kathiresan
Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Eimear E. Kenny
Division of Genome Science, Department of Precision Medicine, National Institute of Health, Cheongju-si, Republic of Korea
Bong-Jo Kim & Young Jin Kim
MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University School of Medicine, Cardiff, UK
George Kirov
Imperial College Healthcare NHS Trust, London, UK
Jaspal Kooner
National Heart and Lung Institute Cardiovascular Sciences, Imperial College London, London, UK
Jaspal Kooner
Department of Health, THL–National Institute for Health and Welfare, Helsinki, Finland
Seppo Koskinen
Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
Harlan M. Krumholz
Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, CT, USA
Harlan M. Krumholz
Division of Pediatric Gastroenterology, Emory University School of Medicine, Atlanta, GA, USA
Subra Kugathasan
Department of Internal Medicine, Seoul National University Hospital, Seoul, Republic of Korea
Soo Heon Kwak & Kyong Soo Park
The University of Eastern Finland, Institute of Clinical Medicine, Kuopio, Finland
Markku Laakso
Kuopio University Hospital, Kuopio, Finland
Markku Laakso
Department of Genetics, Yale School of Medicine, New Haven, CT, USA
Nicole Lake & Monkol Lek
Department of Clinical Chemistry, Tampere University, Tampere, Finland
Terho Lehtimäki & Kari M. Mattila
Fimlab Laboratories, Tampere, Finland
Terho Lehtimäki & Kari M. Mattila
Finnish Cardiovascular Research Center, Tampere Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
Terho Lehtimäki & Kari M. Mattila
The Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Ruth J. F. Loos
The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
Ruth J. F. Loos
National Autonomous University of Mexico, Mexico City, Mexico
Teresa Tusie Luna
Salvador Zubirán National Institute of Health Sciences and Nutrition, Mexico City, Mexico
Teresa Tusie Luna
Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong, China
Ronald C. W. Ma
Hong Kong Institute of Diabetes and Obesity, The Chinese University of Hong Kong, Hong Kong, China
Ronald C. W. Ma
Division of Cardiology, University of California San Francisco, San Francisco, CA, USA
Gregory M. Marcus
Hospital del Mar Medical Research Institute (IMIM), Barcelona, Spain
Jaume Marrugat
CIBERCV, Madrid, Spain
Jaume Marrugat
Department of Genetics, Harvard Medical School, Boston, MA, USA
Steven McCarroll
Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, UK
Mark I. McCarthy
Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
Mark I. McCarthy
Oxford NIHR Biomedical Research Centre, Oxford University Hospitals, NHS Foundation Trust, John Radcliffe Hospital, Oxford, UK
Mark I. McCarthy
John P. Hussman Institute for Human Genomics, Leonard M. Miller School of Medicine, University of Miami, Miami, FL, USA
Jacob L. McCauley
The Dr. John T. Macdonald Foundation Department of Human Genetics, Leonard M. Miller School of Medicine, University of Miami, Miami, FL, USA
Jacob L. McCauley
F. Widjaja Foundation Inflammatory Bowel and Immunobiology Research Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
Dermot McGovern
Atherogenomics Laboratory, University of Ottawa Heart Institute, Ottawa, Canada
Ruth McPherson
Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA, USA
James B. Meigs
Department of Clinical Sciences, University Hospital Malmö, Clinical Research Center, Lund University, Malmö, Sweden
Olle Melander
University of Arizona Health Sciences, Tucson, AZ, USA
Deborah Meyers
University of Maryland School of Medicine, Baltimore, MD, USA
Braxton D. Mitchell
Howard Hughes Medical Institute and Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, USA
Vamsi K. Mootha
International Centre for Diarrhoeal Disease Research, Dhaka, Bangladesh
Aliya Naheed
Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Saman Nazarian & Dan Rader
Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
Saman Nazarian
Department of Clinical Sciences, Skåne University Hospital, Lund University, Malmö, Sweden
Peter M. Nilsson
Centre for Neuropsychiatric Genetics and Genomics, Cardiff University School of Medicine, Cardiff, UK
Michael C. O’Donovan & Michael J. Owen
Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan
Yukinori Okada
Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Suita, Japan
Yukinori Okada
Instituto Nacional de Medicina Genómica (INMEGEN), Mexico City, Mexico
Lorena Orozco
Laboratory of Immunogenomics and Metabolic Diseases, INMEGEN, Mexico City, Mexico
Lorena Orozco
Medical Research Institute, Ninewells Hospital and Medical School, University of Dundee, Dundee, UK
Colin Palmer
Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Republic of Korea
Kyong Soo Park
Department of Psychiatry, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
Carlos Pato
Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
Ann E. Pulver
Children’s Hospital of Philadelphia, Philadelphia, PA, USA
Dan Rader
Division of Genetics and Epidemiology, Institute of Cancer Research, London, UK
Nazneen Rahman
University of Washington, Seattle, WA, USA
Alex Reiner
Fred Hutchinson Cancer Research Center, Seattle, WA, USA
Alex Reiner
Medical Research Center, Oulu University Hospital, Oulu, Finland
Anne M. Remes
Research Unit of Clinical Neuroscience, Neurology, University of Oulu, Oulu, Finland
Anne M. Remes
Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
Stephen Rich
Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA
Stephen Rich
Research Center Montreal Heart Institute, Montreal, Quebec, Canada
John D. Rioux
Department of Medicine, Faculty of Medicine, Université de Montréal, Quebec, Canada
John D. Rioux
Department of Public Health, Faculty of Medicine, University of Helsinki, Helsinki, Finland
Samuli Ripatti & Erkki Vartiainen
Departments of Medicine, Pharmacology and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
Dan M. Roden
Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
Dan M. Roden
The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor–UCLA Medical Center, Torrance, CA, USA
Jerome I. Rotter & Kent D. Taylor
Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Danish Saleheen
Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Danish Saleheen
Center for Non-Communicable Diseases, Karachi, Pakistan
Danish Saleheen
National Institute for Health and Welfare, Helsinki, Finland
Veikko Salomaa & Jaana Suvisaari
Department of Cardiovascular Sciences, University of Leicester, Leicester, UK
Nilesh J. Samani
Department of Cardiology, Deutsches Herzzentrum München, Technical University of Munich, DZHK Munich Heart Alliance, Munich, Germany
Heribert Schunkert
Technische Universität München, Munich, Germany
Heribert Schunkert
Institute of Genetic Epidemiology, Department of Genetics, Medical University of Innsbruck, Innsbruck, Austria
Sebastian Schönherr
Faculty of Medicine, University of Southampton, Southampton, UK
Eleanor G. Seaby
Duke Molecular Physiology Institute, Durham, NC, USA
Svati H. Shah
Division of Cardiology, Department of Medicine, Duke University School of Medicine, Durham, NC, USA
Svati H. Shah
Division of Cardiovascular Medicine, Nashville VA Medical Center, Vanderbilt University School of Medicine, Nashville, TN, USA
Moore B. Shoemaker
Division of Endocrinology, National University Hospital, Singapore, Singapore
Tai Shyong
NUS Saw Swee Hock School of Public Health, Singapore, Singapore
Tai Shyong
Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
Edwin K. Silverman
Harvard Medical School, Boston, MA, USA
Edwin K. Silverman
Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Pamela Sklar
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Pamela Sklar
Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Pamela Sklar
The Wallenberg Laboratory, Department of Molecular and Clinical Medicine, Institute of Medicine, Gothenburg University, Gothenburg, Sweden
J. Gustav Smith
Department of Cardiology, Wallenberg Center for Molecular Medicine and Lund University Diabetes Center, Clinical Sciences, Lund University and Skåne University Hospital, Lund, Sweden
J. Gustav Smith
Department of Cardiology, Sahlgrenska University Hospital, Gothenburg, Sweden
J. Gustav Smith
Institute of Clinical Medicine, Neurology, University of Eastern Finland, Kuopio, Finland
Hilkka Soininen
Gastroenterology Department, Centre de Recherche Saint-Antoine, CRSA, AP-HP, Saint Antoine Hospital, Sorbonne Université, INSERM, Paris, France
Harry Sokol
INRA, UMR1319 Micalis, Jouy en Josas, France
Harry Sokol
Paris Center for Microbiome Medicine (PaCeMM) FHU, Paris, France
Harry Sokol
AgroParisTech, Jouy en Josas, France
Harry Sokol
Department of Twin Research and Genetic Epidemiology, King’s College London, London, UK
Tim Spector
Department of Medicine, Washington University School of Medicine, St Louis, MO, USA
Nathan O. Stitziel
Department of Genetics, Washington University School of Medicine, St Louis, MO, USA
Nathan O. Stitziel
The McDonnell Genome Institute at Washington University, St Louis, MO, USA
Nathan O. Stitziel
Departments of Genetics and Psychiatry, University of North Carolina, Chapel Hill, NC, USA
Patrick F. Sullivan
Saw Swee Hock School of Public Health, National University of Singapore, National University Health System, Singapore, Singapore
E. Shyong Tai & Yik Ying Teo
Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
E. Shyong Tai
Duke–NUS Graduate Medical School, Singapore, Singapore
E. Shyong Tai
Life Sciences Institute, National University of Singapore, Singapore, Singapore
Yik Ying Teo
Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore
Yik Ying Teo
Center for Behavioral Genomics, Department of Psychiatry, University of California San Diego, San Diego, CA, USA
Ming Tsuang
Institute of Genomic Medicine, University of California San Diego, San Diego, CA, USA
Ming Tsuang
Endocrinology, Abdominal Center, Helsinki University Hospital, Helsinki, Finland
Tiinamaija Tuomi
Institute of Genetics, Folkhälsan Research Center, Helsinki, Finland
Tiinamaija Tuomi
Juliet Keidan Institute of Pediatric Gastroenterology, Shaare Zedek Medical Center, The Hebrew University of Jerusalem, Jerusalem, Israel
Dan Turner
Instituto de Investigaciones Biomédicas, UNAM, Mexico City, Mexico
Teresa Tusie-Luna
Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, Mexico City, Mexico
Teresa Tusie-Luna
Department of Psychiatry and Human Behavior, University of California Irvine, Irvine, CA, USA
Marquis Vawter
Bioinformatics and Integrative Genomics Program, Harvard Medical School, Boston, MA, USA
Lily Wang
National Heart and Lung Institute, Imperial College London, London, UK
James S. Ware
Royal Brompton and Harefield Hospitals, Guy’s and St. Thomas’ NHS Foundation Trust, London, UK
James S. Ware
MRC London Institute of Medical Sciences, Imperial College London, London, UK
James S. Ware
Radcliffe Department of Medicine, University of Oxford, Oxford, UK
Hugh Watkins
Department of Gastroenterology and Hepatology, University of Groningen and University Medical Center Groningen, Groningen, Netherlands
Rinse K. Weersma
Folkhälsan Institute of Genetics, Folkhälsan Research Center, Helsinki, Finland
Maija Wessman
Big Data Institute, University of Oxford, Oxford, UK
Nicola Whiffin
Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
Nicola Whiffin
Division of Cardiology, Beth Israel Deaconess Medical Center, Boston, MA, USA
James G. Wilson
Program in Infectious Disease and Microbiome, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Ramnik J. Xavier
Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, MA, USA
Ramnik J. Xavier

Authors

Siwei Chen
Laurent C. Francioli
Julia K. Goodrich
Ryan L. Collins
Masahiro Kanai
Qingbo Wang
Jessica Alföldi
Nicholas A. Watts
Christopher Vittal
Laura D. Gauthier
Timothy Poterba
Michael W. Wilson
Yekaterina Tarasova
William Phu
Riley Grant
Mary T. Yohannes
Zan Koenig
Yossi Farjoun
Eric Banks
Stacey Donnelly
Stacey Gabriel
Namrata Gupta
Steven Ferriera
Charlotte Tolonen
Sam Novod
Louis Bergelson
David Roazen
Valentin Ruano-Rubio
Miguel Covarrubias
Christopher Llanwarne
Nikelle Petrillo
Gordon Wade
Thibault Jeandet
Ruchi Munshi
Kathleen Tibbetts
Anne O’Donnell-Luria
Matthew Solomonson
Cotton Seed
Alicia R. Martin
Michael E. Talkowski
Heidi L. Rehm
Mark J. Daly
Grace Tiao
Benjamin M. Neale
Daniel G. MacArthur
Konrad J. Karczewski

Consortia

Genome Aggregation Database Consortium

Maria Abreu, Carlos A. Aguilar Salinas, Tariq Ahmad, Christine M. Albert, Jessica Alföldi, Diego Ardissino, Irina M. Armean, Elizabeth G. Atkinson, Gil Atzmon, Eric Banks, John Barnard, Samantha M. Baxter, Laurent Beaugerie, Emelia J. Benjamin, David Benjamin, Louis Bergelson, Michael Boehnke, Lori L. Bonnycastle, Erwin P. Bottinger, Donald W. Bowden, Matthew J. Bown, Harrison Brand, Steven Brant, Ted Brookings, Sam Bryant, Sarah E. Calvo, Hannia Campos, John C. Chambers, Juliana C. Chan, Katherine R. Chao, Sinéad Chapman, Daniel I. Chasman, Siwei Chen, Rex Chisholm, Judy Cho, Rajiv Chowdhury, Mina K. Chung, Wendy K. Chung, Kristian Cibulskis, Bruce Cohen, Ryan L. Collins, Kristen M. Connolly, Adolfo Correa, Miguel Covarrubias, Beryl B. Cummings, Dana Dabelea, Mark J. Daly, John Danesh, Dawood Darbar, Phil Darnowsky, Joshua Denny, Stacey Donnelly, Ravindranath Duggirala, Josée Dupuis, Patrick T. Ellinor, Roberto Elosua, James Emery, Eleina England, Jeanette Erdmann, Tõnu Esko, Emily Evangelista, Yossi Farjoun, Diane Fatkin, Steven Ferriera, Jose Florez, Laurent C. Francioli, Andre Franke, Jack Fu, Martti Färkkilä, Stacey Gabriel, Kiran Garimella, Laura D. Gauthier, Jeff Gentry, Gad Getz, David C. Glahn, Benjamin Glaser, Stephen J. Glatt, David Goldstein, Clicerio Gonzalez, Julia K. Goodrich, Riley Grant, Leif Groop, Sanna Gudmundsson, Namrata Gupta, Andrea Haessly, Christopher Haiman, Ira Hall, Craig L. Hanis, Matthew Harms, Mikko Hiltunen, Matti M. Holi, Christina M. Hultman, Chaim Jalas, Thibault Jeandet, Mikko Kallela, Masahiro Kanai, Diane Kaplan, Jaakko Kaprio, Konrad J. Karczewski, Sekar Kathiresan, Eimear E. Kenny, Bong-Jo Kim, Young Jin Kim, Daniel King, George Kirov, Zan Koenig, Jaspal Kooner, Seppo Koskinen, Harlan M. Krumholz, Subra Kugathasan, Soo Heon Kwak, Markku Laakso, Nicole Lake, Trevyn Langsford, Kristen M. Laricchia, Terho Lehtimäki, Monkol Lek, Emily Lipscomb, Christopher Llanwarne, Ruth J. F. Loos, Wenhan Lu, Steven A. Lubitz, Teresa Tusie Luna, Ronald C. W. Ma, Daniel G. MacArthur, Gregory M. Marcus, Jaume Marrugat, Alicia R. Martin, Kari M. Mattila, Steven McCarroll, Mark I. McCarthy, Jacob L. McCauley, Dermot McGovern, Ruth McPherson, James B. Meigs, Olle Melander, Andres Metspalu, Deborah Meyers, Eric V. Minikel, Braxton D. Mitchell, Vamsi K. Mootha, Ruchi Munshi, Aliya Naheed, Saman Nazarian, Benjamin M. Neale, Peter M. Nilsson, Sam Novod, Anne O’Donnell-Luria, Michael C. O’Donovan, Yukinori Okada, Dost Ongur, Lorena Orozco, Michael J. Owen, Colin Palmer, Nicholette D. Palmer, Aarno Palotie, Kyong Soo Park, Carlos Pato, Nikelle Petrillo, William Phu, Timothy Poterba, Ann E. Pulver, Dan Rader, Nazneen Rahman, Heidi L. Rehm, Alex Reiner, Anne M. Remes, Dan Rhodes, Stephen Rich, John D. Rioux, Samuli Ripatti, David Roazen, Dan M. Roden, Jerome I. Rotter, Valentin Ruano-Rubio, Nareh Sahakian, Danish Saleheen, Veikko Salomaa, Andrea Saltzman, Nilesh J. Samani, Kaitlin E. Samocha, Alba Sanchis-Juan, Jeremiah Scharf, Molly Schleicher, Heribert Schunkert, Sebastian Schönherr, Eleanor G. Seaby, Cotton Seed, Svati H. Shah, Megan Shand, Ted Sharpe, Moore B. Shoemaker, Tai Shyong, Edwin K. Silverman, Moriel Singer-Berk, Pamela Sklar, Jonathan T. Smith, J. Gustav Smith, Hilkka Soininen, Harry Sokol, Matthew Solomonson, Rachel G. Son, Jose Soto, Tim Spector, Christine Stevens, Nathan O. Stitziel, Patrick F. Sullivan, Jaana Suvisaari, E. Shyong Tai, Michael E. Talkowski, Yekaterina Tarasova, Kent D. Taylor, Yik Ying Teo, Grace Tiao, Kathleen Tibbetts, Charlotte Tolonen, Ming Tsuang, Tiinamaija Tuomi, Dan Turner, Teresa Tusie-Luna, Erkki Vartiainen, Marquis Vawter, Christopher Vittal, Gordon Wade, Lily Wang, Qingbo Wang, Arcturus Wang, James S. Ware, Hugh Watkins, Nicholas A. Watts, Rinse K. Weersma, Ben Weisburd, Maija Wessman, Nicola Whiffin, Michael W. Wilson, James G. Wilson, Ramnik J. Xavier & Mary T. Yohannes

Contributions

S.C., L.C.F., J.K.G., Q.W., A.O.-L., H.L.R., M.J.D., B.M.N., D.G.M. and K.J.K. contributed to the writing of the manuscript and generation of figures. S.C., R.L.C., M.K. and K.J.K. contributed to the analysis of data. L.C.F., Q.W., C.V., L.D.G., T.P., C.S., M.E.T., B.M.N. and K.J.K. developed tools and methods. L.C.F., J.K.G., J.A., M.W.W., Y.T., W.P., M.T.Y., Z.K., Y.F., E.B., S.D., S.G., N.G., S.F., C.T., S.N., L.B., D.R., V.R.-R., M.C., C.L., N.P., G.W., T.J., R.M., K.T., A.R.M., G.T. and K.J.K. contributed to the production and quality control of the gnomAD dataset. N.A.W., R.G., M.S. and K.J.K. contributed to the gnomAD browser. All authors listed under The Genome Aggregation Database Consortium contributed to the generation of the primary data incorporated into the gnomAD resource. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Siwei Chen or Konrad J. Karczewski.

Ethics declarations

Competing interests

K.J.K. is a consultant for Vor Biopharma and Tome Biosciences, and is on the Scientific Advisory Board of Nurture Genomics. D.G.M. is a paid advisor to GSK, Insitro, Variant Bio and Overtone Therapeutics, and has previously received research support from AbbVie, Astellas, Biogen, BioMarin, Eisai, Merck, Pfizer and Sanofi-Genzyme.

Peer review

Peer review information

Nature thanks Slavé Petrovski, Ryan Dhindsa and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Construction of mutational model and Gnocchi score.

a,b, Estimation of trinucleotide context-specific mutation rates. The proportion of possible variants observed for each substitution and context in 76,156 gnomAD genomes (y-axis) is exponentially correlated with the absolute mutation rate estimated from 1,000 downsampled genomes (x-axis). Fit lines were modeled separately for human autosomes (a) and chromosome X (b). c, Estimation of the effects of regional genomic features on mutation rates. The effects of 13 genomic features at four scales (window sizes 1kb-1Mb; x-axis) on the mutation rate of 32 trinucleotide contexts (y-axis) are shown, colored by the coefficient from regressing de novo mutations (DNMs) on each specific feature and window size. Red/blue color indicates a positive/negative effect of increasing the feature value on mutation rates; grey crosses indicate significant features at the smallest possible window size after Bonferroni correction for 13×4 = 52 tests. Abbreviations: LCR=low-complexity region, SINE/LINE=short/long interspersed nuclear element, Dist=Distance, Recomb=Recombination, Methyl=Methylation. d,e, The distribution of Gnocchi score as a function of expected and observed variation. Each point represents the Gnocchi score of a 1kb window on the genome (N = 1,984,900 on autosomes (d) and N = 57,729 on chromosome X (e)), which quantifies the deviation of observed variation from expectation. A positive Gnocchi score (red) indicates depletion of variation (observed<expected) and the higher the score the stronger the depletion; the red dashed line indicates the 99th percentile of Gnocchi scores across the autosomes (d) or chromosome X (e).
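
The sign convention in d,e and the multiple-testing threshold in c can be illustrated with a minimal Python sketch. The caption does not state the formula for the score, so the signed chi-square-root form below is an assumption made for illustration, as are the function names and the significance level of 0.05.

```python
import math


def signed_depletion_z(observed: float, expected: float) -> float:
    """Signed deviation of observed from expected variation.

    ASSUMPTION: a chi-square-style statistic (obs - exp)^2 / exp, square-rooted
    and signed so that depletion (observed < expected) is positive. The caption
    only specifies the sign convention, not this exact form.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    chi_sq = (observed - expected) ** 2 / expected
    z = math.sqrt(chi_sq)
    return z if observed < expected else -z


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical window: fewer variants observed than expected (depletion).
    print(signed_depletion_z(observed=64, expected=100))
    # 13 features x 4 window sizes = 52 tests, as in panel c; alpha is assumed.
    print(bonferroni_threshold(alpha=0.05, n_tests=13 * 4))
```

Under this assumed form, a window with 64 observed and 100 expected variants receives a positive score, matching the "depletion is positive" convention described for panels d and e.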

Extended Data Fig. 2 Comparison of Gnocchi score between coding and non-coding regions.

a, The proportion of highly constrained windows (Gnocchi ≥ 4) as a function of the percentage of coding sequences in a window (left to right: N = 1,906/49,525, 3,244/55,676, 2,240/18,461, 1,506/7,094, 969/3,519, 569/1,946, 364/1,223, 283/910, 243/724, 10,392/30,138). The intervals (x-axis) are left exclusive and right inclusive. “Exonic only” refers to the 1kb windows created from directly concatenating coding exons into 1kb sequences. Error bars indicate standard errors of the proportions. b, The exonic-only regions (N = 27,875; purple) present a significantly higher Gnocchi score than regions that are exclusively non-coding (N = 1,843,559; blue). Dashed lines indicate the medians. c, The proportion of highly constrained windows (Gnocchi ≥ 4) as a function of the proportion of exonic windows being added to the dataset of non-coding windows. d, Gnocchi score percentiles of non-coding versus exonic windows. About 0.05% (100-99.95%) and 3.12% (100-96.88%) of the non-coding windows exhibit similar constraint to the 90th and 50th percentiles of exonic regions, respectively.
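
As a worked example of the quantities plotted in panel a, the following Python sketch computes a proportion and its standard error from two of the count pairs listed in the caption. The binomial standard error formula is an assumption, since the caption does not say how the error bars were computed, and the function name is illustrative.

```python
import math


def proportion_with_se(k: int, n: int) -> tuple[float, float]:
    """Return the proportion k/n and its standard error.

    ASSUMPTION: the usual binomial standard error sqrt(p * (1 - p) / n).
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    p = k / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se


if __name__ == "__main__":
    # First and last count pairs from the caption (constrained / total windows).
    for k, n in [(1906, 49525), (10392, 30138)]:
        p, se = proportion_with_se(k, n)
        print(f"{k}/{n}: proportion = {p:.4f}, SE = {se:.4f}")
```

With these counts, the first bin has roughly 3.8% highly constrained windows and the last bin roughly 34.5%.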

Extended Data Fig. 3 Estimation of constraint for aggregated regulatory annotations.

a,b, Gnocchi scores of aggregated promoter (dark purple), enhancer (light purple), microRNA (miRNA; dark blue), and long non-coding RNA (lncRNA; light blue) annotations are compared against those of exonic (a) and non-coding (b) regions at a 1kb scale. The Gnocchi score percentiles of each annotation (y-axis) are benchmarked by the score deciles of exonic or non-coding regions (10–100 percentiles; x-axis); the grey dashed vertical line indicates the median (50^th percentile).

Extended Data Fig. 4 Applications of Gnocchi for characterizing non-coding regions in addition to existing functional annotations.

a, Use of Gnocchi for prioritizing non-coding regions with or without a regulatory annotation (N = 464,504 and 1,379,055, respectively). Constrained non-coding regions are enriched for GWAS variants, independent of the candidate cis-regulatory element (cCRE) annotation from ENCODE. Error bars indicate 95% confidence intervals of the odds ratios. b, Use of Gnocchi in statistical fine-mapping. The increase in posterior inclusion probability (PIP) when incorporating Gnocchi score as a functional prior into previous fine-mapping results (that used a uniform prior; denoted as PIP_Gnocchi and PIP_unif, respectively) is shown for 164 new likely causal associations with a PIP_Gnocchi ≥0.8 as a function of PIP_Gnocchi.
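As an illustration of the enrichment estimate in panel a, the sketch below computes an odds ratio and a 95% confidence interval from a 2×2 table using the Wald interval on the log scale. The counts are hypothetical, and the Wald interval is an assumption made for illustration; the legend does not state which interval method was used.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: constrained windows containing a GWAS variant
    b: constrained windows without one
    c: unconstrained windows containing a GWAS variant
    d: unconstrained windows without one
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
print(odds_ratio_ci(30, 970, 200, 19800))
```

An interval that lies entirely above 1 corresponds to an enrichment whose error bar does not cross the null line.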

Extended Data Fig. 5 Comparison of Gnocchi and other predictive metrics in prioritizing non-coding variants.

a, Receiver operating characteristic (ROC) curves of Gnocchi and seven other metrics in classifying putative functional non-coding variants (the “positive” variant set) – left to right: 9,229 GWAS Catalog variants, 2,191 GWAS fine-mapping variants, a subset of 140 high-confidence fine-mapped variants, and 1,026 likely pathogenic variants – against a “negative” variant set randomly drawn from the population with a similar allele frequency (AF). Thresholds of AF > 5% and allele count (AC) = 1 were applied to match the three GWAS variant sets and the likely pathogenic variant set, respectively, based on their AF distributions in TOPMed (shown in b). b, AUCs of the classification with a varying AF threshold for the negative variant set. As most GWAS variants are common and most of the likely pathogenic variants are very rare (not seen in the population), AF > 5% and AC = 1 were applied, respectively, in the primary analyses shown in a.
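The AUC values in these panels summarize how often a positive variant receives a higher score than a negative one. A minimal, dependency-free sketch of that calculation is given below; it uses the pairwise (Mann-Whitney) definition of AUC with ties counted as one half, and the scores are hypothetical rather than taken from the study.

```python
def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties contribute 0.5. This O(n*m) loop is meant for illustration only.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical constraint scores for positive and negative variant sets.
print(roc_auc([4.2, 3.1, 5.0], [0.5, 3.1, -1.2]))
```

An AUC of 0.5 means the metric does not separate the two sets; values approaching 1 mean positives are consistently scored higher.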

Extended Data Fig. 6 Comparison of constraint scores built from different mutational models and genomic windows.

Gnocchi (presented in this study) outperforms scores rebuilt from mutational models that consider only local sequence context – trinucleotide (trimer-only) or heptanucleotide (heptamer-only) – without adjusting the mutation rate for regional genomic features. Its performance is also robust to the artificial breaks between genomic windows, as shown by scores computed on 1kb windows sliding by 100bp.

Extended Data Fig. 7 Pairwise correlations between different constraint/conservation metrics.

The Spearman’s rank correlation between each pair of the eight metrics was computed based on the mean value of each score on 1kb windows across the genome.
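For readers who want to reproduce this kind of comparison, the sketch below computes Spearman's rank correlation between two metrics from their per-window mean values, using average ranks for ties followed by a Pearson correlation on the ranks. The per-window values are hypothetical, and the helper names are illustrative rather than taken from the study's code.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical mean scores of two metrics on the same five 1kb windows.
metric_a = [0.2, 1.5, 3.9, -0.4, 2.2]
metric_b = [0.1, 0.9, 0.8, 0.0, 0.7]
print(spearman(metric_a, metric_b))
```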

Extended Data Fig. 8 Power of constraint detection.

a,b, The sample size required for well-powered non-coding constraint detection. The percentage of non-coding regions powered to detect constraint (Gnocchi ≥ 4) at a 1kb (a) and 100bp (b) scale under varying levels of selection (depletion of variation) is shown as a function of log-scaled sample size. Lighter color indicates milder depletion of variation (weaker selection), which requires a larger sample size to detect constraint; the grey dashed vertical line indicates the current sample size of 76,156 genomes. Dotted curves (left to right) benchmark the 95^th, 90^th, and 50^th percentiles of depletion of variation observed in coding exons of similar size. The number of samples required to obtain 80% detection power is labeled at the corresponding benchmarks. c, AUCs of Gnocchi scores computed on different window sizes in identifying putative functional non-coding variants. The 1kb window (used in this study) is the optimal size, giving high performance while maintaining reasonable resolution. d, AUCs of Gnocchi scores computed from different subsets of gnomAD in identifying putative functional non-coding variants. At an equal sample size, the downsampled dataset with diverse ancestries performs better than the Non-Finnish European (NFE)-only dataset.

Supplementary information

Supplementary Information

This file provides detailed information about the aggregation, processing, and release of 76,156 human genomes from the Genome Aggregation Database (gnomAD), including Supplementary Figs. 1–8, Supplementary Tables 1–3, and descriptions of supplementary datasets.

Reporting Summary

Peer Review File

Supplementary Datasets

This zipped file contains supplementary dataset items 1–6: see Supplementary Information for supplementary dataset guide.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, S., Francioli, L.C., Goodrich, J.K. et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 625, 92–100 (2024). https://doi.org/10.1038/s41586-023-06045-0

Received: 16 March 2022
Accepted: 03 April 2023
Published: 06 December 2023
Issue Date: 04 January 2024
DOI: https://doi.org/10.1038/s41586-023-06045-0
