Annotating and prioritizing human non-coding variants with RegulomeDB v.2

Dong, Shengcheng; Zhao, Nanxiang; Spragins, Emma; Kagda, Meenakshi S.; Li, Mingjie; Assis, Pedro; Jolanki, Otto; Luo, Yunhai; Cherry, J. Michael; Boyle, Alan P.; Hitz, Benjamin C.

doi:10.1038/s41588-023-01365-3

Correspondence
Published: 25 April 2023

Nature Genetics volume 55, pages 724–726 (2023)

Nearly 90% of the disease risk-associated variants identified by genome-wide association studies are in non-coding regions of the genome. Annotations obtained by analyzing functional genomics assays can provide additional information to pinpoint causal variants, which are often not the lead variants identified in association studies. However, the lack of available annotation tools limits the use of such data. To address this challenge, we previously built the ‘RegulomeDB database’ to prioritize and annotate variants in non-coding regions¹, which has been a highly utilized resource for the research community (Supplementary Fig. 1).

Here we present an update of the RegulomeDB web server, RegulomeDB v.2 (http://regulomedb.org). RegulomeDB annotates a variant by intersecting its position with genomic intervals identified from functional genomic assays and computational approaches. It also combines the overlapping hits into a heuristic ranking score that represents the variant's potential to be functional in regulatory elements. We improved annotation power by incorporating thousands of newly processed datasets from functional genomic assays in the GRCh38 assembly. We also included probabilistic scores from the SURF algorithm, which was the top-performing non-coding variant predictor in the Fifth Critical Assessment of Genome Interpretation (CAGI-5)².
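The core operation described above is intersecting a variant position with genomic intervals. It can be sketched as follows. This is a minimal Python illustration, not RegulomeDB's implementation. The `Interval` and `IntervalIndex` names are hypothetical, and intervals are assumed to use 0-based, half-open BED coordinates. A service indexing more than a billion intervals would rely on a dedicated interval index or search backend rather than in-memory lists.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive
    label: str  # hypothetical annotation label, e.g. assay type


class IntervalIndex:
    """Per-chromosome lists of intervals sorted by start coordinate."""

    def __init__(self, intervals):
        self._by_chrom = {}
        for iv in sorted(intervals, key=lambda iv: (iv.chrom, iv.start)):
            self._by_chrom.setdefault(iv.chrom, []).append(iv)
        self._starts = {c: [iv.start for iv in ivs] for c, ivs in self._by_chrom.items()}
        # The longest interval bounds how far left of a position an overlapping interval can start.
        self._max_len = {c: max(iv.end - iv.start for iv in ivs)
                         for c, ivs in self._by_chrom.items()}

    def overlaps(self, chrom, pos):
        """Return every interval on `chrom` that contains the 0-based position `pos`."""
        ivs = self._by_chrom.get(chrom)
        if not ivs:
            return []
        starts = self._starts[chrom]
        hi = bisect_right(starts, pos)                         # candidates with start <= pos
        lo = bisect_right(starts, pos - self._max_len[chrom])  # ...and start > pos - max_len
        return [iv for iv in ivs[lo:hi] if iv.end > pos]


if __name__ == "__main__":
    index = IntervalIndex([
        Interval("chr1", 100, 200, "TF ChIP-seq peak"),
        Interval("chr1", 150, 160, "footprint"),
        Interval("chr1", 300, 400, "DNase-seq peak"),
    ])
    print([iv.label for iv in index.overlaps("chr1", 155)])  # ['TF ChIP-seq peak', 'footprint']
```

Bounding the leftward scan by the longest interval on each chromosome keeps the lookup correct when intervals of mixed lengths overlap. However, a single very long interval, such as a broad chromatin-state segment, widens every scan on that chromosome, which is one reason production systems tend to use interval trees instead.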

The updated RegulomeDB now includes more than 650 million and 1.5 billion genomic intervals in hg19 and GRCh38, respectively, a fivefold increase compared with the previous version (Supplementary Fig. 2). We included approximately 5,000 chromatin immunoprecipitation followed by sequencing experiments targeting transcription factors (TF ChIP–seq) and chromatin accessibility experiments from the ENCODE project³, the Roadmap Epigenomics program⁴ and the Genomics of Gene Regulation project. We also produced a comprehensive set of footprint predictions in GRCh38 with the TRACE pipeline⁵, using over 800 chromatin accessibility experiments and 591 transcription factor motifs. In addition, we refined the included transcription factor motifs by using the non-redundant vertebrate set from the JASPAR database⁶. We also integrated approximately 71 million variant–gene pairs from the expression quantitative trait loci (eQTL) studies of the GTEx project⁷ and 450,000 chromatin-accessibility QTLs (caQTLs) from 9 recent publications (Supplementary Information). Finally, we included ChromHMM chromatin state annotations from EpiMap for 833 biosamples⁸.

RegulomeDB accepts query variants from anywhere in the genome, in either the GRCh38 or the hg19 assembly, specified by rsID or by genome coordinates. The query variants can then be prioritized by functional prediction scores shown in a sortable table. For any variant of interest, RegulomeDB displays an information page covering five types of supporting genomic evidence, together with a genome browser view. Each of these six sections can be clicked to show more detail for exploring the variant's function (Supplementary Figs. 3–5).
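Queries can be given either as rsIDs or as coordinates in one of two assemblies, so a script that prepares query lists might first validate and normalize its input. The sketch below assumes a `chrom:start-end` coordinate syntax with a 0-based start and an exclusive end. That syntax and the `Query`, `parse_query` and `parse_queries` names are hypothetical illustrations, not RegulomeDB's documented input specification. The sketch does not attempt to resolve rsIDs to positions, because that depends on the dbSNP build and assembly.

```python
import re
from dataclasses import dataclass
from typing import Optional

SUPPORTED_ASSEMBLIES = {"GRCh38", "hg19"}
RSID_RE = re.compile(r"rs\d+", re.IGNORECASE)
# Hypothetical coordinate syntax: chr1:1000-1001 (0-based start, exclusive end).
COORD_RE = re.compile(r"(chr(?:\d+|X|Y|M)):(\d+)-(\d+)")


@dataclass(frozen=True)
class Query:
    assembly: str
    rsid: Optional[str] = None
    chrom: Optional[str] = None
    start: Optional[int] = None
    end: Optional[int] = None


def parse_query(text: str, assembly: str) -> Query:
    """Classify one query line as an rsID or a coordinate interval."""
    if assembly not in SUPPORTED_ASSEMBLIES:
        raise ValueError(f"unsupported assembly: {assembly}")
    text = text.strip()
    if RSID_RE.fullmatch(text):
        return Query(assembly=assembly, rsid=text.lower())
    m = COORD_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not an rsID or chrom:start-end interval: {text!r}")
    chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    if end <= start:
        raise ValueError(f"empty interval: {text!r}")
    return Query(assembly=assembly, chrom=chrom, start=start, end=end)


def parse_queries(lines, assembly="GRCh38"):
    """Parse many lines, skipping blank lines and '#' comment lines."""
    return [parse_query(line, assembly) for line in lines
            if line.strip() and not line.lstrip().startswith("#")]
```

For example, `parse_query("rs213641", "GRCh38")` yields an rsID query, and `parse_query("chr7:1000-1001", "hg19")` yields a one-base interval on chr7. Failing early on an unsupported assembly avoids silently mixing hg19 and GRCh38 coordinates in one batch.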

RegulomeDB enables researchers to quickly separate functional variants from a large pool of candidates and to assign tissue or organ specificity to each variant. Here we showcase this using four verified variants from recent literature⁹⁻¹³ and demonstrate how RegulomeDB annotates these variants on the basis of various sources of data (Fig. 1).

**Fig. 1: Prioritization of functional variants with RegulomeDB version 2.**

Transcription factor motifs and ChIP–seq data together provide evidence about how a variant is likely to affect phenotype in a cell-type-specific context. For example, rs213641 is known to affect behavioral responses to fear and anxiety stimuli⁹. POLR2A binding and an active transcription start site (TSS) chromatin state in the brain indicate that rs213641 likely functions in the brain by disrupting the TSS of STMN1. We also examined rs7789585, for which RegulomeDB transcription factor motif evidence suggests that mutation to the reference allele G would disrupt the binding of GCM1, which may interrupt the active enhancer state at the locus in the heart. Hocker et al.¹⁰ recently confirmed this hypothesis using reporter assays and showed that rs7789585 disrupts a KCNH2 enhancer and affects cardiomyocyte electrophysiological function.
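The motif reasoning above rests on the idea that an allele which weakens a transcription factor's motif match may disrupt its binding. This is commonly quantified by comparing a position weight matrix (PWM) log-odds score between alleles. The sketch below shows that standard calculation. It is not necessarily how RegulomeDB evaluates motif hits, and the four-position matrix is a toy example, not the JASPAR GCM1 profile. A real analysis would convert JASPAR count matrices to probabilities with pseudocounts, use a genomic background and scan both strands.

```python
import math

BASES = "ACGT"


def log_odds_score(window, pwm, background=0.25):
    """Sum over positions of log2(P(base | motif) / P(base | background))."""
    if len(window) != len(pwm):
        raise ValueError("window and motif must have the same length")
    return sum(math.log2(col[BASES.index(base)] / background)
               for base, col in zip(window.upper(), pwm))


def allele_effect(ref_window, alt_window, pwm):
    """Alt minus ref score; negative values mean the alt allele weakens the motif match."""
    return log_odds_score(alt_window, pwm) - log_odds_score(ref_window, pwm)


# Toy PWM: each row holds P(A), P(C), P(G), P(T) at one motif position (consensus "AGCT").
TOY_PWM = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]

if __name__ == "__main__":
    # A G>A change at the second motif position lowers the score by log2(0.1 / 0.7).
    print(round(allele_effect("AGCT", "AACT", TOY_PWM), 3))  # -2.807
```

Because the uniform background cancels at every unchanged position, the allele effect reduces to the log ratio of the two alleles' probabilities at the variant position.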

DNase-seq assays and the footprint predictions derived from them identify open chromatin regions with mapped transcription factor binding sites in hundreds of biosamples. They can also be used to assign putative functions to variants. rs190509934 has been associated with the risk of COVID-19 infection through its effect on ACE2 expression¹¹. RegulomeDB shows that this variant overlaps several DNase-seq peaks in lung-related biosamples. Furthermore, RegulomeDB extends this tissue-level observation with a mechanistic hypothesis. The variant overlaps DNase footprints in the lung within the upstream promoter region of ACE2, which suggests that ACE2 expression may be regulated by CEBP¹². In addition, eQTL studies provide correlational evidence linking variants to their target genes. For example, RegulomeDB predicts rs72635708 to be a regulatory variant with a high probability of 0.91. The locus overlaps DNase and ChIP–seq peaks and footprints, and the variant is an eQTL associated with LINC01714 expression in the right lobe of the liver. Because rs72635708 lies in the FOS motif, it is likely to be a functional variant in the liver that modulates binding of the AP-1 complex¹³.
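The examples above combine two steps. Variants are ranked by a functional probability score, and tissue or organ specificity is assigned where several independent kinds of evidence agree. A minimal sketch of that logic follows. The `VariantHits` structure, the 0.5 probability cutoff and the rule requiring two distinct evidence types per organ are hypothetical choices for illustration, not RegulomeDB's scoring rules. The variant records are made up.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class VariantHits:
    variant_id: str
    probability: float  # functional probability, e.g. a SURF-style score in [0, 1]
    hits: list = field(default_factory=list)  # (evidence_type, organ) pairs


def supported_organs(variant, min_evidence_types=2):
    """Organs backed by at least `min_evidence_types` distinct kinds of evidence."""
    types_by_organ = defaultdict(set)
    for evidence_type, organ in variant.hits:
        types_by_organ[organ].add(evidence_type)
    return sorted(organ for organ, types in types_by_organ.items()
                  if len(types) >= min_evidence_types)


def prioritize(variants, min_probability=0.5):
    """Drop low-probability variants, rank the rest and attach their supported organs."""
    kept = sorted((v for v in variants if v.probability >= min_probability),
                  key=lambda v: v.probability, reverse=True)
    return [(v.variant_id, v.probability, supported_organs(v)) for v in kept]


if __name__ == "__main__":
    variants = [
        VariantHits("variant_a", 0.91, [("DNase-seq", "liver"), ("TF ChIP-seq", "liver"),
                                        ("eQTL", "liver"), ("DNase-seq", "lung")]),
        VariantHits("variant_b", 0.30, [("DNase-seq", "lung")]),
        VariantHits("variant_c", 0.75, [("TF ChIP-seq", "heart"), ("chromatin state", "heart")]),
    ]
    for row in prioritize(variants):
        print(row)
```

Counting distinct evidence types, rather than raw hits, keeps one assay type that happens to have many overlapping peaks in an organ from dominating the tissue assignment.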

In summary, RegulomeDB provides a user-friendly tool to annotate and prioritize variants in non-coding regions of the human genome, which can aid variant function interpretation and guide follow-up experiments. We welcome user feedback through regulomedb@mailman.stanford.edu.

Reporting Summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

RegulomeDB v.2 can be accessed through the web server at https://regulomedb.org. All datasets collected in RegulomeDB are accessible through the ENCODE portal https://www.encodeproject.org/search/?internal_tags=RegulomeDB_2_2.
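The ENCODE portal search linked above can also be queried from a script. The sketch below assumes several REST conventions as documented by ENCODE. Appending `format=json` returns JSON rather than HTML, `limit=all` lifts the default page size, and matching objects are listed under the `@graph` key. These conventions should be checked against the current ENCODE REST API documentation, and not every returned object is guaranteed to carry an `accession` field. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENCODE_SEARCH = "https://www.encodeproject.org/search/"


def build_search_url(**extra_params):
    """ENCODE search URL for datasets tagged RegulomeDB_2_2, requesting JSON output."""
    params = {"internal_tags": "RegulomeDB_2_2", "format": "json", **extra_params}
    return ENCODE_SEARCH + "?" + urlencode(params)


def fetch_accessions(url, timeout=60):
    """Download one search result and return the accessions it lists."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # Objects without an accession are skipped rather than reported as None.
    return [item["accession"] for item in payload.get("@graph", []) if "accession" in item]


if __name__ == "__main__":
    print(build_search_url(limit="all"))
```

Calling `fetch_accessions(build_search_url(limit="all"))` performs the network request and may return a large listing. Omitting `limit` retrieves only the portal's default first page, which is a cheaper way to inspect the response shape first.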

Code availability

The code used by RegulomeDB is available from the GitHub repositories at https://github.com/ENCODE-DCC/regulome-encoded/releases/tag/v2.2 and https://github.com/ENCODE-DCC/genomic-data-service/releases/tag/v2.2.

References

1. Boyle, A. P. et al. Genome Res. 22, 1790–1797 (2012).
2. Dong, S. & Boyle, A. P. Hum. Mutat. 40, 1292–1298 (2019).
3. ENCODE Project Consortium. Nature 583, 699–710 (2020).
4. Roadmap Epigenomics Consortium. Nature 518, 317–330 (2015).
5. Ouyang, N. & Boyle, A. P. Genome Res. 30, 1040–1046 (2020).
6. Fornes, O. et al. Nucleic Acids Res. 48, D87–D92 (2020).
7. GTEx Consortium. Science 369, 1318–1330 (2020).
8. Boix, C. A., James, B. T., Park, Y. P., Meuleman, W. & Kellis, M. Nature 590, 300–307 (2021).
9. Brocke, B. et al. Am. J. Med. Genet. B Neuropsychiatr. Genet. 153B, 243–251 (2010).
10. Hocker, J. D. et al. Sci. Adv. 7, eabf1444 (2021).
11. Horowitz, J. E. et al. Nat. Genet. 54, 382–392 (2022).
12. Beacon, T. H., Delcuve, G. P. & Davie, J. R. Genome 64, 386–399 (2021).
13. Kubota, N. & Suyama, M. BMC Med. Genomics 13, 8 (2020).

Acknowledgements

We thank the RegulomeDB users and the scientific community for producing and sharing functional genomic experiments. We also thank all members of the Cherry and Boyle laboratories for constructive feedback. This research was supported by US National Institutes of Health (NIH) grant U24 HG009293 (A.P.B. and J.M.C.).

Author information

These authors contributed equally: Shengcheng Dong, Nanxiang Zhao.

Authors and Affiliations

Department of Genetics, Stanford University, Stanford, CA, USA
Shengcheng Dong, Emma Spragins, Meenakshi S. Kagda, Mingjie Li, Pedro Assis, Otto Jolanki, Yunhai Luo, J. Michael Cherry & Benjamin C. Hitz
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
Nanxiang Zhao & Alan P. Boyle
Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
Alan P. Boyle

Corresponding authors

Correspondence to Alan P. Boyle or Benjamin C. Hitz.

Ethics declarations

Competing interests

The authors have no competing interests.

About this article

Cite this article

Dong, S., Zhao, N., Spragins, E. et al. Annotating and prioritizing human non-coding variants with RegulomeDB v.2. Nat Genet 55, 724–726 (2023). https://doi.org/10.1038/s41588-023-01365-3

Issue Date: May 2023
