Abstract
Decoding the epigenomic landscapes in diverse tissues and cell types is fundamental to understanding molecular mechanisms underlying many essential cellular processes and human diseases. Recent advances in artificial intelligence provide new methods and strategies for imputing unknown epigenomes based on existing data, yet how to reveal the predictive relationships among epigenetic marks remains largely unexplored. Here we present a machine learning approach for epigenomic imputation and interpretation. Through dissection of the spatial contributions from six histone marks, we reveal the prevalent and asymmetric cross-prediction relationships among these marks. Meanwhile, our approach achieved high predictive performance on held-out prospective epigenomes and outperformed the state of the art. To facilitate future research, we further applied this approach to impute a total of 527 and 2,455 unavailable genome-wide histone modification signal tracks for the ENCODE3 and Roadmap datasets, respectively.
Data availability
The ENCODE Imputation Challenge data were downloaded from: (1) training data (https://www.synapse.org/#!Synapse:syn18143300) and (2) testing data (http://mitra.stanford.edu/kundaje/ic/blind/). The ENCODE3 histone modification data were downloaded from https://www.encodeproject.org/ using the accession numbers listed in Supplementary Table 15. The Roadmap histone modification data were downloaded from https://egg2.wustl.edu/roadmap/data/byFileType/signal/consolidated/macs2signal/pval/. Our imputed epigenomes for the missing entries in the ENCODE3 and Roadmap datasets are available at https://guanfiles.dcmb.med.umich.edu/Ocelot/imputation_encode3/ and https://guanfiles.dcmb.med.umich.edu/Ocelot/imputation_roadmap/.
Code availability
The code of Ocelot is available in the GitHub repository at https://github.com/GuanLab/Ocelot and https://doi.org/10.5281/zenodo.5847578 (ref. 45).
References
Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).
Rivera, C. M. & Ren, B. Mapping human epigenomes. Cell 155, 39–55 (2013).
Smith, Z. D. & Meissner, A. DNA methylation: roles in mammalian development. Nat. Rev. Genet. 14, 204–220 (2013).
Thiele, I. et al. A community-driven global reconstruction of human metabolism. Nat. Biotechnol. 31, 419–425 (2013).
Wittkopp, P. J. & Kalay, G. cis-Regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nat. Rev. Genet. 13, 59–69 (2011).
Barrat, F. J., Crow, M. K. & Ivashkiv, L. B. Interferon target-gene expression and epigenomic signatures in health and disease. Nat. Immunol. 20, 1574–1583 (2019).
Hekselman, I. & Yeger-Lotem, E. Mechanisms of tissue and cell-type specificity in heritable traits and diseases. Nat. Rev. Genet. 21, 137–150 (2020).
Lukong, K. E., Chang, K.-W., Khandjian, E. W. & Richard, S. RNA-binding proteins in human genetic disease. Trends Genet. 24, 416–425 (2008).
ENCODE Project Consortium et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Ernst, J. & Kellis, M. Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues. Nat. Biotechnol. 33, 364–376 (2015).
Durham, T. J., Libbrecht, M. W., Howbert, J. J., Bilmes, J. & Noble, W. S. PREDICTD PaRallel epigenomics data imputation with cloud-based tensor decomposition. Nat. Commun. 9, 1402 (2018).
Schreiber, J., Durham, T., Bilmes, J. & Noble, W. S. Avocado: a multi-scale deep tensor factorization method learns a latent representation of the human epigenome. Genome Biol. 21, 81 (2020).
Ernst, J. & Kellis, M. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat. Biotechnol. 28, 817–825 (2010).
Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods 9, 215–216 (2012).
Hoffman, M. M. et al. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nat. Methods 9, 473–476 (2012).
Ke, G. et al. LightGBM: a highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems (eds. Guyon, I. et al.) Vol. 30 (Curran Associates, 2017).
Li, H. & Guan, Y. Fast decoding cell type-specific transcription factor binding landscape at single-nucleotide resolution. Genome Res. 31, 721–731 (2021).
Shapley, L. S. A value for n-person games. In Contributions to the Theory of Games (AM-28) Vol. II (eds. Kuhn, H. W. & Tucker, A. W.) 307–318 (Princeton Univ. Press, 1953).
Lundberg, S. M. et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat. Biomed. Eng. 2, 749–760 (2018).
Katoh, N. et al. Reciprocal changes of H3K27ac and H3K27me3 at the promoter regions of the critical genes for endometrial decidualization. Epigenomics 10, 1243–1257 (2018).
Juan, A. H. et al. Roles of H3K27me2 and H3K27me3 examined during fate specification of embryonic stem cells. Cell Rep. 17, 1369–1382 (2016).
Liu, L., Zhao, W. & Zhou, X. Modeling co-occupancy of transcription factors using chromatin features. Nucleic Acids Res. 44, e49 (2016).
Zhou, M., Li, H., Wang, X. & Guan, Y. Evidence of widespread, independent sequence signature for transcription factor cobinding. Genome Res. https://doi.org/10.1101/gr.267310.120 (2020).
Ghandi, M., Lee, D., Mohammad-Noori, M. & Beer, M. A. Enhanced regulatory sequence prediction using gapped k-mer features. PLoS Comput. Biol. 10, e1003711 (2014).
Pique-Regi, R. et al. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 21, 447–455 (2011).
Li, H., Quang, D. & Guan, Y. Anchor: trans-cell type prediction of transcription factor binding sites. Genome Res. 29, 281–292 (2019).
Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).
Kelley, D. R. et al. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res. 28, 739–750 (2018).
Zhang, L., Xue, G., Liu, J., Li, Q. & Wang, Y. Revealing transcription factor and histone modification co-localization and dynamics across cell lines by integrating ChIP-seq and RNA-seq data. BMC Genomics 19, 914 (2018).
Xin, B. & Rohs, R. Relationship between histone modifications and transcription factor binding is protein family specific. Genome Res. https://doi.org/10.1101/gr.220079.116 (2018).
Liu, L., Jin, G. & Zhou, X. Modeling the relationship of epigenetic modifications to transcription factor binding. Nucleic Acids Res. 43, 3873–3885 (2015).
Benveniste, D., Sonntag, H.-J., Sanguinetti, G. & Sproul, D. Transcription factor binding predicts histone modifications in human cell lines. Proc. Natl Acad. Sci. USA 111, 13367–13372 (2014).
Ngo, V. et al. Epigenomic analysis reveals DNA motifs regulating histone modifications in human and mouse. Proc. Natl Acad. Sci. USA 116, 3668–3677 (2019).
Wang, M. et al. Identification of DNA motifs that regulate DNA methylation. Nucleic Acids Res. 47, 6753–6768 (2019).
Cochran, K. et al. Domain adaptive neural networks improve cross-species prediction of transcription factor binding. Genome Res. https://doi.org/10.1101/2021.02.13.431115 (2021).
Chen, L., Fish, A. E. & Capra, J. A. Prediction of gene regulatory enhancers across species reveals evolutionarily conserved sequence properties. PLoS Comput. Biol. 14, e1006484 (2018).
Kelley, D. R. Cross-species regulatory sequence activity prediction. PLoS Comput. Biol. 16, e1008050 (2020).
Schreiber, J., Hegde, D. & Noble, W. Zero-shot imputations across species are enabled through joint modeling of human and mouse epigenomics. Proc. 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (ACM, 2020).
Bolstad, B. M., Irizarry, R. A., Astrand, M. & Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19, 185–193 (2003).
Li, H. & Guan, Y. DeepSleep convolutional neural network allows accurate and fast detection of sleep arousal. Commun. Biol. 4, 18 (2021).
Frankish, A. et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 47, D766–D773 (2019).
Lizio, M. et al. Gateways to the FANTOM5 promoter level mammalian expression atlas. Genome Biol. 16, 22 (2015).
Li, H. GuanLab/Ocelot: The 1st Release (Zenodo, 2022); https://doi.org/10.5281/zenodo.5847578
Acknowledgements
This work is supported by NIH/NIGMS (grant no. R35GM133346) and NSF/DBI (grant no. 1452656) to Y.G.
Author information
Contributions
H.L. and Y.G. conceived and designed this project. H.L. implemented the method, performed the experiments and prepared the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–38 and captions for Tables 1–15.
Supplementary Table 1
Supplementary Tables 1–15.
About this article
Cite this article
Li, H., Guan, Y. Asymmetric predictive relationships across histone modifications. Nat Mach Intell 4, 288–299 (2022). https://doi.org/10.1038/s42256-022-00455-x
DOI: https://doi.org/10.1038/s42256-022-00455-x