Asymmetric predictive relationships across histone modifications

Abstract

Decoding the epigenomic landscapes of diverse tissues and cell types is fundamental to understanding the molecular mechanisms underlying many essential cellular processes and human diseases. Recent advances in artificial intelligence provide new methods and strategies for imputing unknown epigenomes from existing data, yet the predictive relationships among epigenetic marks remain largely unexplored. Here we present Ocelot, a machine learning approach for epigenomic imputation and interpretation. By dissecting the spatial contributions of six histone marks, we reveal prevalent and asymmetric cross-prediction relationships among these marks. Our approach also achieved high predictive performance on held-out prospective epigenomes and outperformed state-of-the-art methods. To facilitate future research, we further applied this approach to impute 527 and 2,455 previously unavailable genome-wide histone modification signal tracks for the ENCODE3 and Roadmap datasets, respectively.
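The central idea of an asymmetric cross-prediction relationship (mark A predicts mark B well, but B predicts A poorly) and of a spatial contribution profile (how much each genomic offset around a bin contributes to the prediction) can be illustrated on toy data. The sketch below is a hypothetical illustration, not the Ocelot implementation: it swaps in ordinary least squares for the tree-based and neural network models described in the paper, uses the closed-form Shapley value of a linear model as the per-offset attribution, and uses synthetic signals (`mark_a`, `mark_b`) instead of real histone tracks. All function and variable names are illustrative assumptions.

```python
import numpy as np


def windowed_features(signal, half_width):
    """Return an (n_bins, 2*half_width + 1) matrix whose column j holds the
    signal shifted by offset j - half_width (edges padded with end values)."""
    n = len(signal)
    padded = np.pad(signal, half_width, mode="edge")
    return np.stack([padded[j:j + n] for j in range(2 * half_width + 1)], axis=1)


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns (weights, intercept)."""
    design = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[:-1], coef[-1]


def cross_predict(source, target, half_width=5):
    """Predict `target` from a window of `source` around each bin.

    Trains on the first half of the bins and evaluates on the second half.
    Returns the held-out Pearson r and the mean |Shapley value| per offset.
    """
    X = windowed_features(source, half_width)
    split = len(source) // 2
    w, b = fit_linear(X[:split], target[:split])
    pred = X[split:] @ w + b
    r = np.corrcoef(pred, target[split:])[0, 1]
    # Closed-form (interventional) Shapley values for a linear model:
    # phi_j = w_j * (x_j - E[x_j]), with E[x_j] taken from the training half.
    phi = w * (X[split:] - X[:split].mean(axis=0))
    return r, np.abs(phi).mean(axis=0)


# Hypothetical toy signals on a 1D "genome" of 20,000 bins (not real data).
rng = np.random.default_rng(0)
n_bins = 20_000
mark_a = rng.normal(size=n_bins)                            # high-frequency signal
mark_b = np.convolve(mark_a, np.ones(9) / 9, mode="same")   # smoothed copy of mark A

r_ab, contrib_ab = cross_predict(mark_a, mark_b)  # mark A -> mark B
r_ba, contrib_ba = cross_predict(mark_b, mark_a)  # mark B -> mark A
print(f"A predicts B: r = {r_ab:.3f}")
print(f"B predicts A: r = {r_ba:.3f}")
print("per-offset contribution (A -> B):", np.round(contrib_ab, 3))
```

Because mark B is a smoothed copy of mark A, a window of A almost fully determines B, whereas the high-frequency part of A cannot be recovered from B. The first correlation should therefore be close to 1 and the second clearly lower, and `contrib_ab` should be concentrated on the nine central offsets covered by the smoothing kernel. Replacing the linear model with a gradient-boosted tree or neural network and a model-agnostic Shapley estimator would bring the sketch closer to the models described in the paper, at the cost of extra dependencies.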

Fig. 1: Overview of experimental design.
Fig. 2: Schematic illustration of the tree-based model and neural network model design.
Fig. 3: Ocelot reveals the asymmetric and spatial cross-regulation of multiple histone modifications in epigenome imputation.
Fig. 4: Predictive performance comparisons between Ocelot, ChromImpute and Avocado.
Fig. 5: An imputation example on the ENCODE Imputation Challenge dataset.
Fig. 6: Application of Ocelot to impute missing entries and complete the ENCODE3 histone mark dataset.

Data availability

The ENCODE Imputation Challenge training data were downloaded from https://www.synapse.org/#!Synapse:syn18143300 and the testing data from http://mitra.stanford.edu/kundaje/ic/blind/. The ENCODE3 histone modification data were downloaded from https://www.encodeproject.org/ using the accession numbers listed in Supplementary Table 15. The Roadmap histone modification data were downloaded from https://egg2.wustl.edu/roadmap/data/byFileType/signal/consolidated/macs2signal/pval/. Our imputed tracks for the missing entries in the ENCODE3 and Roadmap datasets are available at https://guanfiles.dcmb.med.umich.edu/Ocelot/imputation_encode3/ and https://guanfiles.dcmb.med.umich.edu/Ocelot/imputation_roadmap/, respectively.

Code availability

The Ocelot code is available on GitHub at https://github.com/GuanLab/Ocelot and on Zenodo at https://doi.org/10.5281/zenodo.5847578 (ref. 45).

References

  1. Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).

  2. Rivera, C. M. & Ren, B. Mapping human epigenomes. Cell 155, 39–55 (2013).

  3. Smith, Z. D. & Meissner, A. DNA methylation: roles in mammalian development. Nat. Rev. Genet. 14, 204–220 (2013).

  4. Thiele, I. et al. A community-driven global reconstruction of human metabolism. Nat. Biotechnol. 31, 419–425 (2013).

  5. Wittkopp, P. J. & Kalay, G. cis-Regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nat. Rev. Genet. 13, 59–69 (2011).

  6. Barrat, F. J., Crow, M. K. & Ivashkiv, L. B. Interferon target-gene expression and epigenomic signatures in health and disease. Nat. Immunol. 20, 1574–1583 (2019).

  7. Hekselman, I. & Yeger-Lotem, E. Mechanisms of tissue and cell-type specificity in heritable traits and diseases. Nat. Rev. Genet. 21, 137–150 (2020).

  8. Lukong, K. E., Chang, K.-W., Khandjian, E. W. & Richard, S. RNA-binding proteins in human genetic disease. Trends Genet. 24, 416–425 (2008).

  9. ENCODE Project Consortium et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).

  10. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  11. Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).

  12. Ernst, J. & Kellis, M. Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues. Nat. Biotechnol. 33, 364–376 (2015).

  13. Durham, T. J., Libbrecht, M. W., Howbert, J. J., Bilmes, J. & Noble, W. S. PREDICTD PaRallel epigenomics data imputation with cloud-based tensor decomposition. Nat. Commun. 9, 1402 (2018).

  14. Schreiber, J., Durham, T., Bilmes, J. & Noble, W. S. Avocado: a multi-scale deep tensor factorization method learns a latent representation of the human epigenome. Genome Biol. 21, 81 (2020).

  15. Ernst, J. & Kellis, M. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat. Biotechnol. 28, 817–825 (2010).

  16. Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods 9, 215–216 (2012).

  17. Hoffman, M. M. et al. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nat. Methods 9, 473–476 (2012).

  18. Ke, G. et al. LightGBM: a highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems (eds. Guyon, I. et al.) Vol. 30 (Curran Associates, 2017).

  19. Li, H. & Guan, Y. Fast decoding cell type-specific transcription factor binding landscape at single-nucleotide resolution. Genome Res. 31, 721–731 (2021).

  20. Shapley, L. S. A value for n-person games. In Contributions to the Theory of Games (AM-28) Vol. II (eds. Kuhn, H. W. & Tucker, A. W.) 307–318 (Princeton Univ. Press, 1953).

  21. Lundberg, S. M. et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat. Biomed. Eng. 2, 749–760 (2018).

  22. Katoh, N. et al. Reciprocal changes of H3K27ac and H3K27me3 at the promoter regions of the critical genes for endometrial decidualization. Epigenomics 10, 1243–1257 (2018).

  23. Juan, A. H. et al. Roles of H3K27me2 and H3K27me3 examined during fate specification of embryonic stem cells. Cell Rep. 17, 1369–1382 (2016).

  24. Liu, L., Zhao, W. & Zhou, X. Modeling co-occupancy of transcription factors using chromatin features. Nucleic Acids Res. 44, e49 (2016).

  25. Zhou, M., Li, H., Wang, X. & Guan, Y. Evidence of widespread, independent sequence signature for transcription factor cobinding. Genome Res. https://doi.org/10.1101/gr.267310.120 (2020).

  26. Ghandi, M., Lee, D., Mohammad-Noori, M. & Beer, M. A. Enhanced regulatory sequence prediction using gapped k-mer features. PLoS Comput. Biol. 10, e1003711 (2014).

  27. Pique-Regi, R. et al. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 21, 447–455 (2011).

  28. Li, H., Quang, D. & Guan, Y. Anchor: trans-cell type prediction of transcription factor binding sites. Genome Res. 29, 281–292 (2019).

  29. Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).

  30. Kelley, D. R. et al. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res. 28, 739–750 (2018).

  31. Zhang, L., Xue, G., Liu, J., Li, Q. & Wang, Y. Revealing transcription factor and histone modification co-localization and dynamics across cell lines by integrating ChIP-seq and RNA-seq data. BMC Genomics 19, 914 (2018).

  32. Xin, B. & Rohs, R. Relationship between histone modifications and transcription factor binding is protein family specific. Genome Res. https://doi.org/10.1101/gr.220079.116 (2018).

  33. Liu, L., Jin, G. & Zhou, X. Modeling the relationship of epigenetic modifications to transcription factor binding. Nucleic Acids Res. 43, 3873–3885 (2015).

  34. Benveniste, D., Sonntag, H.-J., Sanguinetti, G. & Sproul, D. Transcription factor binding predicts histone modifications in human cell lines. Proc. Natl Acad. Sci. USA 111, 13367–13372 (2014).

  35. Ngo, V. et al. Epigenomic analysis reveals DNA motifs regulating histone modifications in human and mouse. Proc. Natl Acad. Sci. USA 116, 3668–3677 (2019).

  36. Wang, M. et al. Identification of DNA motifs that regulate DNA methylation. Nucleic Acids Res. 47, 6753–6768 (2019).

  37. Cochran, K. et al. Domain adaptive neural networks improve cross-species prediction of transcription factor binding. Genome Res. https://doi.org/10.1101/2021.02.13.431115 (2021).

  38. Chen, L., Fish, A. E. & Capra, J. A. Prediction of gene regulatory enhancers across species reveals evolutionarily conserved sequence properties. PLoS Comput. Biol. 14, e1006484 (2018).

  39. Kelley, D. R. Cross-species regulatory sequence activity prediction. PLoS Comput. Biol. 16, e1008050 (2020).

  40. Schreiber, J., Hegde, D. & Noble, W. Zero-shot imputations across species are enabled through joint modeling of human and mouse epigenomics. Proc. 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (ACM, 2020).

  41. Bolstad, B. M., Irizarry, R. A., Astrand, M. & Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19, 185–193 (2003).

  42. Li, H. & Guan, Y. DeepSleep convolutional neural network allows accurate and fast detection of sleep arousal. Commun. Biol. 4, 18 (2021).

  43. Frankish, A. et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 47, D766–D773 (2019).

  44. Lizio, M. et al. Gateways to the FANTOM5 promoter level mammalian expression atlas. Genome Biol. 16, 22 (2015).

  45. Li, H. GuanLab/Ocelot: The 1st Release (Zenodo, 2022); https://doi.org/10.5281/zenodo.5847578

Acknowledgements

This work is supported by NIH/NIGMS R35GM133346 and NSF/DBI (grant no. 1452656) to Y.G.

Author information

Contributions

H.L. and Y.G. conceived and designed this project. H.L. implemented the method, performed the experiments and prepared the manuscript.

Corresponding authors

Correspondence to Hongyang Li or Yuanfang Guan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–38 and captions for Tables 1–15.

Reporting Summary

Supplementary Table 1

Supplementary Tables 1–15.

About this article

Cite this article

Li, H., Guan, Y. Asymmetric predictive relationships across histone modifications. Nat Mach Intell 4, 288–299 (2022). https://doi.org/10.1038/s42256-022-00455-x

  • DOI: https://doi.org/10.1038/s42256-022-00455-x
