Fast, sensitive and accurate integration of single-cell data with Harmony

This article has been updated

Abstract

The emerging diversity of single-cell RNA-seq datasets allows for the full transcriptional characterization of cell types across a wide variety of biological and clinical conditions. However, it is challenging to analyze them together, particularly when datasets are assayed with different technologies, because biological and technical differences are interspersed. We present Harmony (https://github.com/immunogenomics/harmony), an algorithm that projects cells into a shared embedding in which cells group by cell type rather than dataset-specific conditions. Harmony simultaneously accounts for multiple experimental and biological factors. In six analyses, we demonstrate that Harmony outperforms previously published algorithms while requiring fewer computational resources. Harmony enables the integration of ~10^6 cells on a personal computer. We apply Harmony to peripheral blood mononuclear cells from datasets with large experimental differences, five studies of pancreatic islet cells, mouse embryogenesis datasets and the integration of scRNA-seq with spatial transcriptomics data.

Fig. 1: Overview of Harmony algorithm.
Fig. 2: Quantitative assessment of dataset mixing and cell-type accuracy with cell-line datasets.
Fig. 3: Computational efficiency benchmarks. BBKNN, Scanorama, MNN Correct and MultiCCA are compared on five downsampled HCA datasets of increasing sizes.
Fig. 4: Fine-grained subpopulation identification in PBMCs across technologies.
Fig. 5: Integration of pancreatic islet cells by both donor and technology.
Fig. 6: Harmony integrates spatially resolved transcriptomic with dissociated scRNAseq datasets.

Data availability

All data analyzed in this article are publicly available through online sources. We included links to all data sources in Supplementary Table 8.

Code availability

Harmony and LISI are available as R packages on https://github.com/immunogenomics/harmony and https://github.com/immunogenomics/lisi. Scripts to reproduce results of the primary analyses will be made available on https://github.com/immunogenomics/harmony2019. Additionally, vignettes are included as Supplementary Notes. Supplementary Note 1 provides a detailed walkthrough of Harmony, connecting theoretical algorithm components to their code implementations. Supplementary Note 2 demonstrates the LISI metric and how to evaluate its statistical significance. Supplementary Note 3 uses Harmony with simulated datasets.
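
As a rough illustration of the idea behind LISI (the Local Inverse Simpson's Index), the sketch below computes the inverse Simpson's index, 1 / sum(p_i^2), over the dataset labels found in each cell's local neighbourhood. It is not the code from the LISI R package: the function names `inverse_simpson` and `local_inverse_simpson`, the use of the k nearest neighbours and the uniform neighbour weights are all assumptions made to keep the example short, and the packaged implementation may define and weight neighbourhoods differently.

```python
from collections import Counter
from math import dist


def inverse_simpson(labels):
    """Inverse Simpson's index, 1 / sum(p_i^2), over a list of labels."""
    n = len(labels)
    if n == 0:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    return 1.0 / sum((c / n) ** 2 for c in counts.values())


def local_inverse_simpson(points, labels, k):
    """Hypothetical, unweighted local index (a simplifying assumption).

    For each point, take its k nearest other points by Euclidean
    distance and return the inverse Simpson's index of their labels.
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        nearest = sorted(others, key=lambda j: dist(p, points[j]))[:k]
        scores.append(inverse_simpson([labels[j] for j in nearest]))
    return scores
```

Under these assumptions, a neighbourhood split evenly between two datasets scores 2 and a neighbourhood drawn from a single dataset scores 1, so the value can be read as the effective number of datasets represented around a cell.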

Change history

  • 26 August 2020

    In the supplementary information originally posted for this article, the Supplementary Results and Supplementary Notes 1–3 were missing. The error has been corrected online.

Acknowledgements

This work was supported in part by funding from the National Institutes of Health (grant nos. UH2AR067677 and U19AI111224 and no. 1R01AR063759 (to S.R.) and T32 AR007530-31 (to I.K.)). We thank members of the Raychaudhuri and Brenner labs for comments and discussion. I.K. and K.W. were funded as part of a collaborative research agreement with F. Hoffmann-La Roche Ltd (Basel, Switzerland), to S.R. and M.B.B.

Author information

Contributions

S.R. and I.K. conceived the research. I.K. led computational work under the guidance of S.R., assisted by N.M., P.L., J.F. and K.S. All authors participated in interpretation and writing the manuscript.

Corresponding author

Correspondence to Soumya Raychaudhuri.

Ethics declarations

Competing interests

I.K. does paid bioinformatics consulting through Brilyant LLC.

Additional information

Peer review information Nicole Rusk was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–19, Supplementary Results and Supplementary Notes 1–3.

Reporting Summary

Supplementary Software 1

Harmony R package. Software to perform Harmony integration analysis.

Supplementary Software 2

LISI R package. Software to compute the Local Inverse Simpson’s Index.

Supplementary Tables 1–8

Jurkat LISI, Time benchmark, Memory Benchmark, HCA LISI, PBMC LISI, Inhibitory, Excitatory, Data Sources.

About this article

Cite this article

Korsunsky, I., Millard, N., Fan, J. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods 16, 1289–1296 (2019). https://doi.org/10.1038/s41592-019-0619-0

  • DOI: https://doi.org/10.1038/s41592-019-0619-0
