TISSUE: uncertainty-calibrated prediction of single-cell spatial transcriptomics improves downstream analyses

Abstract

Whole-transcriptome spatial profiling of genes at single-cell resolution remains a challenge. To address this limitation, spatial gene expression prediction methods have been developed to infer the spatial expression of unmeasured transcripts, but the quality of these predictions can vary greatly. Here we present Transcript Imputation with Spatial Single-cell Uncertainty Estimation (TISSUE) as a general framework for estimating uncertainty for spatial gene expression predictions and providing uncertainty-aware methods for downstream inference. Leveraging conformal inference, TISSUE provides well-calibrated prediction intervals for predicted expression values across 11 benchmark datasets. Moreover, it consistently reduces the false discovery rate for differential gene expression analysis, improves clustering and visualization of predicted spatial transcriptomics and improves the performance of supervised learning models trained on predicted gene expression profiles. Applying TISSUE to a MERFISH spatial transcriptomics dataset of the adult mouse subventricular zone, we identified subtypes within the neural stem cell lineage and developed subtype-specific regional classifiers.
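
As a rough illustration of the conformal inference idea behind these prediction intervals, the following minimal Python sketch builds symmetric split-conformal intervals from absolute residuals on held-out calibration data. It is a generic textbook construction and not the TISSUE implementation: the function name `split_conformal_interval` and its arguments are hypothetical, and TISSUE's own calibration scores and stratified grouping are not reproduced here.

```python
import numpy as np

def split_conformal_interval(calib_true, calib_pred, new_pred, alpha=0.33):
    """Symmetric split-conformal intervals (generic sketch, not TISSUE's API).

    calib_true, calib_pred: measured and predicted values on calibration data.
    new_pred: predictions for which intervals are wanted.
    alpha: miscoverage level (alpha=0.33 targets roughly 67% coverage).
    """
    calib_true = np.asarray(calib_true, dtype=float)
    calib_pred = np.asarray(calib_pred, dtype=float)
    new_pred = np.asarray(new_pred, dtype=float)

    # Nonconformity scores: absolute residuals on the calibration set.
    scores = np.sort(np.abs(calib_true - calib_pred))
    n = scores.size

    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # If the rank exceeds n, the interval is unbounded.
    q = scores[k - 1] if k <= n else np.inf

    return new_pred - q, new_pred + q
```

Under exchangeability of calibration and new data, intervals built this way cover the true value with probability at least 1 - alpha; the half-width `q` is shared by all new predictions in this simplified version.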

Fig. 1: Cell-centric variability and calibration scores for conformal inference.
Fig. 2: Prediction intervals for spatial gene expression.
Fig. 3: Uncertainty-aware differential gene expression analysis with TISSUE.
Fig. 4: Uncertainty-aware supervised learning, clustering and visualization.
Fig. 5: TISSUE discovers subtypes in neural stem cell lineage of the SVZ.

Data availability

All processed spatial transcriptomics and RNA-seq dataset pairings, including the final annotated adult mouse SVZ MERFISH dataset, have been deposited at https://doi.org/10.5281/zenodo.8259942. Other data files (raw images and large intermediate data files) can be provided upon reasonable request. Raw data were accessed from existing benchmark datasets (ref. 7) and are also available from the following studies:

Mouse hippocampus: Spatial transcriptomics (seqFISH) at https://content.cruk.cam.ac.uk/jmlab/SpatialMouseAtlas2020/; RNA-seq (10x Chromium) at GSE158450 in the Gene Expression Omnibus (GEO) for the ‘HIPP_sc_Rep1_10X’ sample.

Mouse primary visual cortex: Spatial transcriptomics (MERFISH) at https://github.com/spacetx-spacejam/data; RNA-seq (Smart-seq) at https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-v1-and-alm-smart-seq for mouse primary visual cortex.

Mouse prefrontal cortex: Spatial transcriptomics (STARmap) at ‘20180419_BZ9_control’ in https://www.starmapresources.com/data; RNA-seq (10x Chromium) at GSE158450 in the GEO for ‘PFC_sc_Rep2_10X’.

Human middle temporal gyrus: Spatial transcriptomics (ISS) at https://github.com/spacetx-spacejam/data; RNA-seq (Smart-seq) at https://portal.brain-map.org/atlases-and-data/rnaseq/human-mtg-smart-seq.

Mouse primary visual cortex: Spatial transcriptomics (ISS) at https://github.com/spacetx-spacejam/data; RNA-seq (Smart-seq) at https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-v1-and-alm-smart-seq for mouse primary visual cortex.

Drosophila embryo: Spatial transcriptomics (FISH) at https://github.com/rajewsky-lab/distmap/; RNA-seq (Drop-seq) at GSE95025 in GEO.

Mouse somatosensory cortex: Spatial transcriptomics (osmFISH) at http://linnarssonlab.org/osmFISH/ for cortical region subset; RNA-seq (Smart-seq) at https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-whole-cortex-and-hippocampus-smart-seq for mouse somatosensory cortex.

Mouse primary visual cortex: Spatial transcriptomics (ExSeq) at https://github.com/spacetx-spacejam/data; RNA-seq (Smart-seq) at https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-v1-and-alm-smart-seq for mouse primary visual cortex.

Mouse gastrulation: Spatial transcriptomics (seqFISH) at https://content.cruk.cam.ac.uk/jmlab/SpatialMouseAtlas2020/; RNA-seq (10x Chromium) ‘Sample 21’ in the MouseGastrulationData R package.

Human U2OS: Spatial transcriptomics (MERFISH) at https://www.pnas.org/doi/suppl/10.1073/pnas.1912459116/suppl_file/pnas.1912459116.sd12.csv; RNA-seq (10x Chromium) at ‘BC22’ in GSE152048 in the GEO database.

Axolotl brain: Spatial transcriptomics (Stereo-seq) at ‘Stage44.h5ad’ in https://db.cngb.org/stomics/artista/download/; RNA-seq (10x Chromium) at ‘animal1’ in ‘all_nuclei_clustered_highlevel_anno.rds’ at https://zenodo.org/records/6390083.

Code availability

The TISSUE Python package and associated code and documentation are available at https://github.com/sunericd/TISSUE/, and all code for generating figures and analyses is separately available at https://github.com/sunericd/tissue-figures-and-analyses/.

References

  1. Maynard, K. R. et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat. Neurosci. 24, 425–436 (2021).

  2. Asp, M. et al. A spatiotemporal organ-wide gene expression and cell atlas of the developing human heart. Cell 179, 1647–1660 (2019).

  3. Moncada, R. et al. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat. Biotechnol. 38, 333–342 (2020).

  4. Ji, A. L. et al. Multimodal analysis of composition and spatial architecture in human squamous cell carcinoma. Cell 182, 497–514 (2020).

  5. Moses, L. & Pachter, L. Museum of spatial transcriptomics. Nat. Methods https://doi.org/10.1038/s41592-022-01409-2 (2022).

  6. Wei, R. et al. Spatial charting of single-cell transcriptomes in tissues. Nat. Biotechnol. 40, 1190–1199 (2022).

  7. Li, B. et al. Benchmarking spatial and single-cell transcriptomics integration methods for transcript distribution prediction and cell type deconvolution. Nat. Methods 19, 662–670 (2022).

  8. Abdelaal, T., Mourragui, S., Mahfouz, A. & Reinders, M. J. T. SpaGE: spatial gene enhancement using scRNA-seq. Nucleic Acids Res. 48, e107 (2020).

  9. Shengquan, C., Boheng, Z., Xiaoyang, C., Xuegong, Z. & Rui, J. stPlus: a reference-based method for the accurate enhancement of spatial transcriptomics. Bioinformatics 37, i299–i307 (2021).

  10. Allen, W. E., Blosser, T. R., Sullivan, Z. A., Dulac, C. & Zhuang, X. Molecular and spatial signatures of mouse brain aging at single-cell resolution. Cell 186, 194–208 (2023).

  11. Welch, J. D. et al. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 177, 1873–1887 (2019).

  12. Lopez, R. et al. A joint model of unpaired data from scRNA-seq and spatial transcriptomics for imputing missing gene expression measurements. ICML Workshop on Computational Biology (2019).

  13. Biancalani, T. et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nat. Methods 18, 1352–1362 (2021).

  14. Vahid, M. R. et al. High-resolution alignment of single-cell and spatial transcriptomes with CytoSPACE. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-01697-9 (2023).

  15. Cang, Z. & Nie, Q. Inferring spatial and signaling relationships between cells from single cell transcriptomic data. Nat. Commun. 11, 2084 (2020).

  16. Moriel, N. et al. NovoSpaRc: flexible spatial reconstruction of single-cell gene expression with optimal transport. Nat. Protoc. 16, 4177–4200 (2021).

  17. Mourragui, S., Loog, M., van de Wiel, M. A., Reinders, M. J. T. & Wessels, L. F. A. PRECISE: a domain adaptation approach to transfer predictors of drug response from pre-clinical models to tumors. Bioinformatics 35, i510–i519 (2019).

  18. Lubeck, E., Coskun, A. F., Zhiyentayev, T., Ahmad, M. & Cai, L. Single-cell in situ RNA profiling by sequential hybridization. Nat. Methods 11, 360–361 (2014).

  19. Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).

  20. Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361, eaat5691 (2018).

  21. Ke, R. et al. In situ sequencing for RNA analysis in preserved tissue and cells. Nat. Methods 10, 857–860 (2013).

  22. Langer-Safer, P. R., Levine, M. & Ward, D. C. Immunological method for mapping genes on Drosophila polytene chromosomes. Proc. Natl Acad. Sci. USA 79, 4381–4385 (1982).

  23. Codeluppi, S. et al. Spatial organization of the somatosensory cortex revealed by osmFISH. Nat. Methods 15, 932–935 (2018).

  24. Alon, S. et al. Expansion sequencing: spatially precise in situ transcriptomics in intact biological systems. Science 371, eaax2656 (2021).

  25. Wei, X. et al. Single-cell Stereo-seq reveals induced progenitor cells involved in axolotl brain regeneration. Science 377, eabp9444 (2022).

  26. Long, B., Miller, J. & The SpaceTx Consortium. SpaceTx: a roadmap for benchmarking spatial transcriptomics exploration of the brain. Preprint at http://arxiv.org/abs/2301.08436 (2023).

  27. Joglekar, A. et al. A spatially resolved brain region- and cell type-specific isoform atlas of the postnatal mouse brain. Nat. Commun. 12, 463 (2021).

  28. Booeshaghi, A. S. et al. Isoform cell-type specificity in the mouse primary motor cortex. Nature 598, 195–199 (2021).

  29. Gyllborg, D. et al. Hybridization-based in situ sequencing (HybISS) for spatially resolved transcriptomics in human and mouse brain tissue. Nucleic Acids Res. 48, e112 (2020).

  30. Karaiskos, N. et al. The Drosophila embryo at single-cell transcriptome resolution. Science 358, 194–199 (2017).

  31. Nitzan, M., Karaiskos, N., Friedman, N. & Rajewsky, N. Gene expression cartography. Nature 576, 132–137 (2019).

  32. Hodge, R. D. et al. Conserved cell types with divergent features in human versus mouse cortex. Nature 573, 61–68 (2019).

  33. Yao, Z. et al. A taxonomy of transcriptomic cell types across the isocortex and hippocampal formation. Cell 184, 3222–3241 (2021).

  34. Tasic, B. et al. Shared and distinct transcriptomic cell types across neocortical areas. Nature 563, 72–78 (2018).

  35. Shah, S., Lubeck, E., Zhou, W. & Cai, L. In situ transcription profiling of single cells reveals spatial organization of cells in the mouse hippocampus. Neuron 92, 342–357 (2016).

  36. Lohoff, T. et al. Integration of spatial and single-cell transcriptomic data elucidates mouse organogenesis. Nat. Biotechnol. 40, 74–85 (2022).

  37. Lust, K. et al. Single-cell analyses of axolotl telencephalon organization, neurogenesis, and regeneration. Science 377, eabp9262 (2022).

  38. Zhou, Y. et al. Single-cell RNA landscape of intratumoral heterogeneity and immunosuppressive microenvironment in advanced osteosarcoma. Nat. Commun. 11, 6322 (2020).

  39. Xia, C., Fan, J., Emanuel, G., Hao, J. & Zhuang, X. Spatial transcriptome profiling by MERFISH reveals subcellular RNA compartmentalization and cell cycle-dependent gene expression. Proc. Natl Acad. Sci. USA 116, 19490–19499 (2019).

  40. Angelopoulos, A. N. & Bates, S. A gentle introduction to conformal prediction and distribution-free uncertainty quantification. Preprint at http://arxiv.org/abs/2107.07511 (2022).

  41. Shafer, G. & Vovk, V. A tutorial on conformal prediction. J. Mach. Learn. Res. 9, 371–421 (2008).

  42. Lei, J., G’Sell, M., Rinaldo, A., Tibshirani, R. J. & Wasserman, L. Distribution-free predictive inference for regression. J. Am. Stat. Assoc. 113, 1094–1111 (2018).

  43. Wieslander, H. et al. Deep learning with conformal prediction for hierarchical analysis of large-scale whole-slide tissue images. IEEE J. Biomed. Health Informatics 25, 371–380 (2021).

  44. Alvarsson, J., Arvidsson McShane, S., Norinder, U. & Spjuth, O. Predicting with confidence: using conformal prediction in drug discovery. J. Pharm. Sci. 110, 42–49 (2021).

  45. Jin, Y., Ren, Z. & Candès, E. J. Sensitivity analysis of individual treatment effects: a robust conformal inference approach. Proc. Natl Acad. Sci. USA 120, e2214889120 (2023).

  46. Wang, Y. et al. Sprod for de-noising spatially resolved transcriptomics data based on position and image information. Nat. Methods 19, 950–958 (2022).

  47. Palmer, C. & Pe’er, I. Bias characterization in probabilistic genotype data and improved signal detection with multiple imputation. PLoS Genet. 12, e1006091 (2016).

  48. Allison, P. D. Missing Data https://methods.sagepub.com/book/missing-data (SAGE Publications, 2002).

  49. Little, R. J. A. & Rubin, D. B. Bayes and Multiple Imputation. In Statistical Analysis with Missing Data (eds Little, R. J. A. & Rubin, D. B.) 200–220 (John Wiley & Sons, Inc., 2002); https://onlinelibrary.wiley.com/doi/abs/10.1002/9781119013563.ch10

  50. Licht, C. New methods for generating significance levels from multiply-imputed data. Ph.D. thesis, Otto-Friedrich-Universität Bamberg, Fakultät Sozial- und Wirtschaftswissenschaften https://fis.uni-bamberg.de/handle/uniba/263 (2010).

  51. Zhu, J., Shang, L. & Zhou, X. SRTsim: spatial pattern preserving simulations for spatially resolved transcriptomics. Genome Biol. 24, 39 (2023).

  52. Svensson, V., Teichmann, S. A. & Stegle, O. SpatialDE: identification of spatially variable genes. Nat. Methods 15, 343–346 (2018).

  53. Yang, C. B., Kiser, P. J., Zheng, Y. T., Varoqueaux, F. & Mower, G. D. Bidirectional regulation of Munc13-3 protein expression by age and dark rearing during the critical period in mouse visual cortex. Neuroscience 150, 603–608 (2007).

  54. Miller, J. A., Woltjer, R. L., Goodenbour, J. M., Horvath, S. & Geschwind, D. H. Genes and pathways underlying regional and cell type changes in Alzheimer’s disease. Genome Med. 5, 48 (2013).

  55. Artegiani, B. et al. A single-cell RNA sequencing study reveals cellular and molecular dynamics of the hippocampal neurogenic niche. Cell Rep. 21, 3271–3284 (2017).

  56. Siddiqui, T. J. et al. An LRRTM4-HSPG complex mediates excitatory synapse development on dentate gyrus granule cells. Neuron 79, 680–695 (2013).

  57. Buckley, M. T. et al. Cell-type-specific aging clocks to quantify aging and rejuvenation in neurogenic regions of the brain. Nat. Aging 3, 121–137 (2023).

  58. Scialdone, A. et al. Computational assignment of cell-cycle stage from single-cell transcriptome data. Methods 85, 54–61 (2015).

  59. Lotfollahi, M., Wolf, F. A. & Theis, F. J. scGen predicts single-cell perturbation responses. Nat. Methods 16, 715–721 (2019).

  60. Sun, E. D., Ma, R. & Zou, J. Dynamic visualization of high-dimensional data. Nat. Comput. Sci. 3, 86–100 (2023).

  61. Delchambre, L. Weighted principal component analysis: a weighted covariance eigendecomposition approach. Mon. Not. R. Astron. Soc. 446, 3545–3555 (2015).

  62. Navarro Negredo, P., Yeo, R. W. & Brunet, A. Aging and rejuvenation of neural stem cells and their niches. Cell Stem Cell 27, 202–223 (2020).

  63. Doetsch, F. A niche for adult neural stem cells. Curr. Opin. Genet. Dev. 13, 543–550 (2003).

  64. Alvarez-Buylla, A. & García-Verdugo, J. M. Neurogenesis in adult subventricular zone. J. Neurosci. 22, 629–634 (2002).

  65. Dulken, B. W. et al. Single-cell analysis reveals T cell infiltration in old neurogenic niches. Nature 571, 205–210 (2019).

  66. Liu, L. et al. Exercise reprograms the inflammatory landscape of multiple stem cell compartments during mammalian aging. Cell Stem Cell 30, 689–705 (2023).

  67. Cebrian-Silla, A. et al. Single-cell analysis of the ventricular-subventricular zone reveals signatures of dorsal and ventral adult neurogenesis. eLife 10, e67436 (2021).

  68. Chaker, Z., Codega, P. & Doetsch, F. A mosaic world: puzzles revealed by adult neural stem cell heterogeneity. Wiley Interdiscip. Rev. Dev. Biol. 5, 640–658 (2016).

  69. Dong, K. & Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat. Commun. 13, 1739 (2022).

  70. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).

  71. Gayoso, A. et al. Joint probabilistic modeling of single-cell multi-omic data with totalVI. Nat. Methods 18, 272–282 (2021).

  72. Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).

  73. Palla, G. et al. Squidpy: a scalable framework for spatial omics analysis. Nat. Methods 19, 171–178 (2022).

  74. Marshall, A., Altman, D. G., Holder, R. L. & Royston, P. Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines. BMC Med. Res. Methodol. 9, 57 (2009).

Acknowledgements

Funding support was provided by the Knight-Hennessy Scholars program (to E.D.S.), the Paul and Daisy Soros Fellowship for New Americans (to E.D.S.), the National Science Foundation Graduate Research Fellowship Program (to E.D.S.), D. Donoho at Stanford University (to R.M.), National Institutes of Health P01AG036695 (to A.B.), NSF CAREER 1942926 (to J.Z.), National Institutes of Health P30AG059307 (to J.Z.), 5RM1HG010023 (to J.Z.) and grants from the Silicon Valley Foundation (to J.Z.) and the Chan Zuckerberg Initiative (to J.Z.). We thank L. Xu, O. Zhou and M. Yuksekgonul for helpful discussions.

Author information

Contributions

E.D.S. and J.Z. conceived of the study. E.D.S. designed and implemented the method and ran all associated analyses, with J.Z. and R.M. providing input. P.N.N. and A.B. provided samples for the mouse SVZ MERFISH dataset and input on associated analyses. E.D.S. prepared a draft of the paper. R.M., P.N.N., A.B. and J.Z. edited the paper.

Corresponding author

Correspondence to James Zou.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Nancy Zhang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Rita Strack, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Overview of datasets and prediction performance.

a, Visualization of cells in the eleven spatial transcriptomics datasets colored by the expression of the highest-expressed gene in each respective dataset. Abbreviations are as follows: hippocampus (Hipp.), primary visual cortex (VISP), prefrontal cortex (PC), middle temporal gyrus (MTG), somatosensory cortex (SC), gastrulation (Gast.), U-2 OS cell line (U2OS). b,c, Performance of all three gene prediction methods (Harmony, SpaGE, Tangram) on all datasets as measured by (b) gene-wise mean absolute error between predicted and actual gene expression over 10-fold cross-validation, and (c) gene-wise Pearson correlation between predicted and actual gene expression over 10-fold cross-validation. Also shown are the number of cells (n) in the spatial transcriptomics datasets and the number of genes (p) shared between spatial and RNA-seq datasets. In panels b,c, the inner box corresponds to quartiles of the metrics and the whiskers span up to 1.5 times the interquartile range of the metrics.

Extended Data Fig. 2 Evidence of gene expression similarity between spatial neighbors.

a, Cosine similarity of gene expression profiles for 250 cells paired with all their neighbors in the TISSUE spatial graph compared to pairings with randomly drawn cells across all eleven spatial transcriptomics datasets. The boxplot corresponds to the quartiles of the cosine similarity measurements. The center line corresponds to median cosine similarity, which was strictly higher in the neighbor-paired comparisons than the random-paired comparisons across all datasets. Whiskers span up to 1.5 times the interquartile range of the metrics and values outside this range are shown as dots. Abbreviations are as follows: hippocampus (Hipp.), primary visual cortex (VISP), prefrontal cortex (PC), middle temporal gyrus (MTG), somatosensory cortex (SC), gastrulation (Gast.), U-2 OS cell line (U2OS). b, Scatter plots of the cosine similarities of gene expression profiles for 250 cells paired with their neighbors for either the training gene set or the test gene set determined by random train-test split of all genes (50% train, 50% test). Shown are cosine similarity pairs for 10 train-test splits for the two benchmark spatial transcriptomics datasets with the most measured genes.
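
The neighbor-versus-random comparison in panel a can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the code used for the figure: the function name `neighbor_vs_random_cosine` is hypothetical, the spatial graph is approximated by a brute-force k-nearest-neighbor search on cell coordinates (k = 15, matching the default neighbor count mentioned in Extended Data Fig. 3), and all cells are used rather than a sample of 250.

```python
import numpy as np

def neighbor_vs_random_cosine(expr, coords, k=15, seed=0):
    """Cosine similarity of each cell with its k spatial neighbors and with
    k randomly drawn other cells (hypothetical helper, brute-force kNN)."""
    expr = np.asarray(expr, dtype=float)      # cells x genes
    coords = np.asarray(coords, dtype=float)  # cells x spatial dims
    n = expr.shape[0]

    # Unit-normalize expression profiles so dot products are cosine similarities.
    norms = np.linalg.norm(expr, axis=1, keepdims=True)
    unit = expr / np.where(norms == 0, 1.0, norms)

    # Pairwise squared distances; exclude self-matches via an infinite diagonal.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :k]

    # Random partners drawn uniformly from all other cells (never the cell itself).
    rng = np.random.default_rng(seed)
    rand = rng.integers(0, n - 1, size=(n, k))
    rand = rand + (rand >= np.arange(n)[:, None])

    neighbor_sim = np.einsum("ij,ikj->ik", unit, unit[nbrs])
    random_sim = np.einsum("ij,ikj->ik", unit, unit[rand])
    return neighbor_sim.ravel(), random_sim.ravel()
```

The dense distance matrix keeps the sketch short but scales quadratically with the number of cells; a tree-based neighbor search would be the usual choice for full datasets.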

Extended Data Fig. 3 Cell-centric variability and calibration score distributions for individual datasets and prediction methods.

a, Pearson correlation of all cell-centric variability measures obtained for different numbers of neighbors in building the TISSUE spatial graph compared to the default setting of 15 neighbors. b, Correlation of cell-centric variability and absolute prediction error shown individually for each dataset and prediction method combination computed over 10-fold cross-validation. Log density with added pseudocount (Log1p) is shown by color, with a maximum of 1000 cells and 300 genes sampled from each dataset to provide more uniform representation. c, Histograms showing the distribution of Pearson correlations between either gene-wise or cell-wise similarities of prediction errors and similarities of predicted expression values across all spatial transcriptomic datasets and across all prediction methods. d, Distribution of TISSUE calibration scores shown individually for each dataset and prediction method combination ((kg, kc) = (4, 1)). Details on each dataset and prediction method can be found in Methods. Abbreviations are as follows: hippocampus (Hipp.), primary visual cortex (VISP), prefrontal cortex (PC), middle temporal gyrus (MTG), somatosensory cortex (SC), gastrulation (Gast.), U-2 OS cell line (U2OS).

Extended Data Fig. 4 Further evaluation of TISSUE prediction intervals.

a-c, Correlation plots across all dataset and prediction method combinations computed over 10-fold cross-validation for (a) the 67% prediction interval width and absolute prediction error, both normalized by the absolute value of the predicted expression; (b) 50% prediction interval width and absolute prediction error; (c) 80% prediction interval width and absolute prediction error. Log density with added pseudocount (Log1p) is shown by color, with a maximum of 1000 cells and 300 genes sampled from each dataset to provide more uniform representation. d, Gene-level calibration curves for TISSUE prediction intervals showing empirical coverage as a function of the specified confidence level across 10-fold cross-validation. Each line corresponds to an independent gene in the spatial transcriptomics dataset. Abbreviations are as follows: hippocampus (Hipp.), primary visual cortex (VISP), prefrontal cortex (PC), middle temporal gyrus (MTG), somatosensory cortex (SC), gastrulation (Gast.), U-2 OS cell line (U2OS). e,f, Calibration curves for TISSUE prediction intervals showing empirical coverage as a function of the specified confidence level across 10-fold cross-validation (e) under automated setting of (kg, kc) for stratified grouping; and (f) for two technical replicates of the mouse gastrulation seqFISH dataset with (kg, kc) = (4, 1). The calibration error is annotated for each prediction method (see Methods). g, Calibration curves for TISSUE prediction intervals showing empirical coverage as a function of the specified confidence level across 10-fold cross-validation for the mouse somatosensory cortex osmFISH dataset with different combinations of Sprod de-noising or Sprod-based spatial similarity graph instead of the TISSUE spatial neighbors graph. The calibration error is annotated for each prediction method (see Methods). h, Correlation plot of 67% prediction interval width with TISSUE spatial neighbors graph with cosine similarity weighting and 67% prediction interval width with Sprod similarity graph and weighting for the mouse somatosensory cortex osmFISH dataset and all prediction methods computed over 10-fold cross-validation.
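
A calibration curve of the kind described in panels d-g plots empirical coverage against the nominal confidence level. The Python sketch below shows one way to compute a point on that curve and a scalar summary of miscalibration. Both helper names are hypothetical, and the mean absolute gap used in `calibration_gap` is an assumed stand-in: the calibration error actually reported in the figure is defined in the paper's Methods, which are not reproduced here.

```python
import numpy as np

def empirical_coverage(true, lower, upper):
    """Fraction of measured values that fall inside their prediction interval."""
    true, lower, upper = (np.asarray(a, dtype=float) for a in (true, lower, upper))
    return float(np.mean((true >= lower) & (true <= upper)))

def calibration_gap(levels, coverages):
    """Mean absolute gap between nominal and empirical coverage.

    Assumed summary statistic for illustration only; it may differ from
    the calibration error defined in the paper's Methods.
    """
    levels = np.asarray(levels, dtype=float)
    coverages = np.asarray(coverages, dtype=float)
    return float(np.mean(np.abs(coverages - levels)))
```

Evaluating `empirical_coverage` once per confidence level (for example 50%, 67% and 80%, the interval widths used in panels a-c) yields the points of the curve; a perfectly calibrated method lies on the diagonal and has a gap of zero.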

Extended Data Fig. 5 Additional differential gene expression analysis with TISSUE.

a, False discovery rate of differentially expressed genes between cell type or anatomic region labels (one versus all approach) using the differentially expressed genes on the measured gene expression profiles as the ground truth across different p-value cutoffs. P-values were computed using a two-sided t-test. Discoveries are assessed across all genes for all class labels. Shown are results for all three prediction methods and all spatial transcriptomics datasets with cell type or region labels available. All calibration scores were generated with (kg, kc) = (4, 1) settings for stratified grouping. Abbreviations are as follows: hippocampus (Hipp.), primary visual cortex (VISP), middle temporal gyrus (MTG), somatosensory cortex (SC), gastrulation (Gast.). b, False discovery rate of differentially expressed genes between cell type or anatomic region labels (one versus all approach) as a function of the number of discoveries and with automated stratified grouping. c, False discovery rate of differentially expressed genes between cell type or anatomic region labels (one versus all approach) as a function of the number of discoveries and with (kg, kc) = (4, 1) settings for stratified grouping for the alternative TISSUE multiple imputation framework using the ‘greater than’ one-sided Wilcoxon/Mann-Whitney test. d, False discovery rate of spatially variable genes as a function of the number of discoveries and with (kg, kc) = (4, 1) settings for stratified grouping for the alternative TISSUE multiple imputation framework using the SpatialDE test. e, Correlation plot of the log p-values obtained from the TISSUE multiple imputation t-test framework between two technical replicates of the mouse gastrulation seqFISH dataset.
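
To make the multiple imputation idea concrete, the following Python sketch pools a two-group mean difference across several imputed versions of one gene using the standard Rubin's rules from the missing-data literature. It is a generic illustration under stated assumptions, not the TISSUE test statistic: the function name `pooled_t_statistic` is hypothetical, the imputations are taken as given inputs (how TISSUE draws them is not shown), and the per-imputation variance is the usual unequal-variance estimate.

```python
import numpy as np

def pooled_t_statistic(group1_imps, group2_imps):
    """Pool a two-group mean difference over D imputations with Rubin's rules.

    group1_imps, group2_imps: arrays of shape (D, n_cells_in_group) holding
    D imputed versions of one gene's expression in each group.
    Returns (t_statistic, degrees_of_freedom).
    """
    g1 = np.asarray(group1_imps, dtype=float)
    g2 = np.asarray(group2_imps, dtype=float)
    d = g1.shape[0]
    if d < 2 or g2.shape[0] != d:
        raise ValueError("need the same number (>= 2) of imputations per group")

    # Per-imputation estimate and its sampling variance.
    q = g1.mean(axis=1) - g2.mean(axis=1)
    u = g1.var(axis=1, ddof=1) / g1.shape[1] + g2.var(axis=1, ddof=1) / g2.shape[1]

    q_bar = q.mean()            # pooled estimate
    u_bar = u.mean()            # within-imputation variance
    b = q.var(ddof=1)           # between-imputation variance
    total = u_bar + (1 + 1 / d) * b

    t_stat = q_bar / np.sqrt(total)
    # Rubin's degrees of freedom; infinite when imputations agree exactly.
    df = (d - 1) * (1 + u_bar / ((1 + 1 / d) * b)) ** 2 if b > 0 else np.inf
    return float(t_stat), float(df)
```

Disagreement between imputations inflates the total variance and shrinks the statistic, which is the mechanism by which uncertain predictions are made less likely to be called differentially expressed. A two-sided p-value would follow from a t distribution with the returned degrees of freedom.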

Extended Data Fig. 6 Additional experiments for uncertainty-aware supervised learning, clustering, and visualization.

a-c, Downstream task performance metrics on the three most prominent anatomic region class labels for the mouse somatosensory osmFISH dataset. Shown are metrics for all three prediction methods with automated stratified grouping settings. P-value was computed using a paired two-sided t-test on n = 3 independent prediction methods. The box corresponds to quartiles of the metrics and the whiskers span up to 1.5 times the interquartile range of the metrics. (a) Accuracy, F1 score, and ROC-AUC (receiver-operator characteristic area under the curve) metrics for logistic regression models trained on the predicted gene expression, TISSUE-filtered predicted gene expression, or measured gene expression for classification. (b) Adjusted Rand index (ARI) for k-means clustering (k = 3) on the top 15 principal components obtained from the predicted gene expression, TISSUE-filtered predicted gene expression, or measured gene expression for classification. (c) Linear separability measured as classification accuracy of linear kernel support vector classifier fitted on the top 15 principal components obtained from the predicted gene expression, TISSUE-filtered predicted gene expression, or measured gene expression for classification. d, Average improvement of performance metrics using TISSUE-filtered approach in lieu of unfiltered approach on predicted expression for supervised learning (Accuracy, F1, ROC-AUC), clustering (adjusted Rand index (ARI)), and visualization (linear separability) for the top three classes across all dataset and class label combinations. Results were obtained using the 50% prediction interval width for filtering. Abbreviations are as follows: hippocampus (Hipp.), primary visual cortex (VISP), middle temporal gyrus (MTG), somatosensory cortex (SC), gastrulation (Gast.). Asterisks denote significant difference in performance metrics between TISSUE-filtered approach and unfiltered approach (p < 0.05) with p-values computed using a paired two-sided t-test on n = 3 independent prediction methods. e, Same as panel d except with the 80% prediction interval width for filtering.

Extended Data Fig. 7 Uncertainty-aware clustering and label separation with TISSUE-WPCA.

a, Schematic illustration of the weighted principal component analysis (WPCA) pipeline where the inverse TISSUE prediction interval width is used to obtain principal components from WPCA, which are then used for downstream tasks of clustering and label separation. b, Linear separability measured as the binary classification accuracy of a linear kernel support vector classifier fitted on the two cell clusters in the simulated spatial transcriptomics data as a function of the simulated mix-in proportion. The classifier was trained on the top 15 principal components obtained from the measured gene expression profiles with PCA, predicted gene expression profiles with PCA, and predicted gene expression profiles with TISSUE-WPCA. For TISSUE-WPCA, weights were determined by binarizing the inverse normalized 67% prediction interval width (see Methods). Results were obtained using automated stratified grouping. Bands represent the interquartile range and solid line denotes the median linear separability across 20 simulated datasets. c, Same as in panel b except with TISSUE-WPCA weighting using the log-transformed inverse normalized 67% prediction interval width. d, Adjusted Rand index (ARI) for k-means clustering (k = 3) on the top 15 principal components obtained from PCA on the predicted expression or TISSUE-WPCA on the predicted gene expression for six real spatial transcriptomics dataset and label pairings and all prediction methods. P-value was computed using a paired two-sided t-test on n=18 sets of predictions across 3 independent prediction methods and 6 independent dataset and class label combinations. The box corresponds to quartiles of the metrics and the whiskers span up to 1.5 times the interquartile range of the metrics.
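
The adjusted Rand index reported in panel d compares a clustering against known class labels. The sketch below uses the standard pair-counting formula with the Python standard library. It is a generic illustration rather than the authors' code, and the labels in the example are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treat as perfect agreement
    # Count pairs of items that fall in the same contingency-table cell,
    # the same true class, and the same predicted cluster.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. one cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical region labels and k-means cluster assignments (k = 3).
regions = ["a", "a", "b", "b", "c", "c"]
clusters = [2, 2, 0, 0, 1, 1]
print(adjusted_rand_index(regions, clusters))
```

The index does not depend on how clusters are numbered, so a clustering that recovers the classes under different integer labels still scores as perfect agreement.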

Extended Data Fig. 8 TISSUE is necessary to identify ambiguous NSC lineage subtype.

a, Heatmap of the scaled log-normalized gene expression of original cell type markers in the adult mouse subventricular zone MERFISH dataset for each of the identified cell type clusters. The Ambiguous cell type cluster in the first row exhibits high expression of qNSC/astrocyte, aNSC/NPC, and neuroblast markers. b, Additional predicted marker genes for the second ambiguous subcluster are differentially expressed for all qNSC/astrocyte and aNSC/NPC markers under traditional hypothesis testing with two-sided t-test on the predicted gene expression (Predicted). With TISSUE multiple imputation two-sided t-test, there are substantially more aNSC/NPC markers that are differentially over-expressed in the ambiguous subcluster (TISSUE), permitting identification of this subcluster as an aNSC/NPC subtype cluster. P-values are shown for all predicted marker genes with significance threshold of Bonferroni-adjusted p < 0.1 for either two-sided t-test or TISSUE multiple imputation two-sided t-test. c, Table indicating whether each of the three cell subtypes of the NSC lineage could be resolved from predicted marker genes using baseline or TISSUE-based approaches. Green checks indicate successful identification of cell subtype and red crosses indicate unsuccessful identification of cell subtype. d, Relative proportion of each of the three TISSUE-identified subtypes in the neural stem cell lineage cluster for either the left or right lateral ventricle. e, Relative proportions of aNSC/NPC and neuroblast populations across the MERFISH dataset and three single-cell RNAseq datasets of the mouse subventricular zone. The qNSC/astrocyte proportions were not compared since they were aggregated with astrocytes of the striatum in the single-cell RNAseq datasets. f, Spatial visualization of the cells in the neural stem cell lineage cluster colored by dorsal or ventral spatial location labels. 
g, Dorsal versus ventral classification performance of TISSUE-filtered penalized logistic regression models and baseline unfiltered penalized logistic regression models evaluated using 10-fold cross-validation across F1 score, accuracy, area under the receiver-operator curve, and average precision.
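
Panel b applies a significance threshold of Bonferroni-adjusted p < 0.1. The sketch below shows the standard Bonferroni adjustment in plain Python, assuming the number of tests equals the number of marker genes tested. The gene names and raw p-values are hypothetical placeholders, and the step that produces the raw p-values (two-sided t-test or TISSUE multiple imputation t-test) is not reproduced here.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for three predicted marker genes.
raw_p = {"gene_a": 0.001, "gene_b": 0.04, "gene_c": 0.5}
adjusted = dict(zip(raw_p, bonferroni_adjust(list(raw_p.values()))))

# Apply the threshold stated in the legend (adjusted p < 0.1).
significant = [gene for gene, p in adjusted.items() if p < 0.1]
print(adjusted)
print(significant)
```

In this example a gene with raw p = 0.04 is no longer significant after adjustment for three tests, which shows how the correction becomes stricter as more marker genes are tested.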

Extended Data Fig. 9 Computational runtime for TISSUE.

a, Bar plots of total runtimes for spatial gene expression prediction computations over 10 predictions to generate estimated predictions on all calibration genes. Bars denote the mean runtime across 10 instances of TISSUE prediction and each dot represents the runtime for one instance of generating TISSUE predictions using 10-fold cross-validation. b, Bar plots of total runtimes for TISSUE prediction interval calculation including computation of cell-centric variability and calibration score sets. Bars denote the mean runtime across 10 instances of TISSUE prediction interval calculation and each dot represents the runtime for one instance of TISSUE prediction interval calculation.

Supplementary information

Supplementary Information

Supplementary Fig. 1.

Reporting Summary

Peer Review File

Supplementary Table 1

Overview of dataset pairings between spatial transcriptomics and RNA-seq used for TISSUE evaluation.

Supplementary Table 2

Downstream analysis benchmarking performances of TISSUE with different spatial gene expression prediction methods. The table is organized by groups of related downstream analysis benchmarking tasks (rows). The numbers at the end of task descriptions index unique data contexts (for example, dataset, dataset and label combination) within each group of tasks. TISSUE methods (with bold column titles) are compared to non-TISSUE methods (adjacent columns) and the superior performance (if any) is highlighted in green. Each cell in the table constitutes a unique benchmarking context (that is, imputation method, dataset, application and metric).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sun, E.D., Ma, R., Navarro Negredo, P. et al. TISSUE: uncertainty-calibrated prediction of single-cell spatial transcriptomics improves downstream analyses. Nat Methods 21, 444–454 (2024). https://doi.org/10.1038/s41592-024-02184-y


  • DOI: https://doi.org/10.1038/s41592-024-02184-y
