CellContrast: Reconstructing spatial relationships in single-cell RNA sequencing data via deep contrastive learning

Summary
A vast amount of single-cell RNA sequencing (SC) data has been accumulated through various studies and consortia, but the lack of spatial information limits its use in analyzing complex biological activities. To bridge this gap, we introduce CellContrast, a computational method for reconstructing spatial relationships among SC cells using spatial transcriptomics (ST) data as a reference. By adopting a contrastive learning framework and training on ST data, CellContrast projects gene expression profiles into a latent space in which spatially proximate cells have similar representations. We performed extensive benchmarking on diverse platforms, including SeqFISH, Stereo-seq, 10X Visium, and MERSCOPE, on mouse embryo and human breast cells. The results reveal that CellContrast substantially outperforms other related methods, enabling accurate spatial reconstruction of SC data. We further demonstrate CellContrast's utility by applying it to cell-type co-localization and cell-cell communication analysis with real-world SC samples, showing that the recovered cell locations enable more discoveries and mitigate potential false positives.
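The Summary states that CellContrast is trained contrastively on ST data so that spatially proximate cells receive similar representations. A minimal sketch of such an objective, assuming an InfoNCE-style loss in which each cell's nearest spatial neighbor is its positive and all other cells are negatives, is shown below. The function name, the cosine similarity, the nearest-neighbor positive, and the temperature of 0.1 are all illustrative assumptions; CellContrast's actual encoder, sampling scheme, and loss are defined in the Methods and may differ.

```python
import numpy as np


def spatial_info_nce(z, coords, temperature=0.1):
    """Hypothetical InfoNCE-style loss: each cell's nearest spatial neighbor is its positive.

    z: (n_cells x d) latent representations; coords: (n_cells x 2) ST coordinates.
    All other cells act as negatives.
    """
    # Cosine similarity between latent representations, scaled by the temperature.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a cell is never its own candidate

    # Positive for each cell = its nearest neighbor in physical space.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    pos = d.argmin(axis=1)

    # Numerically stable log-softmax over all candidates, evaluated at the positive.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_pos = sim[np.arange(len(z)), pos] - log_denom
    return float(-log_prob_pos.mean())
```

Representations that place physical neighbors close together yield a lower loss than representations that pair each cell with a distant one, which is the property the contrastive training is meant to enforce.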

Figure S8. Spatial reconstruction results for L2 of embryos 1, 2, and 3 using Stereo-seq data as the reference. Note that because SageNet produces predicted pairwise cell-cell distances instead of coordinates, we generate the 2D distribution of cells using UMAP based on the distance matrix. We also used our latent cell representations as the input to NovoSpaRc, marked as "ours-NovoSpaRc". This is an effective approach for 2D visualization but might not be optimal for downstream tasks. When the gap between the ST and SC data is large, we recommend using our estimated SC-SC distances.

Figure S1. Evaluation of neighbor reconstruction in mouse gastrulation cells using the SeqFISH ST reference. A, Distribution of the hit number for the reconstructed 20 nearest cell neighbors. B, Comparison of the JSD distribution between our method and the null distribution (randomly shuffled locations of the testing sample).
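The neighbor "hit number" can be illustrated with a short sketch. It assumes that a hit is a cell that appears both among a cell's true k nearest neighbors and among its k nearest neighbors in the reconstruction; the exact definition is given in the main text, and the helper names below are hypothetical. The functions take distance matrices rather than coordinates so that methods that output only pairwise distances (such as SageNet, or our estimated SC-SC distances) can be scored the same way.

```python
import numpy as np


def pairwise_distances(coords):
    """Euclidean distance matrix for an (n_cells x n_dims) coordinate array."""
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)


def knn_from_distances(dist, k):
    """Indices of the k nearest neighbors of each cell, excluding the cell itself."""
    d = dist.astype(float).copy()
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]


def neighbor_hits(true_dist, pred_dist, k=20):
    """Per-cell number of true k nearest neighbors recovered among the predicted k nearest."""
    true_nn = knn_from_distances(true_dist, k)
    pred_nn = knn_from_distances(pred_dist, k)
    return np.array([len(set(t) & set(p)) for t, p in zip(true_nn, pred_nn)])
```

A perfect reconstruction gives k hits for every cell, while randomly shuffled locations give far fewer on average, which is the contrast plotted in panel A.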

Figure S2. Analysis of cell-type distribution and JSD in the SeqFISH training and testing datasets. A, Cell-type distribution for the training dataset (embryo1 L1) and the three testing datasets (L2 of embryos 1, 2, and 3). B, Average JSD distribution for the four imbalanced cell types between the training sample and the testing sample embryo3 L2.

Figure S3. Spatial reconstruction results for L2 of embryos 1, 2, and 3 using embryo1 L1 of the SeqFISH data as the reference. Note that because SageNet produces pairwise cell-cell distances instead of coordinates, we generate the 2D distribution of cells using UMAP based on the distance matrix.

Figure S4. Spatial reconstruction of the three query SeqFISH samples mapped to the reference locations of embryo1 L1. Cells are colored by cell type, and the transparency of each cell is determined by CellContrast's confidence score.

Figure S5. Distribution of mapping confidence scores by cell type. The three query SeqFISH samples are mapped to the reference locations of embryo1 L1 by CellContrast's SC-ST mapping.
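How the SC-ST mapping and its confidence score might be computed can be sketched as follows. This is a hypothetical illustration, not CellContrast's documented procedure: it assumes each query cell is assigned to the reference location whose latent representation is most cosine-similar, and that the confidence is the softmax probability of that chosen location at an assumed temperature of 0.1. The function names are illustrative.

```python
import numpy as np


def map_with_confidence(z_query, z_ref, temperature=0.1):
    """Assign each query cell to its most similar reference location (hypothetical).

    Confidence = softmax probability of the chosen location over all reference locations.
    """
    zq = z_query / np.linalg.norm(z_query, axis=1, keepdims=True)
    zr = z_ref / np.linalg.norm(z_ref, axis=1, keepdims=True)
    logits = zq @ zr.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    best = probs.argmax(axis=1)
    return best, probs[np.arange(len(zq)), best]


def confidence_by_type(confidence, cell_types):
    """Group confidence scores per cell type, e.g. to plot their distributions."""
    return {t: confidence[cell_types == t] for t in np.unique(cell_types)}
```

A query cell that is equally similar to every reference location receives the lowest possible confidence (1 / number of locations), which is the kind of cell rendered most transparent in Figure S4.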

Figure S6. Evaluation of neighbor reconstruction in mouse gastrulation cells using the array-based ST reference. A, Average hit number with varying k nearest neighbors for the three testing samples (L2 of embryos 1, 2, and 3). B, Distribution of the hit number for the reconstructed 20 nearest cell neighbors. C, Comparison of the JSD distribution between our method and the null distribution (randomly shuffled locations of the testing sample).

Figure S7. Evaluation of local-neighborhood cell-type heterogeneity in mouse gastrulation cells using the array-based ST reference. A, Jensen-Shannon distance of cell types for testing dataset embryo1 L2. B, Jensen-Shannon distance of cell types for testing dataset embryo2 L2. C, Jensen-Shannon distance of cell types for testing dataset embryo3 L2.
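The local-neighborhood JSD and its null distribution (randomly shuffled locations of the testing sample) can be sketched as below. The sketch assumes that, for each cell, the cell-type proportions among its k nearest neighbors in the true layout are compared with those in the reconstructed layout using the Jensen-Shannon distance (square root of the JS divergence, base 2), then averaged over cells; the helper names are hypothetical and the paper's exact aggregation may differ.

```python
import numpy as np


def pairwise_distances(coords):
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)


def knn_from_distances(dist, k):
    d = dist.astype(float).copy()
    np.fill_diagonal(d, np.inf)  # exclude the cell itself
    return np.argsort(d, axis=1)[:, :k]


def js_distance(p, q):
    """Jensen-Shannon distance (square root of the JS divergence, log base 2)."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # 0 * log(0) is taken as 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    # Clamp tiny negative round-off before the square root.
    return float(np.sqrt(max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0)))


def neighborhood_composition(dist, labels, k, n_types):
    """Cell-type proportions among each cell's k nearest neighbors (labels are 0..n_types-1)."""
    nn = knn_from_distances(dist, k)
    return np.stack([np.bincount(labels[row], minlength=n_types) / k for row in nn])


def mean_local_jsd(true_dist, pred_dist, labels, k=20):
    n_types = int(labels.max()) + 1
    p = neighborhood_composition(true_dist, labels, k, n_types)
    q = neighborhood_composition(pred_dist, labels, k, n_types)
    return float(np.mean([js_distance(a, b) for a, b in zip(p, q)]))


def null_jsd(coords, labels, k=20, seed=0):
    """Null: compare the true layout with randomly shuffled locations of the same sample."""
    rng = np.random.default_rng(seed)
    shuffled = coords[rng.permutation(len(coords))]
    return mean_local_jsd(pairwise_distances(coords), pairwise_distances(shuffled), labels, k)
```

Lower values indicate that the reconstructed neighborhoods preserve the local cell-type mix; a perfect reconstruction scores 0, and the shuffled null gives the baseline against which a method is compared.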

Figure S9. Comparison of local reconstruction performance with the null distribution for human breast 10X Visium data. A, Distribution of the hit number for the reconstructed 20 nearest cell neighbors. B, Comparison of the JSD distribution between our method and the null distribution (randomly shuffled locations of the testing sample).

Figure S10. Spatial reconstruction results for the human lung sample generated by 10X Visium. Note that because SageNet produces predicted pairwise cell-cell distances instead of coordinates, we generate the 2D distribution of cells using UMAP based on the distance matrix. We also used our latent cell representations as the input to NovoSpaRc, marked as "ours-NovoSpaRc". This is an effective approach for 2D visualization but might not be optimal for downstream tasks. When the gap between the ST and SC data is large, we recommend using our estimated SC-SC distances.

Figure S11. Evaluation of neighbor reconstruction in mouse brain cells generated by MERSCOPE. Note that CytoSpace was excluded from the analysis because cell-type annotations were unavailable, and cell-type information is a required input for its single-cell mode.

Figure S12. Local reconstruction performance for human lung samples generated by 10X Visium. Note that CytoSpace was excluded from the analysis because cell-type annotations were unavailable, and cell-type information is a required input. A, Evaluation of cell-neighbor reconstruction by the average neighbor hit within varying k nearest neighbors. B, Distribution of neighbor hits within the 20 nearest neighbors.

Figure S13. Detected co-localization proportions in the SC sample vs. reference patterns. The SC sample is S37 from the mouse scRNA-seq atlas, which was spatially reconstructed and is at the same developmental stage (E8.5) as the six reference ST samples.

Figure S14. Distances between cell-type pairs, calculated between all cells belonging to the two different cell types, with the median value used to represent each cell-type distance. A, Predicted pairwise cell-type distances for the spatially reconstructed scRNA-seq sample. B, Average pairwise cell-type distances across the five ST reference samples.
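The cell-type distance summarized in Figure S14 (the median over all distances between cells of two different types) can be sketched directly from a cell-cell distance matrix. The function name is hypothetical, and the diagonal (same-type pairs) is left undefined because the caption only covers pairs of different cell types.

```python
import numpy as np


def celltype_median_distances(dist, labels):
    """Median distance between every pair of distinct cell types.

    dist: (n_cells x n_cells) distance matrix (true, predicted, or estimated SC-SC);
    labels: per-cell cell-type array. Returns the sorted types and a (T x T) matrix.
    """
    types = np.unique(labels)
    out = np.full((len(types), len(types)), np.nan)  # same-type entries stay NaN
    for a, ta in enumerate(types):
        ia = np.flatnonzero(labels == ta)
        for b, tb in enumerate(types):
            if a == b:
                continue
            ib = np.flatnonzero(labels == tb)
            out[a, b] = np.median(dist[np.ix_(ia, ib)])
    return types, out
```

For panel B, per-sample matrices computed this way could be averaged with np.nanmean(np.stack(mats), axis=0), assuming every reference sample shares the same cell-type ordering.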

Figure S15. Evaluation of cell-neighbor identification with multiple ST training samples.

Figure S16. Analysis of spatially related genes. A, Distribution of the average rate of representation change for all genes. B, Rate of representation change for the En1 gene in embryo1 L1 of the SeqFISH data. Each dot represents an individual cell; higher values indicate a greater contribution of En1 to that cell's spatially related representation. C, Rate of representation change for the Hoxb1 gene in embryo1 L1 of the SeqFISH data.
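The "rate of representation change" is not defined in this caption; one plausible reading is a perturbation analysis, sketched below under that assumption. Here a gene's contribution for a cell is taken as the relative change ||f(x) - f(x')|| / ||f(x)||, where x' is the cell's expression with that gene set to zero. The perturbation scheme, the function names, and the toy linear encoder standing in for the trained CellContrast encoder are all hypothetical.

```python
import numpy as np


def representation_change_rate(encode, X, gene_idx):
    """Per-cell relative change of the representation when one gene is zeroed out.

    Hypothetical definition: ||f(x) - f(x with gene set to 0)|| / ||f(x)||.
    Assumes no cell has an all-zero representation.
    """
    z = encode(X)
    X_pert = X.copy()
    X_pert[:, gene_idx] = 0.0
    z_pert = encode(X_pert)
    return np.linalg.norm(z - z_pert, axis=1) / np.linalg.norm(z, axis=1)


def average_change_per_gene(encode, X):
    """Average rate over cells for every gene (cf. panel A)."""
    return np.array([representation_change_rate(encode, X, g).mean() for g in range(X.shape[1])])
```

Per-cell values from representation_change_rate correspond to the dots in panels B and C; a gene that the encoder ignores has a rate of exactly zero.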

Figure S17. Impact of training epochs on spatial reconstruction performance. This experiment evaluated the number of training epochs for spatial reconstruction of mouse gastrulation cells using the single-cell ST reference (as shown in Figure 2A of the main text).

Table S1. Average Spearman's rank correlation coefficient on the mouse right-brain cells generated by MERSCOPE.
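An "average Spearman's rank correlation coefficient" between true and reconstructed spatial relationships can be computed in several ways; the sketch below assumes one plausible scheme: for each cell, rank-correlate its true distances to all other cells with its predicted distances, then average over cells. The averaging scheme and function names are assumptions. The rank helper assumes no ties (scipy.stats.rankdata or scipy.stats.spearmanr handle ties).

```python
import numpy as np


def rank(x):
    """0-based ranks; assumes no ties."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(len(x))
    return r


def spearman(a, b):
    """Spearman's rho = Pearson correlation of the ranks."""
    return float(np.corrcoef(rank(a), rank(b))[0, 1])


def average_spearman(true_dist, pred_dist):
    """Mean over cells of rho between true and predicted distances to all other cells."""
    n = true_dist.shape[0]
    rhos = []
    for i in range(n):
        others = np.arange(n) != i  # drop the zero self-distance
        rhos.append(spearman(true_dist[i, others], pred_dist[i, others]))
    return float(np.mean(rhos))
```

Because only ranks matter, any monotone rescaling of the predicted distances leaves the score unchanged, which makes it suitable for comparing methods whose outputs live on different scales.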

Table S2. Average Spearman's rank correlation coefficient on the human lung samples generated by 10X Visium.

Table S3. Contingency table of detected cell-type co-localizations in the reference datasets and in the SC sample spatially reconstructed using our method.
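The contingency tables in Tables S3 to S5 compare which cell-type pairs are detected as co-localized in the reference datasets and in the reconstructed SC sample. The sketch below only shows the 2x2 tabulation, assuming the detected pairs are already available as inputs; how a pair is called co-localized is not described here, and the function name and input format are hypothetical. Same-type pairs are not counted.

```python
from itertools import combinations

import numpy as np


def colocalization_contingency(cell_types, ref_pairs, sc_pairs):
    """2x2 counts over all unordered pairs of distinct cell types.

    Rows: detected / not detected in the reference;
    columns: detected / not detected in the SC sample.
    """
    ref = {frozenset(p) for p in ref_pairs}
    sc = {frozenset(p) for p in sc_pairs}
    table = np.zeros((2, 2), dtype=int)
    for pair in combinations(sorted(set(cell_types)), 2):
        key = frozenset(pair)  # order-insensitive: ("A", "B") == ("B", "A")
        table[0 if key in ref else 1, 0 if key in sc else 1] += 1
    return table
```

The diagonal cells count agreements between the reconstruction and the reference, and the off-diagonal cells count missed or spurious co-localizations.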

Table S4. Contingency table of detected cell-type co-localizations in the reference datasets and in the SC sample spatially reconstructed using CeLEry.

Table S5. Contingency table of detected cell-type co-localizations in the reference datasets and in the SC sample spatially reconstructed using SageNet.

Table S6. Average Spearman's rank correlation coefficient for all benchmarking scenarios with m set to 21 (Eq. 3 in the Methods).

Table S7. Average JSD (k = 20) and Spearman's rank correlation coefficient when using three samples (L1 of embryos 1, 2, and 3) as the reference for spatial reconstruction of mouse gastrulation cells.

Table S8. Runtime and memory usage for fitting the CellContrast model. Note: the number of training epochs is 3000. All experiments were conducted on a machine with 24 cores, using SeqFISH embryo1 L1 (10,150 cells) as the training sample.