Reliable imputation of spatial transcriptomes with uncertainty estimation and spatial regularization

Summary
Imputation of missing features in spatial transcriptomics is urgently needed because of technological limitations. However, most existing computational methods achieve only moderate accuracy and cannot estimate the reliability of their imputations. To fill this gap, we introduce a computational model, TransImpute, that imputes the missing feature modality in spatial transcriptomics by mapping it from single-cell reference data. We derive a set of attributes that accurately predict imputation uncertainty, enabling us to select reliably imputed genes. In addition, we introduce a spatial autocorrelation metric as a regularizer to avoid overestimating spatial patterns. Results on multiple datasets from various platforms demonstrate that our approach significantly improves the reliability of downstream analyses that detect spatially variable genes and interacting ligand-receptor pairs. TransImpute therefore offers a reliable approach to the spatial analysis of missing features for both matched and unseen modalities, such as nascent RNAs.


Method
Table S1: Properties of TransImp in comparison with other methods. Note: (I) indicates implicit estimation of a full or low-rank mapping matrix. SpaGE and stPlus project both ST and SC data points into a common latent vector space, where similarity can be computed and the mapping between spots and cells thereby derived. However, they do not explicitly estimate a mapping matrix as Tangram and TransImp do, which we indicate with (E).

Figure S1: Dual problem of imputing spatial transcriptomics data. A. Translate observed genes to unobserved genes. B. Translate cells to spots, as used in our method.
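The two views in Figure S1 can be illustrated with a small linear sketch. This is not the TransImp model: it assumes a plain least-squares fit via the pseudoinverse, toy simulated data, and hypothetical variable names (`X_sc`, `X_st_shared`, `M`, `B`). Under these assumptions, mapping cells to spots and translating observed genes to unobserved genes give the same imputation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_spots = 200, 50
n_shared, n_missing = 30, 10  # genes measured in both data sets / only in the SC reference

# Toy single-cell reference (cells x genes): shared genes first, then ST-missing genes.
X_sc = rng.poisson(2.0, size=(n_cells, n_shared + n_missing)).astype(float)
X_sc_shared, X_sc_missing = X_sc[:, :n_shared], X_sc[:, n_shared:]

# Simulate observed ST data as mixtures of cells (used only to create toy input).
M_true = rng.random((n_spots, n_cells))
M_true /= M_true.sum(axis=1, keepdims=True)
X_st_shared = M_true @ X_sc_shared

# View B: estimate a spots x cells mapping M with M @ X_sc_shared ~ X_st_shared,
# then carry the missing genes over from the cells.
M = X_st_shared @ np.linalg.pinv(X_sc_shared)
X_hat_via_cells = M @ X_sc_missing

# View A: estimate a gene-to-gene translation B with X_sc_shared @ B ~ X_sc_missing,
# then apply it to the observed ST genes.
B = np.linalg.pinv(X_sc_shared) @ X_sc_missing
X_hat_via_genes = X_st_shared @ B

print(M.shape, B.shape, X_hat_via_cells.shape)
print("views agree:", np.allclose(X_hat_via_cells, X_hat_via_genes))
```

In this linear setting the agreement follows from associativity of the matrix product; a learned, regularized mapping would not generally satisfy it exactly.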

Figure S2: Cosine similarity score (y-axis) vs. threshold on predicted variance (x-axis); values are generally small after squaring. Similar plots can help determine a threshold for selecting genes based on the predicted uncertainty score of gene imputations.
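A threshold sweep of the kind plotted in Figure S2 could be sketched as follows. The data are simulated, the predicted variance is a stand-in (here simply the known noise variance), and the helper names `cosine_per_gene` and `sweep_thresholds` are hypothetical; the paper's own uncertainty predictor is not reproduced.

```python
import numpy as np

def cosine_per_gene(truth, imputed):
    """Cosine similarity between true and imputed expression, one score per gene (column)."""
    num = (truth * imputed).sum(axis=0)
    den = np.linalg.norm(truth, axis=0) * np.linalg.norm(imputed, axis=0)
    return num / np.maximum(den, 1e-12)

def sweep_thresholds(pred_var, cos_scores, thresholds):
    """For each threshold, keep genes with predicted variance <= threshold and average their scores."""
    rows = []
    for t in thresholds:
        keep = pred_var <= t
        mean_cos = float(cos_scores[keep].mean()) if keep.any() else float("nan")
        rows.append((float(t), int(keep.sum()), mean_cos))
    return rows

# Toy data: genes with larger noise are imputed less reliably.
rng = np.random.default_rng(1)
n_spots, n_genes = 100, 40
truth = rng.gamma(2.0, 1.0, size=(n_spots, n_genes))
noise_sd = np.linspace(0.1, 3.0, n_genes)
imputed = truth + rng.normal(0.0, noise_sd, size=truth.shape)
pred_var = noise_sd ** 2  # assumption: stand-in for a model-predicted uncertainty score

cos_scores = cosine_per_gene(truth, imputed)
thresholds = np.quantile(pred_var, [0.25, 0.5, 0.75])
rows = sweep_thresholds(pred_var, cos_scores, thresholds)
for t, n_kept, mean_cos in rows:
    print(f"threshold={t:.3f}  genes kept={n_kept}  mean cosine={mean_cos:.3f}")
```

Plotting the third column against the first gives a curve analogous to the figure, from which a cut-off can be read off.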

Figure S3: Cellular transition grid of the Mouse Brain data set based on RNA velocity estimated at the single cell level.

Figure S4: Cellular transition grid of the Mouse Brain at both single-cell and spatial levels colored by four cell types: Glioblast, Neuroblast, Neuron and others.

Figure S5: Effect of latent dimension on model performance.

Figure S6: Distributions of Moran's I for observed and imputed ST data. The figure shows Moran's I values for the observed genes and for all imputed ST genes. While the observed Moran's I values have a mode close to zero (the majority of genes are not highly spatially variable), SpaGE, stPlus and Tangram tend to exaggerate spatial patterns in their imputations for most genes. By gradually strengthening the spatial regularization (increasing lambda), TransImp pushes the mode closer to that of the observed data, and thus provides a way to inhibit overestimation of spatial patterns.
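For reference, Moran's I for a single gene can be computed as in the sketch below. The binary k-nearest-neighbour weights, the grid layout and the helper names `knn_weights` and `morans_i` are assumptions made for illustration; the weighting scheme used for the figure is not specified here, and the regularized training objective is not reproduced.

```python
import numpy as np

def knn_weights(coords, k=4):
    """Binary spatial weights: row i marks the k spots nearest to spot i."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a spot is not its own neighbour
    idx = np.argsort(d, axis=1)[:, :k]
    W = np.zeros(d.shape)
    np.put_along_axis(W, idx, 1.0, axis=1)
    return W

def morans_i(x, W):
    """Moran's I = (n / sum(W)) * (z' W z) / (z' z), with z the mean-centred expression."""
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

# Toy 15 x 15 grid of spots.
rng = np.random.default_rng(3)
gx, gy = np.meshgrid(np.arange(15), np.arange(15))
coords = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
W = knn_weights(coords, k=4)

gene_patterned = coords[:, 0] + rng.normal(0.0, 0.5, size=len(coords))  # smooth gradient
gene_random = rng.normal(0.0, 1.0, size=len(coords))                    # no spatial structure

i_patterned = morans_i(gene_patterned, W)
i_random = morans_i(gene_random, W)
print(f"patterned gene: {i_patterned:.3f}, random gene: {i_random:.3f}")
```

Applying `morans_i` to every observed and imputed gene and comparing the two histograms gives the kind of comparison the caption describes.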

Table S3: Performance of agglomerative clustering with spatial adjacency matrices on the raw SeqFISH data and on different subsets of imputed genes.
Note: Scores are computed against the ground-truth annotations provided in the SeqFISH dataset. Numbers in brackets show the number of genes in each configuration.

Table S4: Significance tests comparing the imputation performance of TransImp (Top50%) with that of other methods across different datasets. Note: Cosine scores of the top-50% quality genes imputed by TransImpLR are compared with those of other methods using independent t tests and Mann-Whitney U tests. The results show that imputation quality can vary across genes, and that TransImp successfully selects high-quality genes, which significantly outperform the full gene sets imputed by other methods, demonstrating its robustness.
Abbreviations: tstats: t statistic of the independent t test; ttpval: p value of the independent t test; ustats: statistic of the Mann-Whitney U test; utpval: p value of the Mann-Whitney U test.
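The four quantities abbreviated above could be obtained with SciPy as in the following sketch. The scores are simulated, the variable names are hypothetical, and the test options (SciPy's default equal-variance t test and a two-sided U test) are assumptions, since the exact settings are not stated here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Toy per-gene cosine scores: a selected top-50% gene set vs. a full gene set from another method.
scores_selected = rng.beta(8, 2, size=100)
scores_other = rng.beta(5, 3, size=200)

t_res = stats.ttest_ind(scores_selected, scores_other)  # independent two-sample t test
u_res = stats.mannwhitneyu(scores_selected, scores_other, alternative="two-sided")

summary = {
    "tstats": float(t_res.statistic),
    "ttpval": float(t_res.pvalue),
    "ustats": float(u_res.statistic),
    "utpval": float(u_res.pvalue),
}
for name, value in summary.items():
    print(f"{name}: {value:.4g}")
```

Reporting both tests is a reasonable safeguard because cosine scores are bounded and often skewed, so the rank-based U test does not rely on normality.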

Table S6: Imputation performance measured by cosine similarity on the SpatialScope benchmark.