Cell type identification in spatial transcriptomics data can be improved by leveraging cell-type-informative paired tissue images using a Bayesian probabilistic model

Abstract Spatial transcriptomics technologies have recently emerged as a powerful tool for measuring spatially resolved gene expression directly in tissues sections, revealing cell types and their dysfunction in unprecedented detail. However, spatial transcriptomics technologies are limited in their ability to separate transcriptionally similar cell types and can suffer further difficulties identifying cell types in slide regions where transcript capture is low. Here, we describe a conceptually novel methodology that can computationally integrate spatial transcriptomics data with cell-type-informative paired tissue images, obtained from, for example, the reverse side of the same tissue section, to improve inferences of tissue cell type composition in spatial transcriptomics data. The underlying statistical approach is generalizable to any spatial transcriptomics protocol where informative paired tissue images can be obtained. We demonstrate a use case leveraging cell-type-specific immunofluorescence markers obtained on mouse brain tissue sections and a use case for leveraging the output of AI annotated H&E tissue images, which we used to markedly improve the identification of clinically relevant immune cell infiltration in breast cancer tissue. Thus, combining spatial transcriptomics data with paired tissue images has the potential to improve the identification of cell types and hence to improve the applications of spatial transcriptomics that rely on accurate cell type identification.


INTRODUCTION
In the last five years, sequencing-based spatial transcriptomics technologies (1)(2)(3)(4)(5) have emerged as a powerful tool to measure spatially resolved genome-wide gene expression directly within tissue sections, offering the potential to interrogate tissue biology in unprecedented detail (6,7). Novel computational methods have already begun to address several analytical challenges posed by these new data, with specific tools developed to identify spatially varying genes (8,9), spatial gene expression patterns (10,11) and cell-cell interactions (12,13). However, the most fundamental problem posed by spatial transcriptomics data--upon which almost all other applications of the data depend--is that of identifying the location and abundance of different cell types (herein referred to as 'cell type decomposition'). Several methods have already been developed for this task and generally function by leveraging the expression of a set of cell type-specific marker genes to infer the abundance of each cell type at each slide region (14)(15)(16)(17). However, due to statistical multicollinearity (18), cell type decomposition in spatial transcriptomics data will always struggle to differentiate between cell types that are transcriptionally similar. Additionally, low transcript capture, either across an entire slide or in specific regions, can completely impede accurate cell type decomposition.
Here, we present a conceptually novel computational methodology termed Guiding-Image Spatial Transcriptomics (GIST). This method improves cell type decomposition in spatial transcriptomics data by jointly leveraging gene expression data obtained from the spatial transcriptomics platform with cell-type-informative images, for example, AI annotated tissue images, or immunofluorescence (IF) stains for cell-type-specific marker proteins, which can be collected, for example, from the reverse side of a tissue section affixed to a spatial transcriptomics capture slide. In a particularly interesting use case, we ap-plied the method to integrate spatial transcriptomics data with deep learning-derived immune cell type annotations in breast cancer pathology slides, where we identified clinically relevant immune cell infiltration that was missed by an initial pathologist's manual annotation. However, the methodology is generalizable to any spatial transcriptomics platform where informative image-derived cell-type compositional estimates can be obtained. Thus, combining spatial transcriptomics and paired tissue images has the potential to improve all applications of spatial transcriptomics data that rely on the accurate annotation of cell types, such as estimating cell type specific differential expression (19).

Technical details of the GIST statistical model
The expression of gene i at each spatial transcriptomics mRNA capture spot j is assumed to be approximately a weighted sum of the average expression of that gene in each of the cell types captured by that spot. If our spatial transcriptomics data are arranged in a matrix Y, where the rows represent i = 1, . . . , m genes and the columns represent j = 1, . . . , n spots, then this relationship can be summarized by the following equation: where W is an m × p matrix of cell type specific gene expression signatures, approximating the average expression of each gene in each cell type in this tissue, with each column of W representing one of the p cell types and each row representing one of the m genes. H is a p × n matrix of cell type proportions (or probabilities if the data are subcellular resolution) where each column H ( j ) represents the proportions of each of p cell types at spot j .
In the datasets used in this study, each element of W was derived from , a reference single-cell RNA-seq dataset. Single-cell RNA-seq data is most often modelled using a negative binomial distribution (20) estimated for each gene i , in each cell type k, from the expression data of the available single-cells indexed by l. φ i,k represents the overdispersion parameter of such a distribution: In this study, we approximated the elements of W by taking the mean normalized (details below) expression of each gene in each cell type in the reference single-cell RNA-seq dataset , which in practical terms avoids having to include the entire single-cell RNA-seq dataset during the model fitting procedure, thus speeding up inference and likely having advantages on very large reference datasets.
Given Y and W, the following model was then used for estimating H: ( j ) , ν j , β 0, j , σ j ∼ t ν j , β 0, j + W i H ( j ) , σ j i = 1, . . . , m; j = 1, . . . , n; k = 1, . . . , p We placed a gamma prior (priors are denoted herein by π ) on the degrees of freedom parameter ν of the tdistribution, using shape and rate parameter values previously proposed by Juarez and Steele (21): π ν j ∼ Gamma (2, 0.1) We constrained the elements of H to be positive and to sum to one within each spot: This was achieved by placing a non-informative Dirichlet prior on the columns of H: Each σ j was assigned a non-informative prior. We used the image data to generate a prior estimate of the abundance of some cell type a (e.g. immune cells) at each spot j (details below), then we placed a beta distribution prior on the corresponding proportion of cell type a at spot j : π h a, j ∼ Beta τ j , λ Here, τ j is the mean of the beta distribution representing the image-derived prior estimate for the proportion of this cell type a at spot j . λ is a hyperparameter, representing the total count parameter of the beta distribution, determining how much weight is to be placed on the image data and how much to place on the transcriptomic data. Elements of H outside of the specific row-of-interest a (i.e. elements for which an informative image-derived prior is not available) are assigned non-informative priors.
In the notation above, vectors are shown using boldface and matrices bold capital letters. We assume m genes (indexed by i ), n spots, (indexed by j ) and p cell types (indexed by k).

Fitting the GIST and GIST base-model
The statistical model described above was implemented in the Stan programming language using the rstan package. The Hamiltonian Monte Carlo (HMC) algorithm was used to estimate the model parameters. The HMC algorithm was run for 2000 iterations where the first 1000 iterations were discarded as burn-in. The posterior mean was used as final parameter estimates.

Prior construction
Mouse brain dataset. To avoid outlier bias in the IF image data the pixel-level image intensity values were first capped PAGE 3 OF 14 Nucleic Acids Research, 2022, Vol. 50, No. 14 e80 at the 99th percentile and values below the first percentile were set to zero. These pixel-level intensity values were then rescaled from 0 to 1, by dividing all values by the maximum capped value. Pixels overlapping each spatial transcriptomics mRNA capture spot were defined as those centered around the middle of the spatial transcriptomics spot in a 70-pixel radius--the center of the spot was defined in an annotation file that was output by the 10x Genomics Spac-eRanger software. The rescaled pixel-level intensity values were then averaged over the slide regions corresponding to each spatial transcriptomics spot to obtain a single intensity value for each spot. This procedure was repeated for both IF channels--RBFOX3 (Neuron) and GFAP (Glia). Finally, the intensity values for each spot in each channel were mapped onto the quantiles of the cell type proportion estimates obtained from a first round of model fitting using the GIST base-model (i.e. where all parameters estimating cell-type abundance are assigned non-informative priors). These IF image-derived mapped spot level intensity values, which act as a proxy for the abundance of neurons or glia, were used as priors on the appropriate parameters in the GIST model.
Breast cancer dataset. The deep learning models used in the breast cancer analyses were previously published by Saltz et al. (22) and were obtained from the Quantitative Imaging in Pathology (QuIP) group's website (https://sbubmi.github.io/quip distro). These are convolutional neural network-based deep learning models, which had been pretrained to recognize tumor-infiltrating lymphocytes. The original authors had trained these models using pathologist annotated H&E-stained tissues sections from TCGA. We used the VGG16-based model provided by the group. The breast cancer H&E images were converted from JPEG format to tiled TIFF format and the software suite VIPS was used to encode the TIFF files with a micron per pixel (MPP) value for each slide. The encoded TIFF files were processed using QuIP's deep learning pipeline to generate a probability map over the entirety of each breast cancer H&E stained slide image. The deep learning model assigned probability values to patches of 50 × 50 microns. For a given spot, the assigned patch-level probability values were converted to spot-level probability values by taking a weighted sum of the patches, where the weight is the pixel overlap between the patch and the spot. This generated values for each spatial transcriptomics spot that approximately corresponded to the probability of immune cell infiltration. Similarly to the mouse brain IF dataset, these probability values were then mapped onto the distribution of total lymphocyte (T cell and B cell) content estimated from gene expression-derived proportions alone, obtained by an initial round of model fitting using the GIST base-model. These mapped values were used as informative priors on the appropriate model parameters in the GIST model.
The image processing code was implemented in Python using imaging libraries PIL.Image and imageio. Visualization and analysis of imaging data were carried out using the NumPy, pandas and Matplotlib libraries.

Quantifying the improvement achieved by the GIST model, compared to an expression-only model, by benchmarking against a pathologist-defined ground truth
For each slide in the breast cancer dataset, we quantified a model's ability to accurately estimate regions of immune cells by the median of immune cell proportions in spots labeled as immune-infiltrated by the original pathologist, divided by the median of immune cell proportions estimated in the other remaining spots: With better performance, the scalar value Q will increase, as the model's output better matches the pathologistdefined ground truth for this slide. Having defined this performance metric, we defined the improvement of the GIST model over the expression-only GIST base-model below as , a scalar representing the difference between this ratio statistic Q when immune cell proportions were estimated with the GIST model (Q G I ST ) or the GIST base-model (Q G I STBaseModel ): To assess whether the improved performance observed for the GIST model over the GIST base-model was statistically significant, we used a permutation-based strategy, building a null distribution by randomly shuffling the pathologist's spot level annotations. Specifically, for each permutation, the spots were randomly assigned as either immune infiltrated or non-immune, fixing the total number of immune infiltrated spots to the same number as the pathologist's annotation of that slide; we then computed the improvement in the performance perm of the GIST model over the GIST base-model using the same procedure that was applied to the real arrangement of the pathologist's annotations. This was repeated for 100 000 permutations, generating a null distribution against which to compare the observed test statistic . A P-value was then calculated by the proportion of permuted values perm that achieved a value at least as extreme as , the test statistic observed in the pathologist's real annotations. In the cases where no permutated value more extreme than the original test statistic was observed (G1 and H1), a P-value was calculated by approximating the null distribution using a normal distribution, with a mean and standard deviation equal to that of the perm values from the 100,000 permutations. Note that while the statistical validation procedure described in this section assumes a pathologist's manual annotation of the slides as the hold-out ground truth, these same procedures could be applied to any assumed ground truth annotation given any class of informative paired image, for example, substituting the pathologist's annotation for an orthogonal IF or pathology stain marking the cell type of interest.

Second pathologist's re-annotation of the breast cancer spatial transcriptomics slides
A second pathologist was asked to assign new immune infiltration grades from H&E images of spots for three spatial transcriptomics breast cancer slides--B1, C1 and H1. The pathologist (co-author Dr Heather Tillman) was asked to blindly score H&E images of slide regions overlapping the spatial transcriptomics mRNA capture spots from three groups of spots: These were (i) spots that were annotated as immune cell infiltrated by the original pathologist (slide H1 only), (ii) spots that were identified as highconfidence immune infiltrated by the GIST model or (iii) other randomly chosen spots. High-confidence immunecell-infiltrated spots from the GIST model were selected as the spots having a predicted proportion of immune cells that was greater than the upper quartile plus 1.5 times the interquartile range of the data, a de facto metric used to define outliers. For each slide, the number of random spots selected was equal to the number of spots included from the GIST model. This second pathologist was then asked to score/grade an H&E stain image of each spot, scoring immune cell infiltration levels as low, middle, or high, while remaining blinded to the group from which the spot image was selected. This provided a new score for each spot from each of the three groups (annotated, GIST, random). We then applied a one-sided Wilcoxon rank-sum test to assess whether these scores were significantly higher in the group of spots predicted as high confidence immune infiltrated by the GIST model compared to the randomly selected spots or the immune infiltrated spots from the initial pathologist's annotation, where low, middle and high scores were encoded on an ordinal scale as 1, 2 and 3, respectively.
Simulations to assess the ability of the GIST base-model to accurately identify cell type composition in gene expression data from a mixture of cell types Splatter. The accuracy of cell type proportions estimated from the various computational methods was compared to the GIST base-model by first creating synthetic mixtures of gene expression data using the popular Splatter model (23). We used the Splatter model with a slight modification, which was recently proposed by Zhang et al. (24), who reported that the native Splatter model did not capture the empirical distribution of log fold changes observed in real data. The enhanced Splatter model was obtained from the GitHub repository of Zhang et al. (https://github. com/Irrationone/splatter), where the authors had learned the simulation parameters from the counts matrix of a publicly available PBMC single-cell RNA-seq dataset generated by 10X Genomics. The parameters for log fold changes were learned by fitting a truncated Student's t-distribution to the log fold changes between B cells and CD4 T cells in this same PBMC dataset.
Using the enhanced Splatter framework, we generated a dataset with 100 gene expression samples, each created from mixtures of cell types, along with a simulated paired reference single-cell RNA-seq dataset. The paired single-cell RNA-seq data were collapsed by their mean to create the required reference signature matrix W, which was passed to each of the computational methods. Each expression mixture sample was generated by taking a weighted average of gene expression across 100 cells (generated independently of the reference single-cell RNA-seq data) from each of six synthetic cell types. Ground truth cell type proportions for the 100 simulated mixture samples were randomly generated from a flat Dirichlet distribution, where each cell type was assigned equal weight.
Immune cell deconvolution. We performed a second set of benchmarking simulations using the framework developed by Strum et al. (25), which rather than relying entirely on simulation, created a mixture gene expression dataset by computationally mixing real single-cell RNA-seq data, previously generated by Schelker et al. (26). In this benchmark, ground truth was established by mixing gene expression counts from 500 single-cells from each of eight immune cell types in known proportions and the simulated mixture was created by taking an average across cells. For the fairest comparison, we supplied each of the methods the LM22 cell type signature matrix (27) (corresponding to W in our notation herein), which is a signature matrix created by the developers of CIBERSORT that represents average gene expression values in each of 22 immune cell types. Note this was not possible for Stereoscope, which only accepts singlecell RNA-seq data as the reference input, from which it estimates the cell type signature matrix internally. Because the LM22 cell types do not have a strict one-to-one correspondence with the cell types annotated in Schelker et al., the results were mapped to the most relevant cell type using the same mappings previously employed by Strum et al.
In all simulations, the performance of each method was summarized by the mean absolute error (MAE), which is the average of the absolute value of the difference between each predicted cell type proportion and the known simulated ground truth proportion: where y i is a predicted cell type proportion, x i is the ground truth proportion, e i is the error associated with the prediction, and n is the total number of predicted data points generated by a given method.

Data preprocessing, filtering, normalization and imputation
All public datasets were obtained as preprocessed counts matrices, which had been processed according to the previous authors. Generally, spatial transcriptomics data displayed greater sparsity than the single-cell RNA-seq data, which arises because of differences in platform-specific mRNA capture efficiency. To alleviate this difference, we used a non-parametric imputation approach. Specifically, we used the knnSmooth (28) algorithm (available at the GitHub repository https://github.com/yanailab/knnsmoothing) to impute the spatial transcriptomics data. For the IF mouse brain dataset, we set the 'number of nearest neighbors to aggregate' parameter k to 5 and the 'number of principal components' parameter d to 10 (author's suggested default). For the breast cancer dataset, we

PAGE 5 OF 14
Nucleic Acids Research, 2022, Vol. 50, No. 14 e80 used the same approach with slight modifications. The resolution of spots on the breast cancer slides was coarser than on the Visium array and transcript capture was poorer. Thus, to overcome these limitations, we combined the spots from all the breast cancer spatial transcriptomics slides and imputed them together using the knnSmooth algorithm with a k parameter of 10, mitigating the lower transcript capture efficiency in the breast cancer dataset.
Thereafter, both the spatial transcriptomics and singlecell RNA-seq data were normalized separately by using Seurat's SCTransform (29), which importantly removes technical effects such as library size effects. We restricted the single-cell RNA-seq and spatial transcriptomics data to the intersection of their 2000 most highly variable genes, yielding totals of 1024 and 837 genes used for GIST model fitting in the mouse and breast cancer datasets respectively.

Software and code availability
The GIST model has been made available as an R package, which can be obtained at: https://github.com/asifzubair/GIST. All the code for the analyses presented in this manuscript are available on GitHub: https://github.com/asifzubair/ GIST-paper.

Guiding-Image Spatial Transcriptomics (GIST) jointly leverages spatial transcriptomics and paired tissue images to improve cell type decomposition
GIST attempts to improve cell type decomposition in spatial transcriptomics data by leveraging prior estimates of cell type composition from paired tissue images. The method relies on Bayesian probabilistic modeling, a statistical approach that naturally lends itself to integrating multiple sources of information, jointly leveraging spatial transcriptomics and imaging information to improve cell type decomposition estimates. Intuitively, the approach uses the imaging data to provide an initial 'suggestion' as to the cell types in a particular region of the spatial transcriptomics slide, but this suggestion can be overcome if outweighed by the evidence from the transcriptomic data (schematic representation in Figure 1A, see Materials and Methods for technical details of model).

A Bayesian probabilistic model for cell type decomposition performs competitively when compared to existing methods in simulations when no paired image information is leveraged
Existing methods for cell type decomposition in spatial transcriptomics data are related to previous models for bulk gene expression deconvolution and can be broadly conceptualized as a matrix decomposition, where some reference basis matrix of expression data from purified cells W (e.g. derived from single-cell RNA-seq) is used to estimate the proportion of each cell type H in the bulk mixture Y (Figure 1B for schematic representation). At subcellular resolution, the H matrix can be thought of as probability estimates, rather than proportion estimates (15), although for simplicity we use the term 'proportion' throughout this manuscript.
The statistical model underlying GIST is related to these existing approaches but includes the ability to leverage prior information derived from paired tissue images. Thus, we were first interested in assessing whether our model performed competitively when compared to existing approaches in the absence of prior information derived from images (henceforth referred to as the 'GIST base-model').
To test this, we first developed two complementary unbiased benchmarking simulations, one based on the existing tool Splatter (23) and one based on a published benchmarking dataset (25), which evaluates methods on a simulated mixture of immune cell types from a real singlecell RNA-seq dataset. We compared the GIST base-model to two methods originally designed for bulk gene expression data (CIBERSORT (30), DeconRNASeq (31)), two methods tailored specifically for spatial transcriptomics data (Stereoscope (17) and SpatialDWLS (32) Table S2; MAE = 0.09 for CIBER-SORT and 0.06 for the GIST base-model). However, given the conceptual similarity of the underlying models, it is not surprising that none of these existing methods produce markedly dissimilar results in either simulation, suggesting that, rather than further model tweaking and optimization, a new conceptual advance may be necessary to achieve meaningful progress on the cell type decomposition problem.
The GIST base-model performs competitively on spatial transcriptomics data obtained from mouse brain sections when cell type specific immunofluorescence markers are treated as a ground truth We were next interested in comparing the performance of the GIST base-model to other methods using real spatial transcriptomics data. To do this, we leveraged a publicly available dataset (see Data Availability), which measured gene expression in the mouse brain using the popular 10x Genomics Visium spatial transcriptomics platform, and where immunofluorescence (IF) staining was performed on the reverse side of the tissue section. These IF stains were conducted for two proteins, RBFOX3 and GFAP, which are protein markers unique to neurons and glia respectively ( Figure 3A). We calculated the average pixel intensity of each of these two markers in all image pixels overlapping each spatially barcoded mRNA capture spot on the Visium slide ( Figure 3B; see Materials and Methods), then we used these spot-level intensity estimates to represent an independent ground-truth approximating the abundance of neurons and glia in regions of the slide overlapping each of the Visium array's 4992 spots. This matrix is decomposed into a basis matrix W and a matrix H that contains the proportion of each of p cell types on each spot or (at subcellular resolution) the probability that a spot matches a cell type (shown for three hypothetical cell types A, B and C). The basis matrix W is typically known and can be derived for example from single-cell RNA-seq data from the same or similar tissue. Given this, all existing cell type decomposition algorithms, be they designed specifically for spatial transcriptomics data or not, aim to estimate H.
Next, using the GIST base-model, we estimated the cell type composition on each spot from the spatial transcriptomics data by leveraging a single-cell RNA-seq dataset that was available from a similar region of a mouse brain, allowing us to estimate the abundance of glial and neuronal cell types from the spatial transcriptomics expression data alone ( Figure 3C). We compared the results obtained from the GIST base-model to popular spatial transcriptomics cell type decomposition methods Spotlight (14), RCTD (15), Stereoscope (17) and cell2location (33), treating the IFderived estimates of neurons and glia at each spot as ground truth. Consistent with our simulations, the GIST basemodel, RCTD, cell2location, and Spotlight all performed quite similarly in these benchmarks on real data; however, we note that the GIST base-model had slightly better performance than the other methods, achieving Spearman's rank correlations of 0.49 and 0.77, compared to 0.33 and 0.77 for RCTD (the second best performing method), for the glial and neuronal comparisons respectively ( Figure 3D; P < 2.2 × 10 -16 from Spearman's correlation against IFderived ground truth for all five methods; Supplementary  Figures S3-S7). Overall, these results suggest that the GIST base-model performs competitively when compared to existing methods for cell type decomposition in real spatial transcriptomics data.
Incorporating image-derived prior information from matched immunofluorescence stains has the potential to improve cell type decomposition in spatial transcriptomics data Even though our GIST base-model performed well compared to existing methods, the results above also showed that the best-performing methods were not markedly different and fall well short of an optimal performance when compared to the IF-derived ground truth. Thus, we next hypothesized that it should be possible to markedly improve our performance by leveraging our model's Bayesian implementation and supplying the model with informative image-derived prior information (henceforth referred to as the 'GIST model'). We reasoned that we could first demonstrate this principle on this mouse brain dataset, leveraging the IF-derived estimates of cell type abundance. However, IF-derived pixel intensity estimates do not represent proportions on a 0-1 scale and thus it is not obvious how this information could be leveraged as prior estimates of cell type composition in the GIST model. To solve this problem, we first normalized the IF-derived estimates by mapping them onto the quantiles of the spatial transcriptomicsderived cell type proportion estimates, generated by an initial round of model fitting using the GIST base-model (Figure 3E-G; see Materials and Methods). We then refit our GIST model, incorporating this prior knowledge derived  (25). Points have been colored by the immune cell type and the y-axis shows the deviation from ground truth, quantified by the difference between the estimated cell type proportions in a sample and the true proportion used as ground truth for the simulation. The Mean Absolute Error (MAE), summarizing the overall performance of each method is as follows: Linear regression = 0.14, CIBERSORT = 0.09, DeconRNAseq = 0.11, SpatialDWLS = 0.1, GIST base-model = 6.4 × 10 -2 . Note: Stereoscope was not included in this second set of simulations because it was not possible to pass the CIBERSORT LM22 signature matrix, which was used as the cell-type reference in this simulation, to Stereoscope (see Materials and Methods). In all boxplots, the center line represents the median, bound of box is upper and lower quartiles and the whiskers are 1.5 × the interquartile range. from the RBFOX3 IF data, providing 'suggestions' of the abundance of neuronal cell types over each spatial transcriptomics spot. We specified these priors using a beta distribution applied to the appropriate group of model parameters corresponding to neuronal cell type estimates. The beta distribution was parameterized by its mean (τ ; the point estimate of the normalized cell type proportion estimate from the IF image) and the total-count parameter (λ; the strength of the prior, corresponding to the weight placed on the IF image)--any beta distribution is naturally constrained to a 0-1 scale, meaning it is appropriate for specifying image-derived prior estimates of cell type composition. The key modeling question is then determining how much weight to place on these image-derived priors and how much to place on the spatial transcriptomics data itself. This must be determined by selecting the hyperparameter λ, where a value that is too small will mean there is little to no influence of the image-derived cell type information on the model's output, but selecting a value that is too large will cause the model to over rely on the image and degrade performance.
We chose this hyperparameter λ by observing how the estimates of glial cell type composition compared to IFderived glial-cell ground-truth (GFAP stain) when fitting the model with ever-increasing values of λ for the IF-derived neuronal cell type prior (RBFOX3 stain), only placing priors on the neuronal cell types. As expected, when increasing the value of λ and placing more weight on the image-derived prior for neuronal cells, the model's output progressively more closely matched these IF-derived estimates for the neuronal cell types ( Figure 3H). However, as we continued to increase λ, placing more and more weight on the image-derived estimates of neuronal cells, we eventually observed a monotonic drop-off in the model's performance, as measured by the agreement between the glial cell type estimates from the GIST model and the IF-derived ground truth from the GFAP glial marker protein ( Figure  3H). This monotonic drop-off begins at λ=50, suggesting that beyond this point this prior is too strong, providing us a stopping criterion and a reasonable initial value of λ for image-derived priors. This value of λ concentrates most of the prior probability mass within approximately ±10% of the mean. At this λ value, the Spearman's rank correlation between the model-derived neuronal cell type estimates and the IF-derived ground truth increased from 0.7 to 0.85, but this increase would be expected given that the model's posterior predictions are being compared to the specified prior ( Figure 3I and J). While these analyses specify a statistical model (including reasonable hyperparameter estimates) and normalization procedure for directly leveraging cell type informative image-derived information, determining whether this improves performance would involve comparison against an independent data modality, which is not available for this dataset. We address this problem below, leveraging this initial estimate of the key hyperparameter λ in a new out-of-batch independent dataset and assessing the model performance against a pathologist manual annotation of paired H&E stained pathology images, which are available in this additional dataset.

Incorporating prior information derived from deep learning models applied to matched H&E-stained images improves estimates of immune cell infiltration in breast cancer spatial transcriptomics data
The results above suggest it should be possible to improve cell type decomposition in spatial transcriptomics data by leveraging matched images. However, while IF stains can provide reliable markers of cell types, they are restricted to a small number of proteins and are currently less commonly collected than the H&E stain. Thus, we also wondered whether it would be possible to leverage cell-typeinformative information derived from deep learning models applied to H&E stains--the principal pathology stain that is collected as a part of almost all sequencing-based spatial transcriptomics protocols. Deep learning models have already been developed that can output numerous clinically relevant annotations from H&E-stained tissue section images alone, which could theoretically be usefully propagated in the spatial transcriptomics assay. These annotations include cell type composition, expression of signaling pathways, chromosomal ploidy, and immune cell infiltration (22,34,35). To test whether such information could be usefully exploited in spatial transcriptomics assays, we obtained 8 previously published spatial transcriptomics tissue slides, which had measured gene expression in biologically independent breast cancer tumors. Critically, each of these tissue sections had also been H&E stained ( Figure 4A, panel (a) in Supplementary Figures S8-S12), and regions of immune cell infiltration had been annotated by a previous pathologist ( Figure 4B, panel (b) in Supplementary  Figures S8-S12), providing an independent ground truth against which to assess our model predictions (notably this was not available for the previous IF dataset). Identifying immune cell infiltration has prognostic value (36) and is predictive of response to cancer immunotherapy (37), hence represents a particularly interesting use case of the GIST model.
Thus, we applied a previously published deep convolutional neural network (22), which had been trained using images collected as part of TCGA to identify regions of tumor-infiltrating lymphocytes from H&E stained tumor tissue sections. This yielded patches of deep learningderived predictions of immune cell infiltration across each of our breast cancer tumor tissue sections ( Figure 4C, panel (c) in Supplementary Figures S8-S12), where gene expression had also been measured using spatial transcriptomics. We then averaged these deep learning derived predictions over the pixels overlapping each of the spatial transcriptomics mRNA capture spots, yielding a deep-learningderived per-spot estimate of immune cell composition in each tumor ( Figure 4D, panel (d) in Supplementary Figures S8-S12, similar to the approach applied above for IF data; see Materials and Methods). Initial immune cell proportions at each spot were then estimated using the GIST base-model ( Figure 4E, panel (e) in Supplementary Figures  S8-S12). We applied a similar normalization approach as we described for the IF data, mapping the deep learning de-rived estimates to the quantiles of the initial gene expression derived estimates, then applied these deep-learning-derived immune cell compositional estimates as informative priors, again specified as a beta distribution on the appropriate GIST model parameters. We used a λ value of 50, which was derived from the previous independent dataset ( Figure 3H). If the GIST model performs better than the expression-only GIST base-model, the expectation is that we should identify more immune cells in pathologist-annotated immune cell regions, but less in other regions of the slides. Thus, we quantified model performance by the ratio of immune cells identified within the pathologist's annotated regions of immune infiltration, compared to all other regions of the tissue slide (this ratio is defined herein as Q (see Methods); note that regions of immune cells had been identified by the pathologist in six of eight slides). When compared to the pathologist-derived ground truth, the GIST model, leveraging deep learning-derived prior information, performed better than the expression-only GIST base-model in four out of the six slides ( Figure 4F, panel (f) in Supplementary  Figures S8-S12). The performance increase over the GIST base-model was particularly large for two slides ( Figure 4G Figure  4H, black arrowhead, and Figure 4I) and regions where estimates of immune cell composition increased to match the pathologist ( Figure 4H, green arrowhead). Given that the false positive region highlighted in Figure 4I is directly adjacent to a true positive region of immune cells, it is plausible this represents diffusion of immune cell mRNA from the adjacent spots, a common issue in spatial transcriptomics assays that can seemingly be mitigated by jointly leveraging the paired tissue image. Thus, leveraging deep learning derived prior information has the potential to markedly improve cell type decomposition in data generated from spatial transcriptomics technologies.

The GIST model identified large regions of immune cell infiltration that were missed by the initial pathologist
Surprisingly, one of the six breast cancer slides assessed demonstrated a statistically significant decrease in performance when we leveraged the image-derived prior estimates of immune cell infiltration (slide H1 in Figure 4F, P = 3.56 × 10 -11 , Supplementary Figure S12g). However, closer inspection of this slide's results revealed that there was a large region of this tumor that was identified as immune cell infiltrated by both the spatial transcriptomics assay and the deep learning model, but this region was not marked by the initial pathologist's annotation (Supplementary Figure S12a-e and h). Unsurprisingly, this region was predicted as heavily immune cell infiltrated by the GIST model, which also correctly identified the original pathologist's annotated regions of immune infiltration in this slide ( Figure 5A, Supplementary Figure S12f).  Thus, we hypothesized that the apparent decrease in performance may have represented an oversight in the initial pathologist's annotation, and thus a deficiency in the assumed ground truth, rather than a deficiency in the GIST model's prediction. To test this, we devised a procedure that would allow a second independent pathologist (see Author's contributions) to re-examine the relevant regions of this slide, while remaining blinded to the GIST model's output and the original pathologist's annotation. The second pathologist was presented with (n = 115) 100 × 100-micron subregions from this slide and asked to categorize them as either low, middle, or high levels of immune cell infiltration. These subregions were chosen either from (i) the first pathologist's annotated immune cell regions (ii) high-confidence immune cell regions identified by the GIST model but not the first pathologist or (iii) other randomly chosen regions (representative examples shown in Figure 5B; see Methods). Remarkably, the second pathologist's reannotation determined no statistical difference between the high-confidence regions of immune cell infiltration annotated by the first pathologist and the additional high-confidence regions identified by the GIST model, which were missed by the first pathologist ( Figure 5C; P = 0.15 from two-sided Wilcoxon ranksum test). However, the high-confidence regions of immune cell infiltration identified by GIST were much more likely to be marked as high probability regions of immune cell infiltration when compared to randomly chosen slide regions ( Figure 5C, P = 3.5 × 10 -9 from one-sided Wilcoxon rank-sum test). Additionally, the second pathologist's high confidence immune infiltrated regions were mirrored by higher estimated proportions by GIST ( Figure  5D). These results support the notion that the additional regions identified by the GIST model were true regions of immune cell infiltration and that the poor performance on this slide arose from an omission in the original pathologist's annotation, not falsely identified regions by the GIST model.
We also reexamined the two available spatial transcriptomics slides where the original pathologist's annotation of the H&E images had not identified any regions of immune cell infiltration (Supplementary Figure S13a, b). Surprisingly, for both slides the deep learning model ( Figure  5E, Supplementary Figure S13c) and the expression-only cell type predictions from the spatial transcriptomics assay ( Figure 5F, Supplementary Figure S13d) agreed that there were in fact regions of immune cell infiltration (Figure 5G, Spearman's correlation = 0.46, P < 2.2 × 10 -16 ; Supplementary Figure S13e, Spearman's correlation = 0.25, P < 2.2 × 10 -4 ). Unsurprisingly, these same regions were identified by the GIST model ( Figure 5H, Supplementary Figure S13f) and thus it seemed plausible that the initial pathologist had also missed these immune infiltrated regions in their initial examination of these two slides. We used the same scoring procedure outlined above to reannotate these slides by the second pathologist, who convincingly annotated these predicted regions as true regions of immune cell infiltration (Figure 5i, P = 1.5 × 10 -9 ; Supplementary Figure S13g, P = 4.5 × 10 -2 ; see Materials and Methods), which were also mirrored by higher proportions estimated by GIST (Supplementary Figure S13h, i).
Taken together, the presented use cases suggest that our GIST model, which can jointly leverage image-derived cell type annotations with spatial transcriptomics data, has the potential to improve cell type decomposition in spatial transcriptomics data, and that such a strategy can be used to identify predictive and prognostically important features in human tissue sections.

DISCUSSION
We have presented a conceptually novel computational methodology that can leverage cell-type-informative data derived from paired tissue images to improve inferences of cell type composition in spatial transcriptomics data. For the spatial transcriptomics platforms used in this study, these images were obtained from the reverse side of the slide-affixed tissue section (schematic in Figure 1A), but it is also likely feasible to obtain informative images from an adjacent tissue section. One exciting application of the methodology is the ability to leverage inferences from deeplearning models applied to tissue images, for example H&E stains, which itself has recently reached close to pathologist level performance in annotating clinically relevant features of tissue sections (22,34,35). However, the statistical methodology is highly generalizable and could be applied to any image-derived prior information, which we have demonstrated for immunofluorescence. We have also described a set of generalizable statistical procedures for determining the value-added by incorporating paired celltype-informative tissue images in spatial transcriptomics analysis (see Materials and Methods). These procedures were applied to our detailed example of leveraging deep learning-derived immune cell predictions in breast cancer, where a pathologist's annotation was held-out as an assumed ground truth, but this same validation procedure could be applied to any imaging modality where a reasonable hold-out annotation is available, for example, in the context of IF data, derived from an orthogonal IF stain marking the cell type of interest. Generally, our proposed integrated approach has the potential to improve all downstream applications of spatial transcriptomics that rely on accurate cell type annotations, including identification of cell-cell or gene-gene interactions, or cell type specific differential expression (19).
Our framework will also spur the development of future similar computational approaches. Theoretically, any cell type decomposition method that could be re-implemented in a Bayesian framework could be adapted to leverage image-derived cell-type-informative prior information and this is likely possible for most of the existing models used in our comparisons-of-methods (Figures 2 and 3). Thus, there is enormous scope for future model development and optimization within our novel framework. We also anticipate that our framework will lead to new modes of spatial transcriptomics experimental design. For example, we showed that immunofluorescence data could be leveraged. This opens the possibility of a priori staining for a few particularly informative protein markers, knowing that such markers can be used in downstream analyses to directly influence and improve the results of the spatial transcriptomics data analysis. This may be particularly useful for sep-PAGE 13 OF 14 Nucleic Acids Research, 2022, Vol. 50, No. 14 e80 arating cell types when multicollinearity affects the performance of conventional models for cell type decomposition (18).
Additionally, while we have shown some illustrative examples, the Bayesian implementation allows enormous flexibility in how prior information is specified. It is theoretically possible to, for example, apply one prior to groups of cell types, or apply multiple partially overlapping priors derived from various sources of information. For the breast cancer dataset shown, we also fixed the λ hyperparameter to 50, using information obtained in the previous dataset. This is likely a conservative means by which to choose this key value and also assumes that this hyperparameter should be assigned the same value at all regions of the slide--almost certainly an oversimplification. Methods could likely be devised to adaptively adjust the value of the λ hyperparameter on a per-spot or per-slide-region basis, such that, for example, the differences in uncertainty associated with the deep learning-based outputs could be accounted for at each tissue region. Additionally, given that spatial transcriptomics is still a very new technology, the availability of datasets upon which to properly benchmark or test the GIST approach is still quite limited. Thus, particularly with increasing data availability, it is likely that creative applications within the described framework will eventually yield improvements over the results presented here.
In conclusion, we anticipate that jointly leveraging spatial transcriptomics and cell-type-informative images collected from the same or adjacent tissue sections will represent an important conceptually novel computational methodology, which has the potential to improve many applications of emerging spatial transcriptomics technologies, including potential translational applications in clinical and diagnostic pathology.
As a cell type reference W for these data, we used the curated mouse brain single-cell RNA-seq data provided by Andersson et al. (17). This data had been originally retrieved from http://www.mousebrain.org and was processed by Andersson et al. for use in spatial transcriptomics analysis: https://github.com/almaan/ stereoscope/tree/master/data/mousebrain.

Breast cancer
The eight separate breast cancer spatial transcriptomics slides, previously generated by Andersson et al., were downloaded from https://github.com/almaan/her2st. This repository contained count matrices generated from the spatial transcriptomics assays, H&E images of the tissue sections (with and without pathologist annotation), and matrices detailing the location of the spots.
The single-cell RNA-seq breast cancer dataset, used to generate the cell type reference matrix W for all breast cancer analyses, was previously generated by Karaayvaz et al. (38) and obtained from: https://github.com/Michorlab/ tnbc scrnaseq.

Software and code availability
The GIST model has been made available as an R package, which can be obtained at: https://github.com/asifzubair/GIST. All the code for the analyses presented in this manuscript are available on GitHub: https://github.com/asifzubair/ GIST-paper.