Deciphering spatial domains from spatially resolved transcriptomics with Siamese graph autoencoder

Abstract Background Cell clustering is a pivotal aspect of spatial transcriptomics (ST) data analysis as it forms the foundation for subsequent data mining. Recent advances in spatial domain identification have leveraged graph neural network (GNN) approaches in conjunction with spatial transcriptomics data. However, such GNN-based methods suffer from representation collapse, wherein all spatial spots are projected onto a singular representation. Consequently, the discriminative capability of individual representation feature is limited, leading to suboptimal clustering performance. Results To address this issue, we proposed SGAE, a novel framework for spatial domain identification, incorporating the power of the Siamese graph autoencoder. SGAE mitigates the information correlation at both sample and feature levels, thus improving the representation discrimination. We adapted this framework to ST analysis by constructing a graph based on both gene expression and spatial information. SGAE outperformed alternative methods by its effectiveness in capturing spatial patterns and generating high-quality clusters, as evaluated by the Adjusted Rand Index, Normalized Mutual Information, and Fowlkes–Mallows Index. Moreover, the clustering results derived from SGAE can be further utilized in the identification of 3-dimensional (3D) Drosophila embryonic structure with enhanced accuracy. Conclusions Benchmarking results from various ST datasets generated by diverse platforms demonstrate compelling evidence for the effectiveness of SGAE against other ST clustering methods. Specifically, SGAE exhibits potential for extension and application on multislice 3D reconstruction and tissue structure investigation. The source code and a collection of spatial clustering results can be accessed at https://github.com/STOmics/SGAE/.


Background
Spatial transcriptomics (ST) represents a newly emerging technology that revolutionizes the comprehensive characterization of tissue organization and architecture [1,2].By profiling the spatially-resolved gene expression patterns, ST technologies allow scientists to delve into the intricate cellular dynamics within tissues.Based on the underlying methodology, these techniques can be categorized into two main categories: (1) imaging-based methods (MERFISH [3] and seqFISH [4]), and (2) sequencing-based methods (Slide-seq [5] and 10x Visium [6]).As the need for higher-resolution analysis to unravel cellular diversity becomes imperative, advancements such as Stereo-seq [7] have been developed to provide improved resolution over the years.The advent of ST technologies holds immense potential to drive biological discoveries in development, physiology and a broad range of diseases [8,9].
In parallel with single-cell RNA sequencing (scRNA-Seq) analysis, clustering serves as the initial step in ST data analysis, grouping individual cells based on their gene expression patterns.
Similarly, the primary objective for ST data analysis revolves around dissecting tissue into distinct spatial domains.While traditional machine learning approaches have been applied to tackle this task, recent studies have sought to apply deep learning frameworks to learn how to classify spatial spots into specific regions [10][11][12][13].For instance, SpaGCN [12] identifies spatial domains through a graph convolutional network (GCN) framework, while STAGATE [13] deploys a graph attention autoencoder to define spatial clusters.However, such graph neural network (GNN) based methods usually suffer from representations collapse, which tends to map spatial spots into the same representation [14].Consequently, the discriminative capability of spot representation is limited, leading to inaccurate identification of spatial domains.
To tackle the aforementioned challenge, we proposed SGAE, which aims to learn discriminative spot representation and accurately decipher spatial domains.This framework is derived from the Dual Correlation Reduction Network (DCRN) [14], which effectively reduces information correlation at the dual level.SGAE adapts this architecture to ST data analysis by constructing a graph that incorporates both gene expression and spatial information.According to benchmarking assessments, SGAE outperforms existing algorithms in the task of domain identification with superior performance.Moreover, SGAE can be extended in the realm of 3D tissue structure identification.

Overview of SGAE framework
SGAE is an unsupervised algorithm for ST clustering that leverages a Variational Graph Autoencoder (VGAE) [15] within a Siamese graph neural network to combine gene expression and spatial information (Fig. 1).To implement SGAE, the gene expression matrix (X) and adjacency matrix (A) are fed into the encoder, which maps the gene expression data into a lower-dimensional latent space, generating embedding vectors (Z) for individual cells.Pseudo-label is firstly generated by pre-clustering based on gene expression patterns.SGAE adaptively learns the edge weights of the spatial neighbor network (SNN) to capture the similarity between neighboring spots and update the spot representation by aggregating information from neighbors.Finally, the latent embeddings can be visualized using Uniform Manifold Approximation and Projection (UMAP) and various clustering algorithms such as K-means and Louvain can be employed to identify spatial domains for subsequent analysis.
By calculating K-nearest neighbors (KNN) based on the relative spatial positioning of spots, SGAE can effectively capture the spatial relationships between cells.This is especially essential for spatial transcriptomics (ST) data with low spatial resolutions, such as 10x Visium, where discerning fine-grained spatial details can be challenging.Besides, SGAE introduces the concept of a cell type-aware SNN by pruning the SNN based on the pre-clustering of gene expressions.This preliminary clustering step aids in identifying regions that contain distinct cell types.
Through the incorporation of cell type information during the graph construction process, SGAE adeptly captures data heterogeneity and improve the accuracy of the graph representation.SGAE uses graph distortion to acquire diverse and informative node representations.This is achieved through the application of two types of perturbation: feature perturbation and graph perturbation.For feature perturbation, a random noise matrix is introduced to the feature matrix using the Hadamard product.On the other hand, graph perturbation involves edge removal and graph diffusion within the Siamese architecture.To implement edge removal, a mask matrix is generated based on the cosine similarity matrix computed through pairwise comparisons in the latent space.The 10% of edges with the lowest values are then removed.Graph diffusion is facilitated using a random walk-based Personalized PageRank algorithm [16], allowing for the passage of messages through higher-order neighborhoods.To optimize the learning process, SGAE employs an objective function inspired by the Barlow Twins approach [17], aiming to minimize the deviation of the cross-correlation matrix from the ideal identity matrix and reduce redundant information among nodes in the latent space, therefore improving the overall accuracy of the learned embedding.

SGAE exhibited remarkable effectiveness and robustness in spatial domain exploration
ST datasets generated by different technology platforms possess distinct resolutions and features, making it essential to validate the clustering robustness of SGAE across these platforms.To achieve this, we included ST datasets generated by 10x Visium, seqFISH [18], MERFISH [3], SLIDE-seq v2 [19], and Stereo-seq [7].For 10x Visium datasets, samples of human dorsolateral prefrontal cortex have been collected, which comprises 12 continuous slides, and each slide has been labeled into 7 layers based on the anatomical structure [20].For seqFISH, we acquired a sample of mouse gastrulation [21].351 genes have been detected and 19416 cells were labeled into 22 groups.Similar to seqFISH, a mouse primary motor cortex dataset which includes 254 genes and 3106 cells was detected by MERFISH [22].As for the SLIDE-seq v2, a mouse olfactory bulb dataset which contains 20139 cells and 21220 genes was included to test the performance of SGAE [19].To test the performance in tissue without clear structure.Liver cancer from Stereo-seq [23] is utilized.The dataset contains 14288 spots and a margin area between cancer and healthy tissue can be seen according to the H&E staining.Then we comprehensively compare the clustering performance of SGAE against other state-of-the-art spatial clustering methods, including SpaGCN [12], GraphST [10], STAGATE [13] and Leiden [24].Clustering performance was assessed by spatial visualization combined with Adjusted Rand Index (ARI), Normalized Mutual Information (NMI) and Fowlkes-Mallows Index (FMI).

Human dorsolateral prefrontal cortex 10x Visium dataset
We applied SGAE to analyze the 10x Visium ST dataset obtained from the human dorsolateral prefrontal cortex (DLPFC) [20].The visualization of cell clustering confirmed that SGAE was able to discern the intricate stratified cortex structures with remarkable clarity, surpassing the capabilities of other existing methods (Fig. 2A).Furthermore, our benchmarking results revealed that SGAE outperformed other algorithms in analyzing all 12 DLPFC slices (Fig. 2E).

Mouse gastrulation seqFISH dataset
The evaluation of SGAE's performance extends to the mouse gastrulation dataset, which was generated through the imaging-based technology seqFISH [21].The visualization of mouse gastrulation structures derived from different methods demonstrates higher effectiveness of SGAE in accurately discriminating embryo tissue sections (Fig. 2B).In contrast, STAGATE failed to decipher the spatial domain with precision, as it tends to divide the spatial domain into numerous disorder patches.Notably, SGAE reaffirmed its superiority in all benchmark metrics against other methods (Fig. 2F).

Mouse cortex MERFISH dataset
Based on the MERFISH dataset of the mouse primary motor cortex [22], we further compare the clustering results obtained by different methods.While all five methods successfully extract the stratified structure of the cortex, SGAE demonstrates a remarkable ability to capture the layered organization of the glutamatergic structures more accurately when compared to the original annotation (Fig. 2C).Furthermore, SGAE achieved the highest performance among all five methods, underscoring its effectiveness in precisely clustering cells and capturing the spatial arrangement of the primary motor cortex (Fig 2G).

Mouse olfactory bulb SLIDE-seq v2 dataset
The evaluation also encompasses the SLIDE-seq V2 dataset of the mouse olfactory bulb [19].
The spatial domains identified by SGAE exhibited remarkable consistency with the annotation provided by the Allen Reference Atlas, strengthening the confidence in its accuracy and reliability (Fig. 2D).Conversely, the Leiden clustering approach failed to provide a cohesive tissue structure in this dataset, while SpaGCN, GraphST, and STAGATE partially deciphered certain structures within the olfactory bulb.

Liver cancer Stereo-seq dataset
SGAE and alternative clustering methods were tested on a liver cancer sample obtained from Stereo-seq.The application of SGAE resulted in a clearer and more accurate identification of the margin border based on H&E staining (Supplementary Fig 1A-B).Notably, SGAE also detected clusters consisting of discrete spots located in different positions, reflecting the heterogeneous nature of the tumor tissue.To assess the spatial correlation of the clustering results, we computed the Moran's Index.The Moran's Index revealed that alternative methods tended to overutilize spatial information and identify clusters in blocks (Supplementary Fig 1C).To further evaluate the accuracy of the clustering results obtained by these tools, we focused on the rare cell type fibroblast and used VIM as a marker gene for fibroblasts.We visualized the spatial distribution of VIM and compared it with the most probable cluster identified by each of the methods.The results showed that Cluster 6 in SGAE exhibited a higher similarity to the spatial expression of VIM compared to other methods (Supplementary Fig 1D-E).
Overall, our results unequivocally establish SGAE as a powerful method for analyzing ST data, surpassing other state-of-the-art methods in terms of cell clustering performance and structure exploration of complex tissues.

SGAE deciphers spatial domains and provides discriminative representations
Stereo-seq is a novel ST technology that offers subcellular resolution and has opened up new avenues for investigating the intricate structures within complex tissues [7].However, the exploitation of its high-resolution capabilities necessitates the utilization of advanced clustering and spatial analysis methods.Therefore, we conducted a meticulous evaluation of SGAE's clustering performance using a Stereo-seq dataset of the mouse adult brain dataset [25].It comprises a total of 38811 cells and 20062 genes and has been labeled into 38 subclasses through manually annotation.Intriguingly, SGAE showcased exceptional discriminative power in accurately distinguishing mouse brain sections within this dataset, outperforming other methods such as SpaGCN, STAGATE, CCST, and GraphST (Fig. 3A).Subcluster analysis further demonstrated the superior performance of SGAE (Fig. 3B).SGAE accurately delineated distinct subpopulations within the tissue, whereas STAGATE inaccurately divided the DGGRC2 and TEGLU24 regions into two separate clusters, and SpaGCN assigned a larger region for TEGLU24 and HBGLU.
To provide a systematic comparison, we conducted an extensive evaluation of SGAE's clustering results using multiple benchmark metrics, including ARI, NMI, and FMI.Remarkably, SGAE outperformed all other existing methods across all benchmark metrics (Fig. 3C).Besides, we utilized Moran's Index (MI) to assess the spatial autocorrelation of each cluster.Although SpaGCN and STAGATE achieved higher MI scores, SGAE exhibited a distribution most closely aligned with the ground truth in terms of MI (Fig. 3D).It is suggested that SGAE effectively utilizes spatial information in a more reasonable and appropriate manner.Furthermore, we evaluated the representative embedding provided by SGAE, CCST [11], STAGATE, and GraphST through UMAP visualization (Fig. 3E).The results showed that SGAE exhibited a high-level of proficiency in extracting the embedding of the mouse brain Stereo-seq data, while GraphST struggled to distinguish different cell groups.To further evaluate the capability of SGAE to characterize biological representation, we performed pseudotime analysis and calculated the ANOVA F-score for each cell type (Fig. 3F).Surprisingly, SGAE achieved the highest ANOVA F-score, highlighting the discriminative capability of SGAE's embedding in accurately distinguishing between different cell types.
Taken together, these findings provide compelling evidence that SGAE not only surpasses other methods in terms of clustering accuracy, but also excels in providing superior embedding representation for the datasets.

SGAE enhanced complex spatial domain dissection in 3D Drosophila
The advanced use of ST clustering involves integrating 3D reconstruction technology to gain a comprehensive understanding of the spatial organization and gene expression patterns within complex tissues.The fundamental topic of 3D tissue structure dissection is to identify shared and specific spatial domains across multiple slices of ST datasets.Our investigation sought to determine whether SGAE could effectively accomplish this challenging multi-slice clustering task, especially for the datasets with less batch effect (Supplementary Figure 2).Notably, we found that SGAE surpassed Leiden and STAligner [26] in accurately dissecting the spatial domains of Drosophila embryos at different stages (E14-16, E16-18 and L1) [27], as evidenced by its higher similarity to the ground truth (Fig4.A, B).These findings highlighted the effectiveness of SGAE in achieving reliable multi-slice clustering for ST analysis.
After obtaining the clustering results from SGAE, STAligner and Leiden, we proceeded with the crucial step of stack slice registration to enable 3D tissue reconstruction.This involved aligning consecutive tissue slices to generate a complete and accurate 3D representation of the tissue.We observed that the 3D meshes generated from SGAE results exhibited exceptional accuracy in dividing the tissue into correct structures, aligning perfectly with the corresponding marker genes (Fig. 4C).It indicated that the spatial domains generated by SGAE are highly effective in achieving promising 3D tissue reconstruction.In contrast, STAligner and Leiden faltered in accurately dividing the tissue into correct structures in certain cases.This suggests the robustness and reliability of the spatial domains generated by SGAE.

Discussion
Spatial transcriptomics is a cutting-edge technology that allows us to simultaneously capture gene expression while retain spatial information of the tissue.The emergence of large-scale ST data has increased the demand for effective algorithms capable of dissecting spatial domains.To achieve this, we proposed SGAE, a framework composed of two identical encoders based on a Siamese network, which enabled us to encode cell features.Additionally, SGAE employs a graph neural network that facilitates the learning of informative representations of both gene expression and spatial locations.To fully leverage the spatial information provided by ST, we constructed a graph based on the spatial information of each cell and pre-clustered gene expression.We then used a linear combination operation to merge the decorrelated latent embeddings, enhancing the discriminative power of the resulting embedding and clustering accuracy, thus facilitating comprehensive analysis to provide profound insights into complex biological systems.
Our study demonstrates the effectiveness and robustness of SGAE in capturing tissue structures across different ST technology platforms.This superiority over other methods indicates the immense potential of SGAE as a reliable tool for analyzing ST datasets.Another notable strength of SGAE lies in its ability to accurately capture the heterogeneity present within ST datasets.The complexity and diversity of cell types within tissues pose significant challenges in accurately characterizing gene expression patterns.Notably, SGAE's embedding successfully captures the heterogenic information, enabling a more comprehensive understanding of the spatial organization of gene expression patterns within tissues.While SGAE has demonstrated its advantages in ST clustering, further validation across a wider range of ST datasets and biological systems is necessary to fully assess the generalizability of SGAE's performance.
In this study, we also applied SGAE to analyze the Drosophila 3D dataset and unravel the spatial domains during the E14-16, E16-18, and Larva L1 stages.We further compared the performance of SGAE with that of STAligner, a commonly used method developed for multi-slice ST analysis.
By evaluating benchmark metrics, we consistently observed that SGAE outperformed STAligner in effectively grouping cells into biologically meaningful clusters.The superior clustering results of SGAE carry significant implications for the analysis of 3D tissue structure reconstruction.In conclusion, SGAE demonstrates its proficiency in spatial domain identification on spatial transcriptomics with moderate batch effect.For datasets with a high batch effect, it is recommended to integrate batch removal methods upstream of SGAE to effectively mitigate this issue.By accurately categorizing cells into reasonable groups, SGAE could contribute to a more precise characterization of the spatial organization of gene expression patterns.This is particularly important for understanding the complex processes underlying biological development and differentiation.

Notations and Problem Definition
An undirected graph is usually represented by G = {V, E}, where V = {v 1 , v 2 , ⋯ , v N } and E are the node and edge respectively.Each node v i is characterized by a vector   ∈ R D , where D is the dimension of the attribute.Then the graph can be characterized by the feature matrix X ∈ R N×D .The relation between each node is characterized by the adjacency matrix , where a ij = 1 if v i and v j are connected by an edge, otherwise   = 0.
A degree matrix describes the number of edges connected to each node and can be expressed in a diagonal matrix D = diag(d 1 , d 2 , ⋯ , d N ) ∈ R N×N , d i is the degree of node   and calculated by . We normalize the adjacency matrix as A ̃=  −1 ( + ) where  ∈ R N×N is the identity matrix.
In this paper, we aim to train a Siamese graph encoder that embeds all nodes into the lowdimension latent space in an unsupervised manner.The resultant latent embedding can then be directly utilized to perform node clustering by clustering metrics such as K-means and Leiden.

The Overall Architecture of SGAE
The overall architecture of SGAE consists of Graph Distortion, Siamese Encoders, Siamese Decoders, and a reconstruction loss function.

Graph Distortion
We utilized two types of graph distortion including feature corruption and edge perturbation.
For feature corruption, which is the feature-level distortion, we apply Hadamard product to feature matrix and a random noise matrix generated from a Gaussian distribution, i.e.,  ̃=  ⊙ , where ⊙ means the Hadamard product and  ∼ (1, 0.1).
For edge perturbation, which is the structure-level distortion, we adopt two types of distortion, i.e., edge-removing and graph diffusion.For the edge removal, we generated a mask matrix  according to the similarity matrix by calculating the pair-wise cosine similarity in the latent space, where the 10% of the lowest edges will be removed.The final adjacency matrix after edge removal is In the graph diffusion treatment, we used Personalized PageRank to calculate the normalized adjacency matrix into a graph diffusion matrix by following MVGRL method [28]: where  = 0.2 as the teleport probability in a random walk.

Siamese Encoders
In order to reduce the utilization of space while learning richer cell representations, we constructed two same encoders based on Siamese network structure to encode cell features.
And the output is the embedding matrix .First, we use two parameter-shared encoders to encode graph  1 and graph  2 respectively, and generate embedding matrices  1 and  2 .The encoder in the -th layer can be formulated as: ,   and   are degree matrices of   and   ,  is the identity matrix,  1 () and  2 () are weight matrices of encoders in the -th layer,  () is the bias vector of encoder in the -th layer,  is the non-linear activate function, such as ReLU and Tanh.When layer  = 1,  1 (−1) =  1 .
Ultimately, the decorrelated latent embeddings derived from two different views, namely  1 and  2 , are merged using a linear combination operation.This amalgamation produces clustering-focused latent embeddings that can be effectively employed for clustering purposes, particularly through the utilization of the K-means algorithm.

Siamese Decoders
For SGAE, we construct a decoder based on graph convolutional neural networks, while reconstructing feature embeddings and adjacency matrices.The input is the embedding matrix , and the output is the original feature matrix  and the adjacency matrix .Firstly, we use the graph convolutional neural network to decode the embedding  to generate a feature matrix H ̂, and the calculation formula of the k layer decoder is as follows: where  is the degree matrix of the matrix , and  () is the parameter matrix of the -th layer of the decoder.Then, taking inner product computation between the embedding matrix  and its transpose to generate the adjacency matrix A ̂.

Reconstruction Loss Function
Finally, we calculate the feature matrix reconstruction loss  − , the calculation formula is as follows: Calculate the adjacency matrix reconstruction loss L REC−A , the calculation formula is as follows: The reconstruction loss L REC is the sum of the feature matrix reconstruction loss and the adjacency matrix reconstruction loss, and the calculation formula is as follows:

Redundant Reduction Module
In order to eliminate redundant information in node embedding and generate distinguishable embeddings for each node, the present invention designs a de-redundancy module, which eliminates redundant information from two levels: node level and feature level:

Clustering Guidance Module
In order to effectively learn the feature embedding related to the clustering task, the present invention designs a clustering guidance module.Firstly, we pre-train the model, and use K-means to cluster the generated node embeddings.Secondly, we construct a clustering guidance loss   according to the node embedding matrix and the clustering result of the previous step: a) Compute the soft assignment matrix  for all nodes and pretrained cluster centers using the Student's t distribution.b) Generate the target distribution matrix  according to the soft allocation matrix  , the element   of the  row  column is calculated by the following formula: Then compute the clustering guidance loss   using the KL divergence from the soft assignment, the target distribution and the pretrained soft assignment.
During training, the model is optimized by minimizing the loss function: After the model training is completed, the main flow of data in the model inference process is as follows: firstly, the model is used to obtain the low-dimensional feature embedding  of cells, and then based on the learned embedding, K-means is used for clustering, and finally the cluster labels of all cells are obtained.

Clustering Refinement
SGAE also incorporates an optional clustering refinement step.During this step, SGAE analyzes the domain assignment of each spot and its neighboring spots.Specifically, for a given spot, the label that appears most frequently among its surrounding spots is assigned to that spot.The clustering refinement step was exclusively performed for the human DLPFC 10x Visium data.

Performance Evaluation
We

Data Preprocessing
SGAE utilizes transcriptome-wide gene expression profiles with spatial coordinates as input.The raw gene counts per spot are first normalized to the total counts per cell and then scaled through log-transformation.In the case of 3D Drosophila datasets, we did not employ any multi-slice integration method as there was little batch effect observed from the UMAP result.Principal component analysis (PCA) is then conducted on the gene expression data using the sc.pp.pca() function, and the top 50 principal components per spot are subsequently utilized as the default expression feature.
Identifying differentially expressed genes.Wilcoxon test implemented in SCANPY [29] is used to calculate differentially expressed genes for each spatial domain Benjamin-Hochberg adjustment correlation via sc.tl.rank_genes_groups().

Spatial trajectory inference
We employed the PAGA algorithm [30] implemented in the SCANPY package to depict spatial trajectory.The PAGA trajectory and PAGA tree were inferred by the scanpy.tl.paga() function based on cell embedding generated by SGAE.Furthermore, scanpy.tl.dpt() was applied to estimate the pseudo time as well.To compare the performance of each clustering method with embedding, we calculate trajectory and pseudo time using methods above with same parameters settings.

Figure 1 .
Figure 1.An overview of SGAE framework.SGAE algorithm consists of three key modules.Firstly, the graph distortion module generates two distorted graphs by introducing both attribute and graph disturbances.Secondly, the encoder module generates two sets of representations for each sample.Thirdly, the redundant reduction module ensures that the same sample within the two distorted graphs has identical representations at both the feature and sample levels.Lastly, the discriminative representations are applied to clustering algorithms such as k-means to decipher spatial domains.

Figure 4 .Supplementary Figure 2 .
Figure 4. SGAE enhanced complex spatial domain dissection in 3D Drosophila Embryo.(A) 2D visualization of Drosophila Embryo clustering results at different stages (E14-16, E16-18, and L1) from SGAE and STAligner.(B) Benchmark metrics comparison of SGAE, Leiden and STAligner.(C) 3D visualization of Drosophila Embryo.The first row showed the marker genes of Drosophila Embryo at different stages, while the last three rows displayed the meshes generated by SGAE, STAligner and Leiden respectively.Supplementary Figure 1.SGAE reached good performance on complex and heterogenous liver cancer sample.(A) H&E staining of liver cancer sample.Manually added line indicate the border of tumor and healthy tissue.(B) Clustering result of SGAE and other methods.(C) Moran's Index of the clustering results of SGAE and other methods.(D) Spatial map of the expression of VIM (E) The most likely clusters associated with fibroblasts identified using SGAE and other methods, determined by the expression of VIM.Supplementary Figure 2. Less batch effect detected in 3D Drosophila embryos.UMAP visualization of 3D Drosophila embryos.Left : color in cell type annotation, Right : color in slices of sample.(A) E14-16.(B) E16-18.(C) L1.