Single-cell and spatial multiomic inference of gene regulatory networks using SCRIPro

Abstract Motivation The burgeoning generation of single-cell or spatial multiomic data allows for the characterization of gene regulation networks (GRNs) at an unprecedented resolution. However, the accurate reconstruction of GRNs from sparse and noisy single-cell or spatial multiomic data remains challenging. Results Here, we present SCRIPro, a comprehensive computational framework that robustly infers GRNs for both single-cell and spatial multiomics data. SCRIPro first improves sample coverage through a density clustering approach based on multiomic and spatial similarities. Additionally, SCRIPro scans transcriptional regulator (TR) importance by performing chromatin reconstruction and in silico deletion analyses using a comprehensive reference covering 1292 human and 994 mouse TRs. Finally, SCRIPro combines TR-target importance scores derived from multiomic data with TR-target expression levels to ensure precise GRN reconstruction. We benchmarked SCRIPro on various datasets, including single-cell multiomic data from human B-cell lymphoma, mouse hair follicle development, Stereo-seq of mouse embryos, and Spatial-ATAC-RNA from mouse brain. SCRIPro outperforms existing motif-based methods and accurately reconstructs cell type-specific, stage-specific, and region-specific GRNs. Overall, SCRIPro emerges as a streamlined and fast method capable of reconstructing TR activities and GRNs for both single-cell and spatial multiomic data. Availability and implementation SCRIPro is available at https://github.com/wanglabtongji/SCRIPro.


Introduction
Transcription regulators (TRs), including transcription factors (TFs) and chromatin regulators (CRs), play a crucial role in gene regulation by influencing transcription rates through mechanisms like recruiting transcriptional initiating complexes and modulating chromatin accessibility (Buenrostro et al. 2015).TRs form complex gene regulation networks (GRNs) with their target genes, also known as regulons, which are highly dynamic in different cellular contexts and serve as the foundation for various biological processes.Traditional methods for inferring GRNs, such as GENIE3 (Huynh-Thu et al. 2010), LISA (Qin et al. 2020), GRNBoost2 (Moerman et al. 2019), TIGRESS (Haury et al. 2012), ppcor (Kim 2015), and NIMEFI (Ruyssinck et al. 2014), were primarily designed for bulk samples from mixed cell-type tissues.However, these approaches are limited in their ability to accurately capture the regulatory programs operating in different cell types and states.With the advent of single-cell technologies, there has been a surge in the development of methods for inferring GRNs from single-cell transcriptome or epigenome data, including SCENIC (Aibar et al. 2017), PIDC (Chan et al. 2017), SCODE (Matsumoto et al. 2017), SINCERITIES (Papili Gao et al. 2018), chromVAR (Schep et al. 2017), and SCRIP (Dong et al. 2022).
However, methods that solely focus on single-cell transcriptome for predictions not only neglect the genuine chromatin accessibility state but also fail to achieve single-cell resolution in GRN predictions, only attaining a cluster level.Additionally, many of these methods heavily rely on motif references to identify potential targets, which results in the loss of the cell type specificity and cannot robustly predict TR activity without motifs, particularly for CRs.To address these limitations, we previously developed SCRIP (Dong et al. 2022), a method that reconstructs single-cell TR activity and GRNs from scATAC-seq data by integrating extensive collections of TR ChIP-seq and motif references.Nonetheless, SCRIP's effectiveness is influenced by the universality and the quality of scATAC-seq data.Recent advancements in single-cell multiomics data have led to the development of new tools for predicting TR activity.For example, tools such as FigR (Kartha et al. 2022), GRaNIE (Kamal et al. 2023), DIRECT-NET (Zhang et al. 2022), and GLUE (Cao and Gao 2022) utilize paired or integrated multiome data as inputs and employ linear/nonlinear regression methods to construct gene regulatory networks.Other tools, like CellOracle (Kamimoto et al. 2023), offer pre-built GRNs or the ability to create custom-defined GRNs using scATACseq data.Meanwhile, tools like MICA (Alanis-Lobato et al. 2024) use bulk ATAC-seq to identify potential TR binding sites, applying this landscape to refine single-cell transcriptomic data (Badia et al. 2023).However, it is worth noting that the aforementioned methods still heavily depend on the motif information, and further exploration is required to enhance their accuracy and coverage.
The flourishing development of spatial omics enables precise analysis of cellular structures within complex tissues and the spatial interactions between cells.Techniques, such as Stereo-seq (Chen et al. 2022) and STARmap plus (Shi et al. 2023), have achieved single-cell-level resolution.These methods have significantly enhanced our understanding of gene regulation in specific microenvironments, providing valuable insights into crosstalk between microenvironment interactions and intracellular GRNs.However, most existing tools for predicting TR activity do not consider cellular spatial location.They overlook the impact in expression similarity of different cells or spots within the same microenvironment, leading to inaccurate TR predictions based on spatial transcriptomics (ST) data.Furthermore, the emergence of spatial multiomics technologies, such as spatial ATAC-RNA-seq (Zhang et al. 2023), provides paired chromatin accessibility states with gene expression.This presents an opportunity for accurate TR activity and GRN prediction using spatial multiomic information.
In this study, we have developed SCRIPro, a computational framework designed to predict TR activity and reconstruct TR-centered GRNs for both single-cell and spatial multiomic data.SCRIPro addresses the challenge of sparse single-cell or spatial multiomic signals by employing a density clustering approach that considers either expression or spatial similarities.Additionally, SCRIPro leverages a comprehensive TR reference compiled from TR ChIP-seq peaks obtained from Cistrome DB (Zheng et al. 2019), along with motifs for 1252 human TRs and 994 mouse TRs.Finally, SCRIPro combines TR-target importance from epigenomic data with TR-target expression from transcriptomic data to construct the GRNs.We demonstrate the robustness and versatility of SCRIPro by applying it to various datasets, including human B-cell lymphoma, mouse hair follicle development single-cell multiome data, mouse embryo Stereo-seq datasets at consecutive developmental time points, and P22 mouse brain Spatial-ATAC-RNA data.The results showcase the superior performance and utility of SCRIPro in diverse biological contexts.

Overview of the SCRIPro method
To decrease computational requirements and to alleviate the impact of dropout events inherent in single-cell sequencing data, as well as to improve the stability of the inferred GRNs, we adopted a divide-and-conquer approach.SCRIPro begins by constructing SuperCells (Littman et al. 2023), which are aggregates of gene expression profiles from clusters of individual cells exhibiting similar transcriptional activity.Following this initial step, SCRIPro identify a set of marker genes for each SuperCell.These marker genes serve as representative features of their respective SuperCells and are subsequently utilized as inputs for the LISA (Qin et al. 2020) framework.Within LISA, SCRIPro implement ISD analyses to assess the effects of TR perturbations on the expression of these marker genes.The outcome of this process is a GRN constructed at the SuperCell level, which provides a higherorder representation of the regulatory landscape across the pooled cellular subpopulations.
In SCRIPro, for the transcriptomic-only data, we will use similar strategies from LISA to reconstruct the chromatin landscapes from bulk DNase/H3K27ac reference.For epigenomic-only data, we will use SCRIP to infer potential regulators.For multiomics data, we will perform alignment and use matched or paired epigenome datasets as chromatin landscape.Besides, SCRIPro considers spatial adjacency when generating SuperCells.We believe that with these features, SCRIPro could provide a robust and flexible solution for GRN inference supporting different single-cell and spatial modalities.

ChIP-seq collection
We employed the identical ChIP-seq datasets for SCRIP, which were sourced from the Cistrome data Browser (Zheng et al. 2019).Following a meticulous reorganization and refinement of annotations pertaining to factors, cell types, and tissues, we applied stringent filtering criteria: a median quality score for raw sequences exceeding 25, a uniquely mapped reads proportion surpassing 50%, a PCR bottleneck coefficient greater than 0.8, an enrichment of peaks by a minimum of 10-fold in quantities exceeding 100, a fraction of reads in peaks above 0.01, and an overlap of the top 5000 peaks with joint DNase I hypersensitive sites (DHS) exceeding 70%.We preserved only those peaks demonstrating a minimum 5-fold enrichment within each dataset.Datasets comprising fewer than 1000 peaks were subsequently excluded.After this filtration process, we amassed a total of 2314 human and 1920 mouse TR ChIP-seq datasets, encompassing 671 and 440 TRs, respectively.

Motif collection
In an effort to enhance the representation of TFs, we also synthesized pseudo-peaks data by conducting motif scanning.We procured the TF position weight matrix motifs for both human and mouse, amalgamated and transformed the data formats, and subsequently utilized the HOMER (Heinz et al. 2010) software to scan the genome for motif-associated genomic intervals.We juxtaposed the scanning outcomes with the Encyclopedia of DNA Elements (ENCODE) (2012) candidate cis-regulatory elements and the Cistrome Union DHS compilation, excluded any intersections with known blacklisted regions, and augmented the length of each motif locus to 340 bp to facilitate comparison with the ChIP-seq datasets.By imposing a filter based on the P-value, we retained the top 25 000 binding sites, ultimately yielding 916 human and 816 mouse pseudo-peaks derived from motif scanning.

Synthetic reference dataset
Subsequently, we integrated these meticulously curated SCRIPro datasets into the LISA framework, generating reference HDF5 (Hierarchical Data Format version 5) files for both human and mouse datasets, which comprised the count of ChIP-seq peaks and the metadata associated with the datasets.In summation, the constructed human TR index encompasses 1252 TRs, while the mouse TR index includes 997 TRs.

Regulatory potential model
To assess the influence of TRs on their target genes, we conducted a detailed analysis of ChIP-seq datasets contained in HDF5 files using the regulatory potential (RP) model (Wang et al. 2020).The computational formula for RP, adhering to the SCRIP methodology, is as follows: In this equation, n denotes the count of TR binding sites proximal to the TSS of gene g, while d i signifies the distance from the ith peak's center to the TSS.For TRs exhibiting more than 20% of peaks in the promoter region, they are categorized as promoter-centric TRs, and the half-decay distance d 0 is set at 1 kb.In contrast, TRs with fewer promoterlocalized peaks are classified as enhancer-centric, with their half-decay distance established at 10 kb.To enhance computational efficiency, the analysis was restricted to genes located within a half-maximal regulatory range (half-decay distance) of 15d 0 , as peak scores beyond this threshold diminish to less than 0.0005.
An enhanced version of the RP model was employed to explore the potential target genes regulated by TFs.This model extends beyond incorporating exon information by also accounting for the regulatory impact of adjacent genes.Specifically, if a peak is detected within the exon region of a gene, the corresponding score is assigned the value of 1, which is then normalized relative to the gene's total exon length.Conversely, should a peak reside in the promoter or exon region of a neighboring gene, its score is designated as 0.

Single-cell RNA-seq data
We first preprocess counts matrix for each cell following Scanpy workflow (Wolf et al. 2018).Next, principal component analysis is performed using scanpy:tl:pcaðÞ function to reduce data dimensionality.We compute neighboring cells for each cell using scanpy:pp:neighborsðÞ function, setting the number of neighbors to N_neighbors (default 10) and the number of principal components to N_pcs (default 40).Finally, we apply scanpy:tl:umapðÞfunction for UMAP dimensionality reduction and perform Leiden clustering using scanpy:tl:leidenðÞ function with a resolution parameter set to resolution (default 0.8) to stratify cellular populations.

Spatial transcriptomics RNA-seq data
After normalizing the count matrix, SCRIPro utilizes the approach from STAGATE (Dong and Zhang 2022) to build a spatial neighbor network (SNN), integrating the similarity between adjacent spots of a given location, and subsequently transforms this spatial data into an undirected neighbor network based on a predetermined radius r.Subsequently, utilizing a cell type-aware module, the SNN is pruned based on pre-clustered gene expression.After constructing SNN, SCRIPro employs a graph attention auto-encoder to integrate gene expression and spatial location.In cell type-aware module, SCRIPro employs a self-attention mechanism for both types of SNNs.The learned spatial similarities from the standard SNN and the cell type-aware SNN are denoted as att spatial and att aware , respectively.The final spatial attribution used is a linear combination of these two (where α, the default hyperparameter set at 0.5, represents the weight of the cell type-aware SNN): The output of the decoder is considered as the reconstructed normalized expressions.We then perform dimensionality reduction on the integrated data using scanpy:pp:neighborsðÞ and scanpy:tl:umapðÞ, following the same methodology as above.

SuperCell construction
After employing Leiden clustering (Traag et al. 2019) to identify cell subsets at a resolution parameter of 0.8, each cluster is treated as an independent RNA-seq dataset.Using a binary search method, we iteratively increase the secondary resolution to obtain more refined Leiden cluster classifications until the average number of cells in each small Leiden cluster reaches a user-specified N. SuperCells within each large Leiden cluster that contain fewer than 30 cells are merged with the nearest SuperCell to ensure a minimum of 30 cells per SuperCell.
Marker genes are then computed for each Supercell: For smaller scRNA-seq datasets, we identify the top 500 marker genes per Supercell using the scanpy.get.rank_ge-nes_groups_df()method.
For larger datasets (cell number >150 000), we recommend you to use large-scale marker gene selection strategy.We extract top 1500 genes in each leiden cluster, and then identify genes expressed above the 60% percentile in each Supercell as marker genes for subsequent analyses.SuperCells with fewer than 35 selected marker genes are excluded from further calculations

Chromatin landscape construction
SCRIPro has two different strategies for constructing chromatin landscapes: 1) For scRNA-seq data, SCRIPro use chromatin landscapes constructed in "Reference Dataset" (see Section 2).Then SCRIPro performs LISA's ISD calculations for each SuperCell's marker genes within eight chunks (by default), obtaining results for each SuperCell.2) In the context of multiomics data, there are generally two scenarios: whether the barcodes from scRNA-seq and scATAC-seq are matchable.If the barcodes match, such as with 10× Chromium, clustering is first performed using scRNA-seq data to delineate SuperCells, and then the scATAC-seq data is used to construct the landscape for each SuperCell during the ISD step.If the barcodes do not match, as with scMultiome-seq, SCRIPro initially employs GLUE (Cao and Gao 2022) to integrate the two datasets, matching RNA and ATAC omics information within the same cell.Subsequently, the reconstruction of Single-cell and spatial multiomic inference of gene regulatory networks using SCRIPro the ATAC landscape is carried out following the steps described above.To prepare the scATAC-seq data, we sort it using the bedtools (Quinlan and Hall 2010) sort command and merge intervals within each sorted TSV file, ensuring that they do not exceed 1000 bases, using the bedtools merge command.Each merged TSV file is then converted into a bigwig format file using the bedGraphToBigWig command (Kent et al. 2002).These bigwig files, corresponding to each SuperCell, are used as the landscape during the ISD step.

In silico deletion in SCRIPro
SCRIPro performs in silico deletion with the same strategy as LISA.LISA's chromatin landscape model uses L1-regularized logistic regression to select an optimum sample set for H3K27ac ChIP-seq or DNase-seq samples.LISA first calculated the chrom-RP for each RefSeq gene.The chrom-RP for gene k in sample j is defined as L is set to 100 kb, The weight w i represents the regulatory impact of a locus at position i on the gene k's transcription start site at genomic position t k .s ji is the signal of chromatin profile j at position i.
LISA also calculates peak-RP for each set of ChIP-seq data.The definition of peak-RP is same to what has mentioned above.Then the ISD method recalculates the chrom-RP after erasing the signal in all 1-kb windows containing at least one peak from a putative regulatory cistrome, and then comparing the model RPs with and without deletion to produce a ΔRP value for each gene.The combined statistics method for TR ranking compares the peak-RPs or ΔRPs of the query gene set with that of the background gene set.It uses the onesided Wilcoxon rank-sum test and combines peak-RP, DNase-seq, and H3K27ac chrom-RP for ChIP-seq-based methods.The Cauchy combination test is used to compute a summary P-value for each TR.

Calculate SCRIPro TR score
SCRIPro applies a negative logarithmic transformation to the summary P-values obtained from LISA's ISD results, serving as the score for individual TRs within each SuperCell.
Additionally, for each TR and its corresponding targets within a SuperCell, we calculate a z-score relative to the mean: where E represents the expression value of TRs (represented by m) or target genes (represented by n), while i denotes each SuperCell.More specifically, k represents the target genes regulated by TR m.For each set of ChIP-seq data, we select genes with an RP score >5 as the target genes for this TF.If the number of genes with an RP score >5 is less than 300, then we choose the top 300 genes in RP score ranking as the target genes.For each TR within a SuperCell, we first calculate the z-score of the TR value to obtain M TR [Equation ( 5)].
Then, We average the z-scores of each target gene to get M Target [Equation ( 7 )].Both M TR and M Target clip with (−4,4) as the threshold [Equation ( 6)], and the influence of extreme values on them has been subtracted.Then M TR and M Target is added to obtain the M exp [Equation ( 9)], which is the coefficient for LISA P-value correction.The TR activity score (S) is equal to N (negative logarithmic transformation of the summary P-values of each TR) multiplied by the min-max normalized M exp [Equation ( 10)].These steps ensure that both the expression of TR and target genes were included when calculating the final TR activity score.
2.9 Dataset analysis methods

CRISPRa human T cells dataset 2.9.1.1 Preprocessing
We downloaded this dataset (PRJNA787633) from http:// www.perturbase.cn/download.The downloaded data has already undergone preprocessing and mixscape analysis.We selected cells with the "gene" label in the SCRIPro reference TR dataset for downstream AUPRC and AUROC analysis.We applied SCRIPro (SCRIPro reference), SCENIC, SCING, and LISA (LISA reference) to this dataset with default parameters, and then performed downstream performance testing using the TR enrichment scores inferred by these methods.Specifically, SCRIPro uses the tf_score matrix, SCENIC uses the AUCell matrix, and SCING uses the aucs_mtx matrix for TR in the corresponding cluster of gen-e_membership, where the LISA P value is P_value_matrix calculated by SCRIPro.

Performance comparison of different SuperCell settings
We tested the performance under different supercell settings by adjusting the clustering parameters of SCRIPro.Specifically, we modified the Cell_num parameter in Ori_Data to values of (10,20,30,50,65,100,200).For Metacell, we set the target_metacell_size to 50, which resulted in 581 metacells with an average of 28.7 cells per metacell.Similarly, we computed supercells using the Cell_num parameter, setting cell_num to 30 to achieve the same granularity as the metacells, resulting in an average of 29.78 cells per supercell.We then calculated the AUROC and AUPRC for each setting, and recorded the number of supercells and the runtime.These results were then visualized using Matplotlib to create line charts.

Performance comparison of different methods
We use the "gene" labels in the dataset as the gold standard to compute the AUROC and AUPRC.Specifically, if a cell's gene label corresponds to a particular TR, it is labeled as 1; otherwise, it is labeled as 0. Using this data and the TR enrichment scores obtained from different methods, we calculate AUPRC and AUROC using the precision_recall_ curve and auc functions from the sklearn package.

B-cell lymphoma dataset 2.9.2.1 Preprocessing
In the scRNA-seq dataset, cells with fewer than 200 genes and genes present in fewer than three cells were removed.We retained only cells that had both RNA-seq and ATAC-seq counts.We applied SCRIPro with default parameters to the filtered scMultiome-seq peak count matrix to evaluate the activity of TRs.For each TR within a SuperCell, we first calculate the z-score of the TR value to obtain M TR [Equation ( 5 These steps ensure that both the expression of TR and target genes were included when calculating the final TR activity score.The activity scores of different TRs were visualized as heatmaps with seaborn's clustermap and projected onto UMAP plots with Scanpy.After integrating the original scATAC-seq data with SCRIPro, the resulting landscape was displayed using the IGV genome browser (Robinson et al. 2011).

Clustering performance comparison
For the assessment of SCRIPro, we chose to compare it with SCENICþ (Bravo Gonz� alez-Blas et al. 2023).The SCENIC algorithm (Aibar et al. 2017), which is the precursor to SCENICþ, is a widely recognized method for GRN inference.SCENICþ builds upon SCENIC by providing single-cell resolution TF activity scores, allowing for a comprehensive performance comparison with SCRIPro.
Since SCRIPro relies on ChIP-seq datasets, it is not suitable for direct comparison with traditional methods that are based on synthetic GRN datasets.Instead, we compared SCRIPro with SCENICþ in terms of gene expression correlation.Specifically, SCRIPro selects the TR activity score in SuperCells and calculates the correlation with gene expression within those SuperCells.On the other hand, SCENICþ's results are correlated with the original cell expression.Both methods were used with their default parameters.
To visualize and cluster the number of target genes of TFs in Tumor B Cells and B Cells, we utilized the sns:clustermapðÞ function in the Seaborn library, using the default parameters.Additionally, we employed Metascape (Zhou et al. 2019) to calculate the GO terms of the overlapping genes identified by the two methods.Finally, we visualized the GO terms using an R script.

Hair follicle development dataset 2.9.3.1 Preprocessing
We annotated SHARE-seq data cell types using labels from the original study.SCRIPro was applied with default parameters to the SHARE-seq data to assess the activity of TRs in each cell.

Pseudotime analysis
We utilized MIRA (Lynch et al. 2022) for trajectory analysis of the SHARE-seq data.We designated ORS as the starting cell type and IRS, Cortex, and Medulla as the terminal cell types to study the differentiation trajectory in hair follicle development.Cells were ordered by pseudotime using the "mira_pseudotime" column from MIRA results.The activity score of each TR in each terminal cell type (Medulla_prob/ IRS_prob/Cortex_prob > 0.8) was visualized in ternary plots.
2.9.3.3ATAC-RNA analysis SCRIP (Dong et al. 2022) was used to perform TR activity analysis on the scATAC-seq data from the SHARE-seq data, applying the SCRIP enrichment function with default parameters to the peak count matrix.Subsequently, we transformed the numerical scores from two distinct matrices-single-cell RNA sequencing data (RNA-infer) and single-cell ATAC sequencing data (ATAC-infer)-into their corresponding ranks.We then proceeded to compute the discrepancies between the two matrices by subtracting the RNA-infer ranks from the ATAC-infer ranks, thereby deriving a set of differential rankings that culminate in final score.These scores were clustered and compared according to the pseudotime obtained from MIRA (Lynch et al. 2022).

Mouse embryonic development dataset 2.9.4.1 Preprocessing
The Stereo-seq dataset detailing mouse embryonic development was procured from the specified database (https://db.cngb.org/search/project/CNP0001543/).SCRIPro build SNN and perform graph attention auto-encoder using STAGATE strategy.We utilized SCRIPro to process embryonic data at E16.5 to identify 28 spatial clusters employing a large dataset gene selection strategy.Standard preprocessing steps, including quality control, normalization, and data filtering, were executed prior to downstream analyses.Motif pattern was downloaded from JASPAR (https://jaspar.elixir.no/)(Fornes et al. 2020).

Identification of TR activity co-expression modules
The R package Giotto binSpectðÞ function was employed to discern TRs exhibiting spatial coherence in their activity scores as determined by SCRIPro.Giotto (Chen et al. 2023) identified 40 distinct spatial TR modules in E16.5 embryo datatset, which were subsequently analyzed using heatmap clustering.Centrality metrics were calculated using networkx Python package degree centralityðÞ function.

TR activity analysis and comparison
To assess the cluster purity of the identified spatial domains, we adopted ROGUE score (https://github.com/PaulingLiu/ROGUE).TF activity scores were computed using both SCENIC and SCRIPro, adhering to their default parameters.The re-clustering of these scores, integrating spatial context, facilitated the computation of the ROGUE score via the R package ROGUE.Normalized mutual information scores were derived by comparing the clustering outcomes of TR activity scores obtained from both methods against those from STAGATE clustering.

P22 mouse brain spatial multiomics dataset analysis 2.9.5.1 Preprocessing
The spatial ATAC-RNA-seq multiomics dataset of the P22 mouse brain was retrieved from the cited source (GSE205055).The scripro:cal ISD parallelðÞ function from SCRIPro was then employed to analyze the multiomics data, resulting in the Single-cell and spatial multiomic inference of gene regulatory networks using SCRIPro determination of TR activity scores for each SuperCell.We employed the binSpectðÞ function in Giotto to identify spatially variable TRs and subsequently conducted heatmap clustering to discern cell type-specific TRs.The pseudotime spatial trajectory analysis of neural stem cell emergence was explored by spaceFlow (Ren et al. 2022).

Wasserstein distance calculation
To investigate the impact of signaling pathways on TR expression in spatial contexts, we used the Wasserstein distance (Flamary et al. 2021) to measure differences in TR and gene expression across brain regions, utilizing spot positions as coordinates and expression levels as values.For a ligand expressed in m spots and a receptor expressed in n spots, we formed a matrix D 2 R m × n to record the Euclidean distances between spots, based on spatial coordinates.By identifying an optimal transport γ 0 2 R m × n that minimizes the total transport cost of ligand and receptor expression distributions: L 2 R m × 1 and R 2 R n × 1 .This total transport cost is determined by summation of the products of the transport value and the Euclidean distance between each spot.Based on the optimal transport plan, the Wasserstein distance can be computed as follows: We filter target gene expression presented in the L-R interactions collection [collected from NicheNet (Browaeys et al. 2020)], considering these as signaling pathways in cell-cell interactions (CCI).Next, we screened the L-R interaction pair based on the expression pair of TR and its target genes in adjacent cells, and considered it to be a potential signaling pathway for CCI.

Results
3.1 SCRIPro combines comprehensive chromatin and TR binding references to predict GRNs for both single-cell and spatial multiomics data SCRIPro comprises expression and chromatin modules corresponding to input transcriptomic and epigenomic data (Fig. 1).(i) The expression module accepts single-cell or spatial transcriptome data.To overcome gene coverage limitations and minimize drop-out effects, SCRIPro utilizes a density clustering approach based on K-nearest neighbor (KNN) and a graph attention auto-encoder to generate SuperCells (Littman et al. 2023) with consistent expression patterns or spatial coordinates (Dong and Zhang 2022) (Fig. 1, Supplementary Fig. S1a and b).The SuperCell expression is then used to evaluate TR expression levels and coexpression patterns within GRNs.For expression-only data, SCRIPro curates an extensive collection of public chromatin references encompassing 1471 DNase-seq data and 2575 H3K27ac data for human and mouse samples (Zheng et al. 2019) (Supplementary Fig. S1c-e).Subsequently, SCRIPro employs a logistic regression-based approach to scan the chromatin reference and reconstruct in silico chromatin landscapes that best match resemble marker genes identified in SuperCells (Supplementary Fig. S1a).(ii) The chromatin module of SCRIPro can utilize experimentally paired or GLUE-integrated single-cell or spatial ATAC-seq inputs to build chromatin landscapes.SCRIPro then uses the in silico or paired chromatin landscapes to assess the importance of TRs.
SCRIPro compiles a comprehensive TR reference dataset comprising 2314 human TR ChIP-seq data and 1920 mouse TR ChIP-seq data (Supplementary Fig. S2a and b, Table S1).This reference dataset is enriched with high-quality motifs from cis-BP and HOMER, covering 1252 human TRs and 994 mouse TRs (Supplementary Fig. S2c-f).SCRIPro conducts in silico deletion analyses of TR binding sites to evaluate the TR impact on the expression of marker genes within SuperCells (Fig. 1 and Supplementary Fig. S1a).The potential TR targets are determined based on the best-matched TR ChIP-seq data for a given SuperCell using an RP model (Supplementary Fig. S1a).Subsequently, SCRIPro integrates TR-target expression from transcriptome data and TR-target importance from epigenome data to compute the final TR activity within a SuperCell, identify regulated genes, and construct TR-centered GRNs.Moreover, SCRIPro can directly apply the SCRIP method for scATAC-seq-only data to predict TRs (Supplementary Fig. S1b).
SCRIPro's performance is further evaluated through downstream benchmarking, including assessing TR activity accuracy, clustering analysis, identifying cell type or stage-specific GRNs, and region-specific GRNs across various biological systems.

Performance evaluation and parameter selection
To quantitatively evaluate SCRIPro's performance, we used a published single-cell CRISPR activation dataset that screens for T-cell stimulation regulators in primary human T-cells as the benchmark for GRN inference methods (Fig. 2a).We specifically analyzed cells containing sgRNAs targeting 15 TRs from the original 70 hits to benchmark performance.Our hypothesis was that cells with introduced TR sgRNAs would demonstrate increased TR activity, detectable through GRN inference algorithms.Initially, we benchmarked various parameters and selected 30 cells per SuperCell, balancing performance and computational efficiency (Fig. 2b and c).While different methods of generating SuperCells showed no significant differences in performance, the SuperCell strategy proved to be more computationally efficient than MetaCell (Persad et al. 2023) (Fig. 2d).Furthermore, we evaluated the reconstructed TR activities using different methods.Both AUROC and AUPRC metrics indicated that SCRIPro outperformed SCING, SCENIC, and the LISA methods (which SCRIPro was modified from) (Fig. 2e-g).The consistent results were observed at both single-cell and supercell levels, as well as when using different TR references.These findings collectively demonstrate the robust and accurate performance of SCRIPro compared to existing methods.

SCRIPro identifies tumor-specific GRNs in the human B-cell lymphoma 10× multiome dataset
We benchmarked the performance of SCRIPro on a human B-cell lymphoma (small lymphocytic lymphoma, SLL) dataset using 10× single-cell multiome containing 14 566 cells.We annotated nine major cell types based on the expressed marker genes of each lineage, and tumor B-cells form a slightly different cluster compared to normal B-cells (Fig. 3a).For multiome data, SCRIPro first integrates RNA and ATAC data to obtain integrated clusters (Supplementary Fig. S3a).
SCRIPro could accurately predicts well-known master regulators including SPI1, IRF1, FLI1, and STAT2 for mono/macrophages, GATA3, RUNX3, SMARCA4, JUND, and MYB for T-cells, PAX5, BCL2, and IRF4 for B and tumor B-cells (Fig. 3b, Supplementary Fig. S3b).The identified TR ChIPseq reference corresponds well to the chromatin landscape from scATAC-seq at the SuperCell level, suggesting that the ChIP-seq reference is informative in predicting TR binding events (Fig. 3c).We compared the performance of SCRIPro with SCENICþ, a widely used algorithm that infers GRNs based on a combination of motif enrichment and GRNBoost2 (Moerman et al. 2019).SCRIPro successfully identifies IRF8 in plasmacytoid dendritic cells (pDCs), which has been reported to be essential for the development of pDC and type 1 conventional dendritic cells (Sichien et al. 2016).However, SCENICþ fails to assign IRF8 scores, possibly due to the limited number of pDC cells.SCRIPro specifically enriches SPIB, a driver regulator that mediates apoptosis through the PI3K-AKT pathway in diffuse B-cell lymphoma (Takagi et al. 2016), in tumor B-cells but not normal cells, while SCENICþ fails to predict the SPIB activity in all the Bcells (Fig. 3d).Finally, we assume the inferred TR activity should be highly correlated with their gene expression, with a positive correlation for activators and a negative correlation for repressors.SCRIPro shows a significantly high concordance between TR activity and TR expression compared to SCENICþ, with over 80% (81/105) of the factors showing better consistency (Fig. 3e, Supplementary Fig. S3c).Importantly, SCRIPro predicts activity scores for nearly 800 TRs, while SCENICþ is only able to evaluate over 100 TR activities (Supplementary Fig. S3d).In summary, these analyses suggest that SCRIPro can accurately infer GRNs globally and outperforms existing methods in terms of consistency between TR activity and TR expression.
B-cell lymphoma develops as a result of abnormal interactions between B cells and the microenvironment during development (Garaud et al. 2019).Given the accurate identification of TR activities specific to tumor B-cells, including SPIB, using SCRIPro, we conducted a systematic analysis to identify tumor-specific GRNs that potentially drive malignancy.Unsupervised clustering of TR activity identifies three independent clusters in tumor B-cells, compared to two clusters in normal B-cells (Fig. 3f).Both Groups 1 and 2 were shared between tumor and normal B-cells, with Single-cell and spatial multiomic inference of gene regulatory networks using SCRIPro Group 1 enriched in B-cell activation and differentiation, and Group 2 enriched in cytoplasmic translation that is important for B-cell development (Fig. 3g, Supplementary Fig. S5a-d).Notably, the majority of TRs in Group 3 of tumor B-cells belonged to the ZNF family, which has been reported to silence retrotransposons and regulate epithelial proliferation (Cassandri et al. 2017, Imbeault et al. 2017) (Fig. 3f-h).These analyses suggest that tumor B-cells may employ alternative proliferation strategies, such as activating epithelial proliferating genes.Collectively, our analyses demonstrate the superior accuracy and sensitivity of SCRIPro in identifying cell-type-specific TRs, even enabling the discrimination between similar cell types.

SCRIPro reveals epigenetic priming effects of mouse hair follicle differentiation
TRs play a crucial role in driving cell type differentiation, compared to static datasets that only contain differentiated cells, reconstructing GRNs from developmental datasets presents challenges due to subtle differences along the development trajectories.We next benchmarked the performance of SCRIPro on a hair follicle differentiation dataset generated  Single-cell and spatial multiomic inference of gene regulatory networks using SCRIPro using the SHARE-seq protocol (Ma et al. 2020).The cells were annotated into seven major cell types including outer root sheath (ORS), transit-amplifying cells-1 (TAC-1), and TAC-2, medulla, hair shaft-cuticle cortex (cortex), inner root sheath (IRS), and mix cells (Supplementary Fig. S6a).Starting from ORS cells, pseudo time analyses suggest three distinct differentiation paths that led to the formation of medulla, cortex and IRS cells (Supplementary Fig. S6b).SCRIPro robustly identifies TRs enriched in the initial ORS cell type and the three different trajectories (Supplementary Figs S6c, S7a, and S8a).For instance, medulla cells exhibited unique TR activity including Prdm1, Pbx3, and Rnf2, known for their significant regulatory roles in medullary-related functions (Roberts et al. 2017) (Rhee et al. 2004).Cortex cells were characterized by specific activities of Lef1 (Zhang et al. 2013) and Rora (Steinmayr et al. 1998).Furthermore, IRS cells exhibited high activity of Gata3, an important TR in the skin stem cell lineage during the initiation of epidermal stratification and hair follicle IRS patterning (Kaufman et al. 2003) (Supplementary Fig. S6d).These analyses demonstrate the capability of SCRIPro to identify lineage-specific GRNs along the developmental trajectory even with subtle differences.
Traditional GRN prediction methods designed for scRNAseq datasets, such as SCENIC, PIDC, and SCODE, primarily rely on the co-expression information to infer TR regulation.However, the prediction of TR activity based on gene expression and chromatin accessibility can be decoupled due to the influence of epigenetic priming.Since SCRIPro could both reconstruct the TR activity from expression (transcriptomeonly module) and chromatin accessibility (epigenome-only or paired module), we compared the difference between RNA versus ATAC inferred TR activity to systematically evaluate the potential priming effect.We focused on specific developmental lineage, e.g. the ORS to medulla path.Encouragingly, our analysis revealed a TR group in which chromatininferred activity preceded expression-inferred activity (Supplementary Fig. S6e and f, Group 1), indicating a strong epigenetic priming effect that leads to a delay in target gene expression.Conversely, Group 2 exhibited an opposite trend, with RNA-inferred activity preceding ATAC-inferred activity (Supplementary Fig. S6g).Notably, many factors in this group possess repressive functions including Jarid2 and Mtf2, subunits of the PRC2 complex reported in mouse embryonic stem cells (Zhang et al. 2011), thus validating the priming effect of repressive H3K27me3 modifications.Group 3, comprising the majority of TRs, displayed largely synchronous patterns regardless of whether they were predicted using RNA or ATAC data (Supplementary Fig. S6e and h).Similar patterns were observed for the ORS to cortex and ORS to IRS directions (Supplementary Fig. S8b).In summary, these analyses confirm the epigenetic priming effect using the hair follicle development SHARE-seq data and also underscore the importance of chromatin landscapes in accurately predicting the TRs required for the differentiation of future trajectories.

Integration of spatial information enhances TR activity and GRN prediction on E16.5 mouse embryo Stereo-seq data
To showcase the superior performance of SCRIPro in reconstructing spatial GRNs by integrating spatial locations and neighborhood information, we applied it to a Stereo-seq dataset derived from E16.5 mouse embryos (Chen et al. 2022) (Fig. 4a).Our spatial clustering strategy successfully identified 28 cell types with unique spatial locations (Fig. 4a, Supplementary Fig. S9a), and exhibiting denser cell positioning and higher within-cell type homogeneity compared to other spatial-based methods, including BayesSpace (Zhao et al. 2021), Giotto (Dries et al. 2021), and the raw annotations from Stereo-seq (using Squidpy) (Chen et al. 2022) (Supplementary Fig. S9b).For instance, in the heart region, the BayesSpace clustering mixed different heart cell types, while Giotto, the Stereo-seq raw annotations, and SCRIPro all revealed distinct spatial patterns corresponding to the various heart cell populations (Supplementary Fig. S9c).However, in the brain and spinal cord regions, Giotto failed to differentiate thalamus neurons, and the raw annotations could not separate the spinal cord from the mid/hindbrain.In contrast, the spatial domains identified by SCRIPro delineated these anatomical sub-regions (Supplementary Fig. S9d, indicated by black arrows).Furthermore, the ROGUE score analyses indicated that the spatial domains defined by SCRIPro were more homogeneous compared to the other methods (Supplementary Fig. S9e).We next quantified the TR activity in different cell types and re-clustered the cells by TR activity.SCRIPro demonstrated a significantly higher consistency with the cell types generated by spatial clustering compared to SCENIC (Supplementary Fig. S10a and b).For instance, we examined the TRs Foxo1 and Foxo3, known to be enriched in the brain and facial regions including the striatum, anterior thalamic nucleus, olfaction, and dental epithelium (Hoekman et al. 2006).SCRIPro robustly predicted the TR activity in these regions, with Foxo3 exhibiting a dispersed distribution compared to Foxo1 (Hoekman et al. 2006).In contrast, SCENIC failed to enrich either factor in the brain and facial regions (Supplementary Fig. S10b).Gata6 plays a critical role in cardiac function with pronounced enrichment in the outflow tract and aortic arch (Xin et al. 2006), SCRIPro distinctly identified the Gata6 activity across different spatial subsets of the heart (Supplementary Fig. S10b).Additionally, SCRIPro exhibited a broader TR coverage and successfully predicted the spatial TR activity for Neurod2 and Otx2 in distinct brain regions, which were not covered by SCENIC (Supplementary Fig. S10c).Finally, we benchmarked the performance of SCRIPro on homologous factors with similar motifs.SCRIPro accurately localized Gata4 in the heart region and Gata1 in the developing liver region, despite the highly resemblant motifs of these two factors (Fig. 4b).Similarly, SCRIPro demonstrated the ability to distinguish between Tcf12 and Tcf7, demonstrating the ability in identifying cell type-specific binding patterns of homologous factors through incorporating ChIP-seq data  Single-cell and spatial multiomic inference of gene regulatory networks using SCRIPro (Supplementary Fig. S10d).Collectively, these results suggest SCRIPro could accurately predict cell type and region-specific GRNs than existing tools on the spatial transcriptomic-only dataset, with the ability to distinguish TRs with similar motif patterns.TRs and their associated cofactors collaborate to modulate downstream genes, thereby establishing GRNs instrumental in determining cell phenotypes.To evaluate the effectiveness of SCRIPro in identifying TR regulons, we focused on factors enriched in the embryonic liver, specifically Gata1 (Papadopoulos et al. 2013) andTal1 (Elefanty et al. 1999).We conducted a screening of their target genes and pruned the network based on TR-target co-expression, resulting in the generation of co-regulation regulons (Fig. 4c).Functional analyses revealed that both TRs are linked to myeloid cell and erythrocyte differentiation and homeostasis, indicating a potential co-binding of these two TRs in regulating hematopoietic function in the embryonic liver (Fig. 4c).Similarly, by iteratively clustering the spatially variable TRs and regulon co-expression patterns, we successfully identified 40 coregulated gene modules in the mouse E16.5 embryo (Fig. 4d and Supplementary Fig. S11a and b).These modules show highly specific spatial patterns, including Pax5 for forebrain (M1), Neurog2 for mid/hindbrain (M10), Myog for muscle cells (M38), Msx1 in maxillary and limb mesenchymal cells (Jumlongras et al. 2001) (M3), and Hoxc8 in the mouse embryonic spine (Blackburn et al. 2009) (M36) (Fig. 4d and Supplementary Fig. S11b).Within each module, SCRIPro also constructed TR-centered GRNs to identify crucial factors.For example, Myod1, Myog, Mef2d, and Myf5 were identified as key nodes in the M38 muscle module, aligning well with their known roles as muscle master regulators (Fig. 4e).In summary, these analyses demonstrate the ability of SCRIPro to accurately identify TR target genes and construct cell type-specific GRNs, thereby enabling the identification of potential novel master regulators for each lineage.

SCRIPro detects stage-specific GRNs in cardiomyocytes across mouse embryonic heart sections
Our previous analyses demonstrate that SCRIPro could identify lineage-and stage-specific GRNs in the hair follicle development dataset, we next evaluated whether it is suitable for analyzing time-series spatial datasets.We applied SCRIPro on mouse embryo stereo-seq data (Chen et al. 2022) spanning from E11.5 to E15.5 across five continuous stages (Fig. 5), and aligned these spatial slides using SLAT (Chen-Rui et al. 2023) (Fig. 5a).Our focus was on heart development, given its early formation in mammalian embryos and the diverse changes observed in cardiomyocytes (CM) from E11.5 to E15.5 (Fig. 5b).We categorized cells in the heart region into different celltypes from E11.5 to E15.5 (Supplementary Fig. S12a and b).Encouragingly, SCRIPro identifies highly specific TR activity among different cell types starting from E11.5, including Gata6, Jund, and Mef2c for CM, Tcf4 and Hoxb3 for epicardium, and Sox9 for fibroblasts (Fig. 5c).To further elucidate the dynamics of TRs and their regulons during CM development, we constructed a cross-stage GRN based on enriched TRs in CM (Supplementary Fig. S13b).Our analyses identified stage-shared regulators including Nkx2-5, Hand2, Gata4, and Mef2d.Most of these factors are supported by existing literatures (Jumlongras et al. 2001, Blackburn et al. 2009) (Fig. 5d and Supplementary Fig. S13a).We applied the same analysis to single-cell mouse embryo hearts across the same developmental stages.Focusing on CM as an example, we found that the inferred TRs at different stages were generally consistent between the scRNA-seq and ST data (Supplementary Fig. S12c and d).For instance, the TR Nkx2-5 was identified as important across all three stage.Additionally, stage-specific TRs, such as Foxp1 in E11.5-E13.5 CM and Clock in E13.5-E15.5 CM, as well as the cell-type-specific TR Tbx5 in atrial CM (Supplementary Figs S12d and S13a), exhibited similar dynamics in both scRNA-seq and ST data.However, we also observed differences in the inferred TRs between the two data modalities.The scRNA-seq-specific TRs at E11.5, including Hdac2, Dicer1, and Kdm2b, were primarily associated with DNA repair, epigenetic regulation, and metabolic functions, and were often nucleus-enriched.In contrast, the ST-specific TRs across the three stages, such as Stat3, Smad4, Tgif1, and Foxp3, were involved in intercellular signaling and immune-related functions (Supplementary Fig. S12d).
In addition to TR activity, SCRIPro robustly identified the TR regulons, for which can be used to evaluate the stable or dynamic regulation of different TRs (Fig. 5e).Most TRs showed conserved regulation of its regulon among different stages.However, Prdm16, a regulator for brown adipocyte differentiation (Harms et al. 2014), shows remarkable dynamics in its regulon (Cibi et al. 2020, Wu et al. 2022) (Fig. 5e).We performed functional analyses of different subsets of the Prdm16 regulon and found that the constant Prdm16 regulon was enriched in fatty acid beta-oxidation, consistent with its well-known regulatory functions (Supplementary Fig. S13c).Interestingly, the genes lost at E11.5 were highly enriched in glycogen metabolism, while the unique genes at E13.5 were specifically enriched in NADP metabolism (Supplementary Fig. S13c).These analyses suggest that Prdm16 may play a critical role in the metabolic reprogramming from glycolysis to oxidative phosphorylation in CM, which has been reported to be connected with the proliferation ability of CM (Puente et al. 2014, Li et al. 2023).In summary, SCRIPro effectively tracks and analyzes the dynamics of TRs as well as their regulons in mouse embryonic heart development, enabling future identification of novel lineage regulators from ST-only datasets.

Spatial multiomic prediction of GRNs reveals crosstalk between intracellular gene regulation and extra-cellular interactions in the P22 mouse brain
Finally, we applied SCRIPro to a spatial ATAC-RNA-seq dataset from the mouse brain on postnatal day 22, which provided paired expression and chromatin accessibility data along with spatial location.SCRIPro successfully identified 10 distinct spatial domains that corresponded well with specific anatomical structures (Fig. 6a).Notably, domains 0, 1, and 3 represented Cortex (CT) regions associated with advanced neural and emotional processing functions (Pessoa and Adolphs 2010).Domain 6 corresponded to the corpus callosum (CC) region, crucial for interhemispheric communication, while domain 9 aligned with the lateral ventricle (LV) region, known for neural stem cell origins.Pseudo-time analysis revealed an inward-to-outward neural stem cell emergence pattern consistent with the differentiation trajectories of neuron cells (Supplementary Fig. S14a).SCRIPro accurately predicted spatial variable TRs for each region.For instance, it identified Sox6 and Sox10 as significant TRs in the CC region, and Sox2, Sox4, and Sox11 in the LV regions, highlighting the pivotal role of the SOX family in neurogenesis (Stevanovic et al. 2021) (Fig. 6b).Additionally, Bcl11b and Foxp1 (Tamura et al. 2004) exhibit strong TR activity within the striatum, where Bcl11b plays a critical role in the differentiation of medium spiny neurons important to motor control (Arlotta et al. 2008) (Fig. 6b).Moreover, Mef2c (Barbosa et al. 2008), Neurod2, and Neurod6 (Lin et al. 2004, Bormuth et al. 2013) have been previously demonstrated to regulate gene expression in the CT region (Fig. 6c and Supplementary Fig. S14b).These results validate the accuracy of SCRIPro in predicting region-specific TRs using sparse spatial multiomic datasets.
To further assess SCRIPro's accuracy in predicting TR regulons from spatial multiomic data, we constructed GRNs for Sox2 and Mef2c, which were enriched in LV and CT regions, respectively (Fig. 6d).The targets and functions of these two factors exhibited distinct characteristics.The Sox2 regulon was notably enriched in the WNT signaling pathway and neural tube/epithelial tube development (Mercurio et al. 2022), while the MEF2C regulon predominantly converged on signal transduction, synapse organization, and hindbrain development (Harrington et al. 2020).These analyses emphasize the specific functions of these TRs and demonstrate the accuracy of SCRIPro in identifying TR-specific regulons.In addition to intrinsic gene regulation, cellular crosstalk could also regulate cell type-specific TR expression.We then investigated whether spatial GRN analyses could be used to identify extracellular regulations on TRs such as CCI.Specifically, we focused on the TRs with documented ligandreceptor pairs (see Section 2).For example, we observed a high enrichment of NOTCH1 activity in the LV region (D9), which is known to harbor neural stem cells.Interestingly, its upstream regulator Dlk1/2 (S� anchez-Solana et al. 2011) and Jagged1 (Fissel and Farah 2021) were found to have closer associations with the NOTCH1 regulon in the LV region compared to the CC region (D6) and CT region (D0) (Fig. 6e).Similar analyses using FGFR1 expression did not yield significant differences (Supplementary Fig. S14c).In summary, SCRIPro effectively utilizes spatial multiomic data to construct detailed GRNs in the mouse brain, revealing distinct TR specificity across different spatial regions and facilitating the exploration of extracellular regulations influencing TR expression in different regions.

Discussion
Constructing high-resolution GRNs from extensive singlecell and ST data is crucial for understanding gene regulation mechanisms in cell fate determination and disease development.In this study, we developed SCRIPro, a rapid and userfriendly method that accurately predicts GRNs.SCRIPro addresses the challenge of sparse signals at the SuperCell (Littman et al. 2023) resolution by accounting for both expression and spatial similarity.Notably, SCRIPro includes a chromatin reconstruction step designed for scRNA-seq and ST datasets without paired ATAC-seq data, significantly enhancing its usability.Through in silico deletion TR analyses using a comprehensive TR ChIP-seq reference, SCRIPro outperforms existing motif-based methods in various systems, including human B-cell lymphoma, mouse hair follicle development, mouse developing embryos, and brain at P22.We anticipate that SCRIPro will be widely utilized by researchers to identify crucial TRs underlying novel differentiation and disease mechanisms.
SCRIPro integrates rich information from external ChIPseq datasets, addressing a significant limitation in most existing GRN inference methods for single cells, such as SCENIC (Aibar et al. 2017), SCING (Littman et al. 2023), chromVAR (Schep et al. 2017), SCENICþ(Bravo Gonz� alez-Blas et al. 2023), Dictys (Wang et al. 2023), and CellOracle (Kamimoto et al. 2023), which rely heavily on motif information.Using only motif information can lead to a loss of cell-type-specific secondary TF binding in about 45% of cases, as estimated from ENCODE data.By including external ChIP-seq datasets, SCRIPro significantly enhances GRN inference performance, similar to recent methods using pre-trained neural networks from ENCODE data (2012).Moreover, SCRIPro is optimized for single-cell datasets.Unlike the LISA method, which was designed for bulk RNA-seq data and shows poor performance when applied directly to single-cell multiome data, SCRIPro uses the SuperCell strategy to overcome data sparsity, prunes targets using TR-TG expression correlation, and improves reference quality with stringent QC filters.These optimizations result in significantly improved performance compared to the existing LISA method.SCRIPro also offers flexibility with variable input data.For transcriptomiconly data, it uses strategies similar to LISA to reconstruct chromatin landscapes from bulk DNase/H3K27ac references.For epigenomic-only data, it employs SCRIP (Dong et al. 2022) to infer potential regulators.For multiomics data, SCRIPro aligns and uses matched or paired epigenome datasets as the chromatin landscape.Additionally, SCRIPro considers spatial adjacency when generating SuperCells, providing a robust and flexible solution for GRN inference (Yuan and Duren 2024) across different single-cell modalities.
Despite its current superiority over existing methods, SCRIPro has some limitations.Its performance relies heavily on the quality of public ChIP-seq datasets, which can affect its robustness.To address this, SCRIPro integrates motif scanning results into the TR reference to mitigate the reduction in TR reference coverage after filtering out low-quality datasets.Furthermore, the rapid development of single-cell/ spatial epigenomics technologies such as sciATAC-seq3 (Domcke et al. 2020), sciMAP-ATAC (Thornton et al. 2021), and some multiome epigenomics technologies such as Paired-Tag (Zhu et al. 2021), scCUT&Tag-pro (Zhang et al. 2022), and DOGMA-seq (Mimitou et al. 2021), accelerates the accumulation of high-quality epigenome data in the field.SCRIPro aims to incorporate these data into the chromatin and TR reference, expanding coverage to more cell types and improving performance, particularly on rare cell types.While SCRIPro adeptly predicts TRs at a SuperCell resolution, its precision at the single-cell level is still being established due to challenges related to background noise and data quality.To overcome these limitations, we are exploring the use of machine learning models, such as diffusion and autoencoders, to improve the quality of single-cell and spatial-omics data, thereby enhancing both the resolution and accuracy of TR prediction.Finally, the cellular spatial locations from spatial-omics data are essential for understanding in situ gene expression regulation, cell interactions, and signal transduction within spatial microenvironments.Currently, SCRIPro calculates upstream signaling pathways by constructing spatial GRNs and filtering ligand-receptor (L-R) pair expressions.However, this approach is limited by the Single-cell and spatial multiomic inference of gene regulatory networks using SCRIPro coverage of spatial-omics data and the number of L-R pairs in the database.Future development integrating GRNs, protein-protein interactions (PPI), and L-R co-occurrence is expected to significantly increase the connections between intracellular GRNs and extracellular CCIs.
In summary, SCRIPro is a promising tool that enables researchers to leverage extensive single-cell or ST data to identify driver TR regulations, with or without paired epigenome information.It facilitates the interpretation of GRNs across diverse cell types, cellular trajectories, and spatial domains.
)].Then, we average the z-scores of each target gene to get M Target [Equation (7)].Both M TR and M Target clip with (−4,4) as the threshold [Equation (6)], and the influence of extreme values on them has been subtracted.Then M TR and M Target is added to obtain the M exp [Equation (9)], which is the coefficient for LISA Pvalue correction.The TR activity score (S) is equal to N (negative logarithmic transformation of the summary P-values of each TR) multiplied by the max-min normalized M exp [Equation (10)].

Figure 1 .
Figure 1.Overview of SCRIPro.SCRIPro takes single cell RNA-seq or spatial RNA-seq as input.SCRIPro first employs density clustering using a high coverage SuperCell strategy.While for spatial data, SCRIPro combines gene expression and cell spatial similarity information to a latent low-dimension embeddings via a graph attention auto-encoder.Then SCRIPro conducts in silico deletion analyses, utilizing matched scATAC-seq or reconstructed chromatin landscapes from public chromatin accessibility data, to assess the regulatory significance of TRs by RP model in each SuperCell.At last, SCRIPro combines TR expression and TR to generate TR-centered GRNs at the SuperCell resolution.The output of SCRIPro can be applied for TR target clustering, temporal GRN trajectory and spatial GRN trajectory.

Figure 3 .
Figure 3. SCRIPro identified tumor-specific GRNs in the human B-cell lymphoma 10× multiome dataset.(a) UMAP of nine cell types identified in human B-cell lymphoma dataset.Mono, Monocytes; pDC, plasmacytoid dendritic cells.(b) Heatmap showing the clustering of TRs by cell type.Top: Cell types annotated in (a).Right: Highlighted TRs.(c) The PAX5 ChIP-seq signal landscape identified by SCRIPro, which is aggregated based on SuperCells, shows across tumor B and B cell types for POU5F1 and BTG4 genes on chr11:111349018-111403152. (d) UMAP showing the predicted distributions of IRF8 (continued)

Figure 3 .
Figure 3. Continued and SPIB by SCRIPro and SCENICþ.IRF8 is highlighted in the pDC cell type in SCRIPro, while SPIB is prominent in Tumor B cells.(e) Box plot depicting the Pearson correlation between TRs and gene expression by SCRIPro, LISA P Value, SCENIC, SCENICþ, and SCING.(f) Heatmap clustering of TRs in tumor B and B cell types on the SCRIPro TR activity scale.Top: TR heatmap in the tumor B cell type, showing three clusters.Bottom: TR heatmap in the B cell type, showing two clusters.Right: An enlarged view of Group 3 (lower right corner) of the heatmap for tumor B, with all TRs labeled on the right.(g) Top: Venn diagram showing the overlap of TRs between B cell Group 1 and tumor B cell Group 1. Bottom: Venn diagram showing the overlap of TRs between B cell Group 2 and tumor B cell Group 2. (h) GO terms of the targets shared by TRs in Group 3.

Figure 4 .
Figure 4. SCRIPro detected cell type-specific GRNs in E16.5 mouse embryo Stereo-seq data.(a) SCRIPro identified 28 cell types on E16.5 mouse embryo stereo-seq dataset.(b) SCRIPro can distinguish different TRs spatial distribution with similar motif in same family.(c) SCRIPro is capable of predicting Gata1 and Tal1 target genes and build GRNs, and utilizes these target genes for GO analysis.(d) Heatmap showing the modules with significant spatial autocorrelation that are clustered into different modules based on spatial co-expression of E16.5 embryo.(e) Calculate TR center degree in module 38 (muscle) and find out important TRs.