Detecting differential alternative splicing events in scRNA-seq with or without Unique Molecular Identifiers

The emergence of single-cell RNA-seq (scRNA-seq) technology has made it possible to measure gene expression variations at cellular level. This breakthrough enables the investigation of a wider range of problems including analysis of splicing heterogeneity among individual cells. However, compared to bulk RNA-seq, scRNA-seq data are much noisier due to high technical variability and low sequencing depth. Here we propose SCATS (Single-Cell Analysis of Transcript Splicing) for differential splicing analysis in scRNA-seq, which achieves high sensitivity at low coverage by accounting for technical noise. SCATS models scRNA-seq data either with or without Unique Molecular Identifiers (UMIs). For non-UMI data, SCATS explicitly models technical noise by accounting for capture efficiency and amplification bias through the use of external spike-ins; for UMI data, SCATS models capture efficiency and further accounts for transcriptional burstiness. A key aspect of SCATS lies in its ability to group “exons” that originate from the same isoform(s). Grouping exons is essential in splicing analysis of scRNA-seq data as it naturally aggregates spliced reads across different exons, making it possible to detect splicing events even when sequencing depth is low. To evaluate the performance of SCATS, we analyzed both simulated and real scRNA-seq datasets and compared with existing methods including Census and DEXSeq. We show that SCATS has well controlled type I error rate, and is more powerful than existing methods, especially when splicing difference is small. In contrast, Census suffers from severe type I error inflation, whereas DEXSeq is more conservative. When applied to mouse brain scRNA-seq datasets, SCATS identified more differential splicing events with subtle difference across cell types compared to Census and DEXSeq. With the increasing adoption of scRNA-seq, we believe SCATS will be well-suited for various splicing studies. The implementation of SCATS can be downloaded from https://github.com/huyustats/SCATS.


Introduction
The emergence of scRNA-seq technology has made it possible to measure gene expression variations at cellular level. This breakthrough enables the investigation of a wide range of problems including analysis of splicing heterogeneity among individual cells. However, compared to bulk RNA-seq, scRNA-seq data are much noisier due to high technical variability, low sequencing depth, and the lack of full-length transcript sequencing for droplet-based protocols. Despite the growing popularity of scRNA-seq, few published studies have investigated alternative splicing, and even when studied, methods developed for bulk RNA-seq were utilized [1][2][3], which may not be optimal for scRNA-seq data.
Methods designed specifically for splicing analysis in non-UMI based scRNA-seq data only started to emerge recently [4]. Huang et al. [5] detects differential exon-usage by performing a pairwise comparison between every two cells, which becomes computationally infeasible when large number of cells are generated from droplet-based protocols [6][7][8]. Song et al. [9] quantifies exon-inclusion levels based on junction-spanning reads for non-UMI based scRNA-seq data. However, these estimations are unreliable due to sparse read counts that span exon-exon junctions and technical noise of scRNA-seq data, leading to limited power in detecting DAS events. Qiu et al. [10] and Ntranos et al. [11] developed approaches to detect differential transcript usage based on pre-estimated cell-specific isoform expressions or transcript compatibility counts. Although encouraging, the feasibility of estimating isoform usage at single-cell level still remains questionable due to limited informative reads for splicing in scRNA-seq.
Here we present SCATS, which achieves high sensitivity to detect DAS events in scRNAseq by accounting for technical noise and low sequencing depth through an exon-grouping approach originally developed in PennDiff [12]. Given annotated transcript information, sequencing reads aligned to exons originated from the same isoform(s) are first grouped together (Fig 1A). This grouping step is essential in splicing analysis of scRNA-seq data as it naturally aggregates spliced reads across different exons, making it possible to detect DAS events even for genes with low sequencing depth (Fig 1B, Fig 1C, Fig 1D, S1 Fig). Moreover, SCATS can be applied to scRNA-seq data either with or without UMIs. For non-UMI data, SCATS explicitly models technical noise by accounting for capture efficiency, amplification bias and dropout events through the use of external spike-ins; for UMI data, SCATS models that have exon e included. Mathematically, it can be written as x e ¼ P j2I e y j , where I e represents the set of isoforms that have exon e included, and θ j represents the relative abundance of isoform j. Although one can estimate the relative abundance of each inclusion isoform and then sum over the relative abundances of all inclusion isoforms to estimate x e , this approach is inaccurate due to the extreme sparsity of scRNA-seq data. To get a more reliable estimate of x e , we utilize splicing informative reads at the alternatively spliced exon e and those exons that are in the same exon group as exon e. This approach is more robust than estimating isoform relative abundances directly as it naturally aggregates splicing informative reads that originate from the same isoform(s) together. As shown in Fig 1A, for each gene, alternatively spliced exons are grouped if they originate from the same isoform(s) or complementary isoform(s) (i.e. exons share the same inclusion levels or complementary inclusion levels). Fig 1B illustrates the detailed procedure of exon grouping and the identification of informative reads for splicing.

Evaluation on simulated data
To evaluate the effectiveness of exon grouping for exon-inclusion level estimation, we simulated data from a generative model (see Material and Methods for details). Our simulations indicate that SCATS's estimates are strongly correlated with the true levels (R = 0.83), whereas naïve estimation that ignores technical noise in scRNA-seq only leads to a correlation of 0.66 (Fig 2A). We further assessed the performance of SCATS in detecting DAS events, and compared with two other state-of-the-art methods: Census [10], an algorithm designed to preprocess raw read counts by removing technical noise, and DEXSeq [13], a popular method for bulk RNA-seq splicing analysis. Specifically, Census estimates isoform expression levels first and converts them into Census RNA counts, and then fits a Dirichlet multinomial mixture model to detect differential splicing events. DEXSeq identifies differential exon usage by fitting exonic counts with a log linear regression model in which exonic counts are obtained by counting the number reads aligned to a virtual exon. Our results indicate that SCATS has well controlled type I error rates (Fig 2B, Fig 2C), whereas Census tends to yield false positive results with enrichment of p-values near zero, and DEXSeq is overly conservative with enrichment of p-values near one. SCATS is also more powerful than Census and DEXSeq (Fig 2D), although Census has severely inflated type I error rates. Compared to DEXSeq, SCATS performs consistently better, especially when the true exon-inclusion level difference is small. This indicates that SCATS is robust in detecting subtle splicing difference, thanks to the accurate exon inclusion-level estimation due to exon grouping and technical noise modeling.
The above simulations generated data assuming the same model underlying SCATS. To make a fair comparison, we designed a new simulator to generate an in silico library for a single cell (see Material and Methods for details). This simulator does not make any parametric assumptions on gene expression across cells. Results for data generated from this new simulation scheme are shown in S2 Fig, which indicate that SCATS has well controlled type I error rates (S2A Fig, S2B Fig), whereas Census tends to yield false positive results with enrichment of p-values near zero, and DEXSeq is overly conservative with enrichment of p-values near one. SCATS is also more powerful than Census and DEXSeq (S2C Fig). These results are consistent with what we observed in the original parametric simulation scheme.
To evaluate the impact of novel isoforms, we conducted a simulation study to test the robustness of SCATS to under-annotation, that is, some of the novel isoforms are not included in the annotation. Specifically, we simulated scRNA-seq read counts based on 100% Ensembl annotated isoforms and analyzed the data with SCATS using 100%, 75% and 50% of the annotated isoforms. annotation across different significance levels and effect sizes. SCATS is robust to under-annotation in all scenarios especially when effect size was large. Compared to full annotation, when only using 50% of the annotated isoforms in analysis, the power of SCATS only decreased by 4% on average across different significance levels. These results indicate the robustness of SCATS to isoform annotation. Similarly, we evaluated the impact of exon group length on SCATS. As shown in S4 Fig, all different length groups have well controlled type-I error rate when effect size equals to 0 and SCATS is more powerful as exon group length increased. This matches our expectation and demonstrates the improved performance of SCATS by grouping exons to account for read sparsity in scRNA-seq.
Moreover, we conducted a pseudo-bulk simulation by pooling scRNA-seq data from 300 cells in each group to compare this pseudo-bulk approach with SCATS. Since there is only one sample in each condition in the pseudo-bulk data, we cannot perform statistical tests. Therefore, we examined differential splicing by fold change (FC). We considered two significance Informative read counts of 1,000 exon groups for 600 cells (300 vs 300) were simulated using a generative model shown in S1 Fig with cell-specific technical parameters estimated from CA1 pyramidal cells in the scRNA-seq data generated by Zeisel et al. [16]. To simulate data that resemble real scRNA-seq data, one CA1 pyramidal cell was randomly selected and read coverage for each gene in that cell was used to obtain true gene concentration at the population level (gene concentration ¼ 100; 000 � reads from the gene total number of reads from the CA1 pyramidal cell ). Based on the generative model accounting for technical noise, we generated 6,000 reads on average for each cell. (A) Scatter plots of 1,842 true exon-inclusion levels, estimated by percent spliced in (PSI) for a given exon, against estimates from SCATS and a naïve method that ignores technical noise. SCATS models technical noise to quantify usage of an alternatively spliced exon while the naïve method simply estimates the inclusion level by c ¼ Y þ Y þ þY À , where Y + and Y − are the informative read counts of the "+" and "-" exon groups across cells. SCATS estimates are closer to the ground truth. thresholds of average fold change for analysis based on the pseudo-bulk data: FC>1.2 and FC>1.5. For the pseudo-bulk approach, the power is calculated as the proportion of exon groups with FC>1.2 or 1.5 among exon groups with differential splicing and the type-I error is calculated as the proportion of exon groups with FC>1.2 or 1.5 among exon groups with no splicing difference. As shown in S5 Fig, SCATS has higher power for FC>1.5 across all scenarios, especially when effect sizes are relatively small. SCATS and FC>1.5 both have well controlled type-I error rate. When effect size is greater than 0.2, FC>1.2 is more powerful than SCATS. However, FC>1.2 has severely inflated type-I error rate. These results demonstrate the benefit of SCATS over pseudo-bulk approach in detecting inclusion-level difference of scRNA-seq data.

Application to Tasic et al. data
Next, we applied SCATS to a scRNA-seq dataset generated from adult mouse brains, which includes 1,679 cells generated using the SMART-seq protocol (Tasic et al. dataset [3]) (see details in Methods). This dataset includes three major cell classes and 49 sub-cell types (23 GABAergic cell types, 19 glutamatergic cell types, and seven non-neuronal cell types). Major cell classes and sub cell types reported in the original publication were treated as ground truth to evaluate SCATS in characterizing splicing heterogeneity across cells. This dataset is non-UMI based but read counts from 92 ERCC spike-ins are available, which allow us to quantify technical variations. To assess the accuracy of exon-inclusion level estimation of SCATS, we selected two consecutive alternatively spliced exons (flip-flop exons [14]) from two Glutamate receptor genes Gria1 and Gria2 (S6A Fig, S6B Fig), which have been reported to display a highly variable splicing pattern across cell types [3,14]. These flip-flop exons are complementary with each other, and each single one is included in a different mature RNA. Sommer et al. [14] measured the expression levels of the flip and flop exons, respectively, with RNA in situ hybridization (ISH). Their findings showed that the flip exon was preferentially utilized in layer II/III of neocortex as compared to the flop exon, whereas both exons were equally expressed in layer VI of neocortex. To evaluate the exon usage quantification of SCATS, we estimated cell type-specific exon-inclusion level of the flip exon. As shown in Fig 3A and Fig  3B, the estimated exon-inclusion levels of the flip exon in cell types L2-Ngb (layer II) and L2/ 3-Ptgs2 (layer II/III) are over 80%, whereas the estimates are around 50% in cell type L6a . These results are consistent with RNA ISH measurements in Sommer et al. [14].
Encouraged by the accurate exon-inclusion estimates, we next conducted a pairwise DAS analysis across all 49 sub-cell types based on 296 most differentially spliced genes (with 966 exon groups) selected by Tasic et al. using MISO [15], a popular method developed for splicing analysis in bulk RNA-seq data. S7 Fig shows the quantile-quantile (QQ) plots of the log-transformed p-values from pairwise comparisons of different sub-cell types, which include sub-cell type comparisons both within major cell classes and cross major cell classes. For tests within GABAergic, glutamatergic and non-neurons, the p-values from SCATS are distributed slightly below the red diagonal line until when the significance level α reaches to 0.01. This pattern is consistent with the QQ plot shown in the simulations (Fig 2B). In contrast, there is a sharp deviation from the diagonal line at α = 10 −1.5 for all three cross-major cell class subsets, indicating that more DAS events were detected at less stringent significance thresholds. This is not surprising because more DAS exons are expected to be detected between sub-cell types from different major cell classes compared to sub-cell types that originate from the same major cell class. S8A Fig, S8B Fig and S12 Fig show DAS heterogeneity revealed by SCATS across the 49 sub-cell types based on the 966 exon groups in these 296 genes. The heatmap shows three clusters, consistent with the three labeled major cell classes: GABAergic, glutamatergic, and non-neuronal. Although this pattern was also revealed by MISO analysis in Tasic et al., SCATS results revealed more heterogeneous splicing patterns within each major cell class than MISO, demonstrating its higher sensitivity in detecting splicing heterogeneity.
For GABAergic cells, Tasic et al. conducted a hierarchical clustering analysis based on qRT-PCR measurements obtained from 79 marker genes for GABAergic sub-cell types. These cells were classified into three sub-clusters. To further assess the accuracy of SCATS in profiling splicing heterogeneity, we treated these qRT-PCR based clustering result as ground truth  2. Moreover, the dendrogram of SCATS shows that the purple cell types were separated from yellow and red cell types at the first level of the hierarchical clustering tree, whereas the yellow and red cell types were grouped together at the first level of the tree, indicating that the yellow and red cell types are more similar to each other than to the purple cell types. This pattern is consistent with the hierarchical clustering analysis based on the qRT-PCR measurements by Tasic et al. These results demonstrate the high sensitivity of SCATS in detecting DAS among closely related cells from the same major cell class. Additionally, we generated a sashimi plot using IGV to validate the detected DAS exons by SCATS. We also analyzed a larger dataset that includes 6,275 genes with 12,073 exon groups that met the analysis criteria. The patterns are overall similar to what we observed for the 296 most differentially spliced genes selected by Tasic

Application to Zeisel et al. data
Although designed primarily to detect DAS events in full-length transcript scRNA-seq data, SCATS can also detect DAS events for UMI data, which typically only contain sequences from the 3' or 5' end of a transcript. Due to the sparsity of splicing informative UMI counts, few methods are able to detect DAS events in UMI-based scRNA-seq data. To evaluate the performance of SCATS when full-length transcript information is not available, we analyzed the Zeisel et al. dataset [16], which are UMI-based data generated from 3,005 cells in mouse cortex. This dataset includes nine major cell classes and 47 sub-cell types. We performed DAS analysis on 1,826 marker genes (3,542 exon groups) for these nine major cell classes across the 47 subcell types in this dataset. Since UMI counts were consolidated from non-UMI reads, we compared read count distributions between UMI and non-UMI data. Not surprisingly, splicing informative read counts in UMI data are much sparser than non-UMI data (S11A Fig, S11B  Fig), making it challenging to detect DAS in UMI data. However, by exon grouping and appropriate statistical modeling, SCATS is able to detect DAS even with such sparse data. Fig  4A (S13 Fig) shows the heatmap of the proportions of the detected DAS events for each pairwise comparison by SCATS. Neuronal cells are clearly differentiated from non-neuronal cells, although some major cell classes (marked by single colors) cannot be discerned from this heatmap. To better understand these results, we treated the proportions of significant DAS exons for each pairwise comparison as a similarity metric and performed hierarchical clustering analysis as shown in Fig 4B. For SCATS, the separation in neuron cells agrees well with labeled cell types, with only one CA1 Pyramidal cell type (ClauPyr) misclassified as Interneurons. For non-neuronal cells, SCATS clearly separated the Oligodendrocytes and Astrocytes cell classes from others, with four labels in Ependyma, Microglia and Mural misclassified. These clustering results demonstrate the reliability of DAS detections across cell types using SCATS even when splicing informative read counts are sparse. We also conducted similar analyses using MISO, Census and DEXSeq. Census may yield more false positive results because it detected more than 40% DAS events across most pairwise comparisons, even for those comparisons within major cell classes (Fig 4C, Fig 4D). In contrast, DEXSeq and MISO missed most of the DAS events and the hierarchical clustering analysis failed to separate different major cell classes for both neurons and non-neurons (Fig 4E, Fig 4F, S11C Fig, S11D Fig). This is not surprising as splicing informative molecule counts in UMI-based scRNA-seq data are extremely sparse, and methods developed for bulk RNA-seq data are not optimal when analyzing such data. It is worth noting that splicing quantification offers higher resolution of cellular heterogeneity than total gene expression. As shown in S11E Fig, exon-inclusion levels showed more distinct patterns across cell types for gene Gria1 than total gene expression.

Discussion
Detection of mis-regulated alternative splicing events is a critical step towards the understanding of association between isoform variations and disease. scRNA-seq data offer new insights into alternative splicing at cellular level, which helps us to understand cellular heterogeneity with higher resolution as compared to bulk RNA-seq. However, low sequencing depth and technical noise have precluded the investigation of splicing heterogeneity in most scRNA-seq studies. Several approaches have been developed to detect differential alternative splicing for scRNA-seq data. However, these methods either ignore technical noise or perform statistical tests based on unreliable isoform/exon usage estimates. To fill in such knowledge gap, we developed SCATS, which utilizes informative read counts at exon group level to achieve high sensitivity in DAS detection. By grouping exons that originate from the same isoform(s) and modeling of technical noise, SCATS aggregates spliced reads across different exons, making it possible to detect splicing events in scRNA-seq. Through simulations, we have shown that SCATS yields more accurate quantification of exon-inclusion levels than a method that utilizes empirical junction reads directly and is more powerful in detecting DAS events compared to existing methods.
Accurate quantification of exon-inclusion levels is critical for DAS detection. To estimate exon-inclusion levels more accurately, SCATS corrects local heterogeneity of read coverage to improve DAS detection accuracy. In particular, 3' bias of UMI-based scRNA-seq data leads to most reads being distributed to the 3' end of an isoform, which complicates exon-inclusion level quantification and detection of DAS events for exons that are far away from the 3' end. SCATS models this non-uniform read distribution in a non-parametric manner, which upweights 3' end in the model to remove 3' sequencing bias of UMI-based scRNA-seq data. This bias correction together with exon grouping in SCATS makes it possible to detect DAS events at the 5' end even when few reads are mapped to that region.
Although SCATS showed promising performance compared to existing methods, we note that SCATS cannot be applied to estimate isoform expression at cellular level because most next-generation sequencing (NGS) reads are too short to capture unique sequences for each isoform. In a typical scRNA-seq dataset generated using NGS, the number of splicing informative reads in each cell is extremely small, and in many situations is even less than the number of isoforms in Ensembl or Gencode annotation. The extreme sparsity of data and the lack of reads capturing an entire transcript prohibit the estimation of isoform-specific gene expression in a single cell. To address this issue, we are currently exploring scRNA-seq data generated using long-read sequencing technologies such as Nanopore and PacBio, to estimate isoform expression in each cell and to detect differential usage of isoforms.
In summary, we have developed a robust approach to detect DAS events by accounting for both technical noise and biological variations such as transcriptional bursting across single cells. Through simulation studies and real data analyses, we showed that SCATS outperformed existing methods in detecting DAS events for both UMI and non-UMI scRNA-seq data. With the increasing adoption of scRNA-seq, we believe SCATS will be well-suited for various splicing studies and offer additional insights beyond total gene expression analysis.

Splicing informative read counts based on exon grouping
To get informative reads for DAS analysis, we pre-process sorted isoform annotation file in GTF format and sorted aligned scRNA-seq data in SAM format. For each gene, alternatively spliced exons are grouped if they originate from the same isoform(s) or complementary isoform(s) (i.e. exons share the same inclusion levels or complementary inclusion levels). Suppose the gene under consideration has M exon groups, then all reads mapped to this gene can be grouped into 2M+1 categories: 1) uninformative reads: reads that map to exons shared by all isoforms of the gene; 2) included (+) informative reads of exon group j: reads mapped to alternatively spliced (AS) exons that are included in exon group j (1�j�M); and 3) excluded (-) informative reads of exon group j: reads mapped to AS exons that are excluded from exon group j (1�j�M). Let be the observed read counts in cell i for the gene, where Y i0 denotes uninformative read count, Y +ij denotes included informative read count of exon group j, and Y −ij denotes excluded informative read count of exon group j. Fig 1A shows an example of how exons are grouped and informative reads are counted. For this hypothetical gene, there are three isoforms constructed by five exons, where three of them are alternatively spliced and grouped into two groups (blue and green). Among the seven mapped reads, reads 1 and 6 are uninformative reads (Y +i0 = 2), read 2 is an informative read included in exon group 1 (Y +i1 = 1), reads 3 and 4 are informative reads excluded from group 1 (Y +i0 = 2), read 5 is an informative read included in exon group 2 (Y +i2 = 1), and read 7 is an informative read excluded from group 2 (Y −i2 = 1). Fig 1B shows a workflow that describes how exon grouping and informative read extraction are implemented in SCATS.

Modeling splicing informative UMI and non-UMI counts
SCATS is designed to detect DAS events at exon group level. For the ease of notation, we drop the exon group index. For a given exon group of interest, let ψ be the inclusion level of an included (+) exon group and θ g be the mean expression of gene g. To account for cell-specific technical noise, we model splicing informative read counts Y +i and Y −i using a hierarchical model (S1 Fig). We assume cell-specific true expression level of the included/excluded exon group μ +i /μ −i follows a log-normal distribution: logðm À i Þ � Normalðlogðy g � ð1 À cÞ � h À Þ; s 2 À Þ where h + /h − represents the probability of included/excluded reads being informative and s 2 þ =s 2 À is variance of expression level in log scale.
For non-UMI count, amplification bias and low capture efficiency in scRNA-seq are quantified by a linear model. Specifically, for cell i, the expected value of the informative read count λ +i /λ −i is modeled as where α i and β i are cell-specific technical parameters designed to model capture efficiency and amplification bias, respectively. These technical parameters can be quantified with TASC [17] when ERCC spike-ins are provided.
For UMI count, cell-specific parameter β i is equal to 1 as amplification bias can be alleviated by UMI. Therefore, the expected informative UMI count λ +i /λ −i is proportional to true expression of the included/excluded exon group: Moreover, we account for inflated zeros of scRNA-seq which are primarily sourced from two contributing factors: technical dropout and transcriptional bursting. Dropout happens when a particular transcript is lost in scRNA-seq experimental steps while the gene is expressed in the cell. Also, excessive zero counts can be introduced when significant bursting genes are in "off" state due to transcriptional heterogeneity. Previous studies have shown that the modeling of dropout in UMI data is unnecessary [18,19]. Therefore, to model zero inflation, for a given gene, we let Z i be the indicator that the gene in cell i is captured in the library (dropout does not happen) and it is in the "on" state. The probability of Z i = 1 is π i . Therefore, given λ +i and λ −i , the distribution of the observed informative UMI count Y +i /Y −i is where Poisson distribution has been reported to approximate the UMI sampling process well after zero inflation being removed [18,19].

Probability of included/excluded reads being informative
Another key part of SCATS is to estimate h + /h − , the probability of included/excluded reads being splicing informative. From the isoform structure of gene g, we can determine the origin (exon group) of a read (being informative read) only when it is mapped to exon-exon junction or an alternatively spliced exon. Thus, a natural way to estimate the probability that a read being mapped to this region is to calculate the ratio between weighted length of the informative region and the whole region. Due to low sequencing depth of scRNA-seq data, we assume the exon group shares the same h + /h − across cells instead of estimating cell-specific where I + and I − denote the set of isoforms in the included and excluded group, respectively, I denotes the set of all isoforms in gene g, S � d denotes the set of informative base pair positions of isoform d, and S d denotes the set of all base pair positions of isoform d. To calculate weighted lengths of the region, we utilize observed read coverage w s at base pair position s and relative abundance θ d of isoform d which is estimated based on informative reads. Therefore, the numerator represents the weighted length of informative region for included exon group (+) and excluded exon group (-), respectively, and the denominator represents the weighted length of whole region in gene g.

Detection of differential alternative splicing exon group
Exon-inclusion level, which measures the relative usage of an exon, is a commonly used measure to quantify the alternative splicing process. To detect DAS exons, we can examine if the inclusion level of exon group (ψ) is significantly different between two groups of cells. To make accurate inference on the inclusion level difference, technical parameters (α i , β i ) for each cell i and population mean expression θ g of gene g are first calculated during data pre-processing, using software TASC [17]. This allows us to eliminate bias due to technical variations in scRNA-seq data. Given an exon-group of interest, the informative read counts Y +i /Y −i are fitted to the hierarchical model described above, with likelihood function calculated as The joint distribution of cell-specific observed informative read counts and their true expression levels is and similarly P Y À i ; m À i jc ð Þ ¼ ð1 À p i Þf LN ðm À i jlogðy g � ð1 À cÞ � h À Þ; s 2 À Þ þ e À e a i þb i logm À i p i f LN ðm À i jlogðy g � ð1 À cÞ � h À Þ; s 2 À Þ; where the probability distribution function of log-normal distribution f LN xjy; s 2 ð Þ ¼ 1 xs ffi ffi ffiffi 2p p e À ðlogxÀ yÞ 2 2s 2 : To detect exons with DAS, we test whether there is significant difference in inclusion level ψ between two group of cells, denoted by A and B, i.e., H 0 :ψ A = ψ B vs.
cÞj is employed to test this hypothesis, where L 0 ðYj b cÞ; L 1 ðYj b cÞ are maximized under the null hypothesis H 0 and alternative hypothesis H 1 , respectively. Asymptotically, T follows a chi-square distribution with one degree of freedom.

Datasets and evaluations
We first conducted a simulation study to benchmark the performance of SCATS and compared it with other existing DAS methods, including Census [10] and DEXSeq [13]. To simulate realistic datasets, the average gene expressions (θ g ) for each cell group were assigned based on measurements of CA1 pyramidal cells in the scRNA-seq data generated by Zeisel et al. [16]. The exon start and end positions were based on Ensembl annotation and downloaded in GTF format from UCSC Genome Browser (https://genome.ucsc.edu/). The cell-specific technical parameters (α,β,κ,τ) were estimated based on measurements from CA1 pyramidal cells in scRNA-seq data from Zeisel et al. [16]. Given these abundances and technical parameters, we then parameterized the generative model described previously, which allows us to simulate exon-level informative read count (Y +ij ,Y −ij ) mimicking the scRNA-seq data generation process. Following this procedure, we simulated 1,000 exon groups with 6,000 total reads per cell for 600 cells (300 in condition A, and 300 in condition B) for five scenarios (Δ = |ψ A −ψ B | = 0, 0.1, 0.2, 0.3, 0.4) to evaluate type I error rate and power for SCATS, Census and DEXSeq.
Next, we evaluated different DAS detection methods using mouse cortical dataset generated by Tasic et al. [3], which includes 1,679 cells from 49 sub-cell types in three major cell classes (23 GABAergic cell types, 19 glutamatergic cell types, and seven non-neuronal cell types). Among these cells, the average sequencing depth is 3 million. We directly downloaded the aligned data in BAM format from NCBI Gene Expression Omnibus (GSE71585). Given read counts of 6,275 endogenous RefSeq genes with at least two annotated isoforms and 92 ERCC spike-ins across cells, we applied TASC to quantify technical variations, yielding accurate estimation of average gene concentrations in each sub-cell type. Similarly, informative reads in each exon group were counted using SCATS based on the 6,275 genes with 12,073 exon groups. On average, 4,311 exon groups were covered by at least one splicing informative read in each cell. We treat an exon group as qualified for analysis when at least three cells have at least one informative read in each condition. Pairwise DAS comparison was conducted between these 49 sub-cell types with 985 qualified exon groups on average for each comparison.
Another adult mouse brain data used for evaluation was acquired from Zeisel et al. [16], which includes sequencing data of 3,005 cells from various regions of mouse brain. These cells were classified into nine major cell classes and 47 sub-cell types with sub-cell types from the same major cell class considered to be relatively homogenous. In this study, raw data in FASTQ format were downloaded from NCBI Gene Expression Omnibus (GSE60361) and aligned to mm10 reference genome with Ensembl gene annotation using STAR [20]. For each cell, we processed the BAM file and counted splicing informative reads using SCATS for 1,826 marker genes (with at least two annotated isoforms by Ensembl) with 3,542 exon groups of the nine major cell classes. For each cell, there are 465 exon groups on average covered by at least one splicing informative UMI read. Given these informative counts, DAS analysis between each pair of the 47 sub-cell types was conducted. To compare the ability of DAS detection, we also performed pairwise DAS analysis across sub-cell types using DEXSeq and Census. For each comparison, we required at least three cells having at least one informative UMI in each condition, leading to 568 qualified exon groups on average for each DAS comparison.

Software availability
An open-source implementation of the SCATS algorithm can be downloaded from https:// github.com/huyustats/SCATS.

Life sciences reporting summary
Further information on experimental design is available in the Life Sciences Reporting Summary.
Supporting information S1 Fig. Schematic of the SCATS model. An alternatively spliced exon is included in the "+" exon group (blue) and excluded from the "-" group (white). μ +i and μ −i represent true expression levels of the "+" and "-" exon groups, respectively, in cell i. λ +i and λ −i are intermediate variables that model amplification bias, capture efficiency, and sequencing bias in cell i for the "+" and "-"exon groups. Z i models transcriptional bursting and dropout event of the gene in cell i. Y +i and Y −i represent observed informative read counts of the "+" and "-" exon groups in cell i. SCATS utilizes a hierarchical model to detect differential alternative splicing (DAS) events between cell groups by accounting for technical noise from scRNA-seq data. Pairwise DAS comparison across 49 sub-cell types in three major cell classes was performed using SCATS. DAS comparisons were classified into two groups: comparison for cell types within major cell classes (A, C, E), and comparison for cell types across major cell classes (B, D, F). X-axis represents uniform theoretical quantiles between 0 and 1 in −log 10 scale. Y-axis represents observed p-value quantile in −log 10 scale. Uniformly distributed data should follow the red dashed line. Q-Q plots of p-values from within major cell class comparisons (GABAergic, Glutamatergic, Non-neuronal) are similar to the plot under the null hypothesis in our simulation study. This matched our expectation because most exons should not be differentially spliced between sub-cell types within the same major cell class. In contrast, the distribution of p-values is highly right-skewed starting from X = 10 −1.5 for cross-major cell class comparisons (GABAergic vs. Glutamatergic, GABAergic vs. Non-neuronal, Glutamatergic vs. Non-neuronal), indicating that more DAS events were detected.