A unified pipeline for FISH spatial transcriptomics

Summary
High-throughput spatial transcriptomics has emerged as a powerful tool for investigating the spatial distribution of mRNA expression and its effects on cellular function. There is a lack of standardized tools for analyzing spatial transcriptomics data, leading many groups to write their own in-house tools that are often poorly documented and not generalizable. To address this, we have expanded and improved the starfish library and used those tools to create PIPEFISH, a semi-automated and generalizable pipeline that performs transcript annotation for fluorescence in situ hybridization (FISH)-based spatial transcriptomics. We used this pipeline to annotate transcript locations in three real datasets, one from each of three common types of FISH image-based experiment: MERFISH, seqFISH, and targeted in situ sequencing (ISS). We verified that the results were high quality using both the pipeline's internal quality metrics and a comparison with an orthogonal method of measuring RNA expression. PIPEFISH is a publicly available and open-source tool.


INTRODUCTION
Development of single-cell RNA sequencing (scRNA-seq) over the past decade has allowed researchers to probe the heterogeneous nature of real tissue by characterizing the transcriptome of individual cells. This has led to the discovery of many new cell types and has enhanced our understanding of the mechanisms of disease.1 However, in the process of separating single cells from each other so that their transcriptomes can be sequenced individually, the spatial context of each cell and the location of each transcript within cells, both of which contain important biological information, are lost. Spatially resolved transcriptomics methods can be used to characterize transcriptomes at the single-cell level such that the spatial context of each transcript and cell is also recovered. This spatial information is highly valuable in biomedicine, as many diseases are characterized by abnormal spatial patterns,2 and in developmental biology, where many vital early processes are driven by spatial relationships between different biological factors3 and defects in their spatial distributions can have serious consequences for the developing organism.
Named Nature's ''Method of the Year'' in 2020,4 spatially resolved transcriptomics is becoming increasingly prevalent in newly published research after the first multiplexed methods were demonstrated in the mid-2010s,5,6 with new methods and discoveries being released at an ever-increasing pace. The earliest spatial transcriptomics methods were based on fluorescence in situ hybridization (FISH) probes with specific mRNA targets, which could then be imaged with a microscope.7 These early methods were very low throughput, targeting only a handful of genes,8 but more recent advances in multiplexing have allowed for nearly whole-transcriptome-level targeting. Exposing samples to many sequential rounds of hybridization with different sets of FISH probes, called seqFISH,9 expanded targeting from a few dozen genes to hundreds and has been shown to be effective in both cell cultures10 and tissue samples.11 A newer version, seqFISH+,12 has shown that it is possible to characterize the expression of over 10,000 different mRNA targets simultaneously. A similar approach is used in MERFISH but with more advanced built-in error correction, allowing for increased accuracy of results.13 Another in situ imaging-based technique, called in situ sequencing (ISS), takes advantage of short-read sequencing technology to image the location of mRNA targets as they are sequenced using traditional sequencing-by-synthesis chemistry.5 Experiments based on this method have generated spatial maps of breast cancer tumors that could be useful in the clinical diagnosis of patients with cancer, leading to improved patient outcomes.14 Advances in ISS methods have led to increased throughput and accuracy.15
As these spatial transcriptomics methods improve, becoming higher in resolution and throughput while also less complex to perform, the number of researchers who wish to use them will also increase. Thus, it is important to have standardized, open-source methods available for processing and analyzing the results of these spatial methods, both for reproducibility and for easy comparison of results between different research groups. Much of the analysis done for published research relies on in-house code with little documentation, making the adoption of these experiments difficult and time consuming. Publicly available, open-source computational tools for the analysis of spatial data will make these types of experiments more accessible for labs that wish to perform them, facilitating the generation of a larger body of spatial data.
Although some current sequencing-based spatial transcriptomics methods have open-source libraries16,17 or bundled toolkits18 available, FISH-based methods have significantly fewer options at their disposal.19 Experimental apparatus must usually be assembled in-house, and most labs write their own analysis code. Written by the Chan Zuckerberg Initiative, starfish20 is an open-source Python library containing methods and data types for processing and analyzing FISH-based spatial imaging results. It is capable of taking raw image files and outputting decoded results detailing the spatial location of each transcript, which cell it is located in, and where that cell is located in the original image. These results can then be used in downstream tools such as squidPy21 or stLearn.22 While useful, starfish is lacking in several areas: it requires a large number of input parameters to carry out a full analysis, offers only a few basic image pre-processing tools, lacks an adequate seqFISH decoding algorithm, and has no quality control metrics for determining the significance of the results.
We have created a universal spatial transcriptomics pipeline for FISH-based methods called PIPEFISH using the CWL (Common Workflow Language) framework and an improved version of the starfish package that includes a novel seqFISH decoding method and PoSTcode23 as an ISS decoding option. PIPEFISH organizes the tools available in our custom starfish module into an ordered pipeline that can take raw image data from the most popular FISH-based methods and extract spatially annotated mRNA transcript counts. We have also developed a number of quality control metrics that can be used to assess the pipeline's performance. Using the pipeline, we identified spatially annotated transcripts in three different in situ image datasets (seqFISH of a mouse embryo, MERFISH of human U2-OS cells, and targeted ISS in a mouse brain) and used both the internal QC (quality control) metrics and an external metric to show that our results are high quality.

A general SpaceTx pipeline
We have developed PIPEFISH, a general pipeline to extract transcript locations from the raw results of a FISH experiment (Figure 1). Input can be accepted in many forms so long as images are validly encoded TIFFs and a codebook describing the expected transcripts is provided. The codebook is expected to contain transcript names and their paired ''barcodes,'' which are the lists of imaging round and color channel combinations where each transcript is expected to fluoresce. Additional information can be provided in a configuration JSON file with experimental design details, such as the Hamming distance between barcodes if the experiment was designed so that incomplete barcodes can be recovered during decoding.
The configuration JSON file can also include parameters for image processing, such as which auxiliary views (if any) should be used for image registration or subtracted from the primary views as background. Other tunable image-processing parameters, such as the kernel radius for rolling-ball background subtraction or a white tophat filter, can also be specified.
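As a sketch, a configuration file of this kind might look like the following. The key names here are illustrative placeholders and not PIPEFISH's actual schema:

```python
import json

# Hypothetical configuration sketch; the real PIPEFISH key names may differ.
config = {
    "register_aux_view": "DAPI",    # auxiliary view used for registration
    "background_view": "blank",     # auxiliary view subtracted as background
    "rolling_ball_radius": 5,       # kernel radius for background subtraction
    "tophat_radius": 3,             # white tophat filter radius
    "hamming_distance": 2,          # minimum distance between barcodes
}

with open("pipeline_config.json", "w") as f:
    json.dump(config, f, indent=2)

# A later pipeline step would load the same file:
with open("pipeline_config.json") as f:
    loaded = json.load(f)
print(loaded["rolling_ball_radius"])  # -> 5
```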
There are two primary types of image decoding: pixel based and spot based. The pixel-based method treats the intensities at a given (x,y,z) location as a vector across all rounds and channels and then assigns the barcode (if any) most likely to match that vector. The spot-based method first identifies spots in the image as local intensity peaks and then applies one of several ''decoding'' methods to predict which spot locations correspond to transcripts.
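To illustrate the pixel-based approach, here is a minimal self-contained sketch (not starfish's implementation) that matches one pixel's round-by-channel intensity vector to a codebook entry by cosine similarity; the gene names, barcodes, and intensities are made up:

```python
import numpy as np

# Toy pixel-based decoding: each pixel's intensities across (round, channel)
# form a vector that is matched to the closest codebook barcode.
codebook = {
    "GeneA": np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0]], float),
    "GeneB": np.array([[0, 0, 0, 1], [0, 1, 0, 0], [1, 0, 0, 0]], float),
}

def decode_pixel(pixel_vec, codebook, min_similarity=0.8):
    """Return the best-matching target for one pixel vector, or None."""
    best, best_sim = None, min_similarity
    v = pixel_vec / (np.linalg.norm(pixel_vec) + 1e-12)
    for target, code in codebook.items():
        c = code.ravel() / np.linalg.norm(code.ravel())
        sim = float(v @ c)
        if sim > best_sim:
            best, best_sim = target, sim
    return best

# A noisy pixel whose intensity pattern resembles GeneA's barcode:
pixel = np.array([[0.1, 0.9, 0.0, 0.1],
                  [0.8, 0.1, 0.1, 0.0],
                  [0.0, 0.1, 1.0, 0.1]]).ravel()
print(decode_pixel(pixel, codebook))  # -> GeneA
```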
Transcripts can then be assigned cell IDs by applying a segmentation mask. Naive methods are included to generate a mask from a provided auxiliary view using thresholding- and watershed-based approaches, or the neural-network-based CellPose segmentation tool can be used.24 If using CellPose, the user can choose one of its pre-trained models or supply their own custom model. External masks, such as those produced by Fiji or Ilastik, can also be imported.
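Once a labeled segmentation mask is available, assigning transcripts to cells reduces to looking up each transcript's coordinates in the mask. A minimal illustration (the coordinates and cell IDs are made up; 0 marks background):

```python
import numpy as np

# Labeled segmentation mask: 0 = background, 1..N = cell IDs.
mask = np.zeros((6, 6), dtype=int)
mask[0:3, 0:3] = 1   # cell 1
mask[3:6, 3:6] = 2   # cell 2

# Decoded transcripts as (target, y, x) tuples.
transcripts = [("GeneA", 1, 1), ("GeneB", 4, 4), ("GeneA", 0, 5)]

# Assign each transcript the cell ID at its location.
assigned = [(t, int(mask[y, x])) for t, y, x in transcripts]
print(assigned)  # -> [('GeneA', 1), ('GeneB', 2), ('GeneA', 0)]
```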
Should troubleshooting of an individual step be needed, the pipeline has been configured to allow each step to be run individually from prior output and the same JSON configuration file. We hope to address the needs of all FISH experiments in this comprehensive and simple-to-use pipeline.

An improved seqFISH decoder
We found that decoding seqFISH spots with starfish's built-in methods consistently produced far fewer mRNA targets than expected, which can reduce the accuracy of many downstream analyses. This is a result of starfish's seqFISH decoders requiring that spots be at least mutual nearest neighbors of each other to be connected into a barcode, a criterion that is very sensitive to spot drift, a common problem in seqFISH experiments with high numbers of hybridizations. To address this, we developed the CheckAll decoder, which considers all possible spot combinations that could form barcodes and chooses the best non-overlapping set (Figure S1).
The CheckAll decoder checks all possible barcodes that could be formed from the given spot set and chooses those most likely to represent true mRNA targets (STAR Methods). It is based on the method used by the original seqFISH authors9 but features some improvements and an additional option that allows adjustment of the precision/recall tradeoff. This ''mode'' parameter can take three different values and controls the tradeoff by setting several parameters that are not under user control: high-accuracy mode results in higher precision but lower recall, low-accuracy mode in lower precision but higher recall, and medium-accuracy mode falls in between. Also, unlike the current starfish decoders, the CheckAll decoder is capable of decoding error-corrected barcodes. If each barcode has a Hamming distance of at least two from every other code, barcodes can be uniquely identified even when incomplete. These error-corrected transcripts found with an incomplete barcode are less accurate (more prone to false positives) than those decoded with complete barcodes but can significantly increase recall if that is preferred.
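The error-correction idea can be illustrated with a toy codebook whose barcodes all have pairwise Hamming distance of at least two: a barcode with one missing round then still matches exactly one codebook entry. This sketch is illustrative, not PIPEFISH's code:

```python
# Barcodes as tuples of channel indices per round; None marks a round
# in which no spot was found. All pairwise Hamming distances here are >= 2.
codebook = {
    (0, 1, 2): "GeneA",
    (1, 2, 0): "GeneB",
    (2, 0, 1): "GeneC",
}

def error_correct(partial, codebook):
    """Return the unique target consistent with a one-round dropout, or None."""
    matches = [
        target for code, target in codebook.items()
        if all(p is None or p == c for p, c in zip(partial, code))
    ]
    return matches[0] if len(matches) == 1 else None

print(error_correct((0, None, 2), codebook))  # -> GeneA
print(error_correct((None, 2, 0), codebook))  # -> GeneB
```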

QC metrics
To provide confidence in the results from this pipeline, we defined a set of internal QC metrics that can be applied to the vast majority of experiment types. These QC results are automatically generated during a pipeline run and provide a variety of statistics and graphs for each provided field of view (Figures S2 and S3). Some of these metrics rely on ''off-target'' barcodes that do not correspond to actual experimental probes; these can be added to the codebook manually or inserted automatically during the pipeline's initial conversion step. Off-target barcodes are treated identically to normal barcodes during decoding and are regarded as false positives in the QC step, an approach that has been experimentally verified to be functionally identical to including false positive probes that do not match any transcripts.13 We have found that these off-target-based metrics are particularly useful for directly comparing sets of results (Figure S2), as the false positive rates and the threshold separating true decoded barcodes from false positive barcodes give a quantifiable point of comparison between datasets. Transcripts identified through the error correction method, where the barcode does not exactly match a known target but is still close enough to be uniquely identified, tend to be more prone to false positives than non-corrected transcripts. Because of this, we show results both with and without error-corrected barcodes in the relevant QC figures, allowing users to decide whether to use the error-corrected transcripts in their analyses.
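The minimum-count threshold derived from off-target barcodes (described in the Figure 2 caption as the upper end of a 95% confidence interval fit to the off-target counts) can be sketched as follows; the counts are invented:

```python
import numpy as np

# Invented counts of off-target barcodes observed after decoding.
off_target_counts = np.array([12, 8, 15, 10, 9, 14])

# Upper end of a 95% confidence interval of a normal distribution fit
# to the off-target counts (mean + 1.96 standard deviations).
threshold = off_target_counts.mean() + 1.96 * off_target_counts.std()
print(round(threshold, 1))  # -> 16.4

# On-target barcodes with counts above `threshold` are considered
# reliably distinguishable from background false positives.
```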
By comparing these QC results across different pipeline parameters, it is possible to determine which set of results is of the highest quality. Some of the additional metrics (Figure S3; STAR Methods) may help to identify which pipeline stage needs altered parameters. The inclusion of these metrics in the pipeline allows for iterative parameter selection to get the best possible results from a set of input data.

Pipeline performance on real FISH datasets
To show that the tools we have built are capable of obtaining high-quality results from a variety of FISH experiment types, we ran PIPEFISH on publicly accessible data from three different FISH experiments, each of a different type: seqFISH of a mouse embryo11 (351 genes), targeted ISS of a mouse brain23 (50 genes), and MERFISH of human U2-OS cells25 (130 genes). After transcript locations were obtained, we evaluated performance using both the aforementioned internal QC metrics and orthogonal methods external to PIPEFISH.

Internal quality check
For each dataset, we initially evaluated performance using the internal QC metrics produced by the pipeline. One set of metrics useful for all datasets estimates the background signal by quantifying off-target barcodes inserted into the provided codebook. While this method depends on how densely packed the codebook is, it provides insight into the estimated false positive rate of the experiment, which on its own can be grounds to rerun the experiment. Our results show that true on-target barcodes are found at a significantly higher frequency than the off-target codes (Figure 2), indicating that there are likely to be few false positive transcripts in these results.
Additionally, we verified that the novel CheckAll decoder we developed outperforms the native starfish decoders. We only compare results from starfish's NearestNeighbor decoding method, as the ExactMatch method could only find a handful of targets in our images. Performance was evaluated using the false barcode metric, showing that the CheckAll decoder was less prone to false positives while also finding more mRNA targets (Figure S4; Table 1). The cost of this increased performance relative to the starfish decoders comes in the form of longer run times and higher memory requirements. While the starfish decoders are nearly instantaneous and require very little memory, the CheckAll decoder can take much longer and may require more memory depending on the input (Figure S5). The CheckAll decoder can take advantage of Python multiprocessing to mitigate the long run times at the cost of additional memory.

External quality check
In addition to the internal metrics shown, we validated the results given by the pipeline using a form of orthogonal validation for each dataset. For the mouse embryo seqFISH dataset, in addition to the 351 genes measured by seqFISH, an additional 36 genes were imaged by smFISH and used in the original study to verify predicted expression of genes not measured by seqFISH.11 We used the Python package Tangram26 to predict per-cell expression of the 36 smFISH genes using seqFISH counts from the pipeline and cell-type-annotated scRNA-seq counts from the Mouse Gastrulation Atlas,27 and we compared this with the actual expression measured by smFISH in the same cells (Figure 3A; STAR Methods). To achieve high-accuracy matching and prediction of unmeasured genes, the seqFISH counts must be accurate, so we can use the accuracy of the predicted expression, with the smFISH counts as a truth set, as a quality metric for the pipeline's performance at identifying transcripts in the images. The average Pearson correlation between the predicted and measured counts for the 36 smFISH genes was 0.49 (Figure S6). The large range of correlation coefficients (0.12-0.85) is likely a result of differing image quality between the smFISH genes. As the smFISH images are not multiplexed, they are very sensitive to the choice of threshold when identifying spots. Thresholds for each gene were determined by plotting the number of spots found across a large range of thresholds and calculating the elbow point of the curve. The area under the curve (AUC) can also be used as a rough estimate of how much the intensities of the true fluorescent signal and the noise in an image overlap; large overlaps make separating signal from noise more difficult and less accurate. We found the correlation between predicted and smFISH counts to be fairly strongly correlated with the AUC of the spot count vs. threshold curve, with a Pearson correlation of −0.67 (Figure S7), indicating that the genes where performance is low likely reflect poor separation between signal and noise intensities in those images rather than poor-quality seqFISH counts.
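The elbow-point threshold selection described above can be sketched with the common maximum-distance-to-chord heuristic, which may differ in detail from the method used in the study; the spot counts here are invented:

```python
import numpy as np

# Invented spot-count-vs-threshold curve for one smFISH gene.
thresholds = np.linspace(0.0, 1.0, 11)
spot_counts = np.array([5000, 1800, 700, 350, 250, 210, 190, 180, 172, 168, 165])

def elbow_index(x, y):
    """Index of the point farthest from the chord joining the curve's endpoints."""
    p1 = np.array([x[0], y[0]], float)
    p2 = np.array([x[-1], y[-1]], float)
    chord = (p2 - p1) / np.linalg.norm(p2 - p1)
    pts = np.stack([x, y], axis=1) - p1
    # Perpendicular distance = rejection of each point from the chord direction.
    proj = np.outer(pts @ chord, chord)
    return int(np.argmax(np.linalg.norm(pts - proj, axis=1)))

i = elbow_index(thresholds, spot_counts)
print(float(thresholds[i]))  # -> 0.2
```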
As the MERFISH images used here were of a cell culture of a single cell type, we could directly compare the average expression per cell from the pipeline with RNA expression values obtained using another standard method. We obtained FPKM measurements calculated from bulk RNA-seq experiments of the same cell type as used in the MERFISH images25 (personal communication, J. Moffitt) and compared these values with the average expression of the cells in the MERFISH images, finding them to be highly correlated, with a Pearson correlation coefficient of 0.85 (Figure 3B; STAR Methods). This indicates that the counts the pipeline produced are similar to those obtained by bulk RNA-seq.
Abundant expression maps of the mouse brain exist, which allowed us to compare the pipeline results on the targeted ISS images with a trusted reference. We downloaded single-gene expression maps from similar sections of mouse brains from the Allen Brain Atlas and found that the spatial distributions of transcripts from the pipeline and the reference atlas images matched for most genes (STAR Methods; Figures 3C and S8). Of the fifty genes probed, thirty-four had coronal section images in the Allen Mouse Brain Atlas reference, and nearly all of them showed a strong match between the spatial expression patterns found by our pipeline and those in the reference data.

DISCUSSION
As spatial genomics continues to become more popular as an investigative tool, there will be a growing need for computational tools capable of extracting useful information from the raw image data generated by these methods. We have developed this general and customizable pipeline for those who wish to analyze FISH-based spatial transcriptomics images as part of their research. The pipeline's versatility allows it to be used for any experiment involving targeted barcoding of transcripts, and it includes options for pre-processing image data, a novel seqFISH decoder, and automated quality metrics that can be used to assess performance. Using real data from three different in situ methods and a variety of quality metrics, we showed that the pipeline can be used to obtain high-accuracy results that can then be used in a number of downstream applications.
The pipeline is currently capable of obtaining annotated transcript locations from raw images with high fidelity, but all subsequent analysis is focused on assessing the quality of the results rather than answering any specific biological question. A future version of the pipeline may include options for downstream analyses that could be automatically performed after the transcripts have been decoded and assigned to cells. These could include imputation of missing gene expression counts based on orthogonal scRNA-seq data, cell-type assignment, identification of spatially variable genes, cell neighborhood/communication effects, and general spatial statistics at both the gene and cell level.
A number of tools already exist for each of these analyses, so this would involve testing each to find those worth including. This would make the pipeline an end-to-end workflow for FISH-based spatial transcriptomics analysis, streamlining what is typically a difficult and confusing process.
Here, we have developed a multistage pipeline composed of a series of carefully chosen image-processing algorithms. In the future, it may be possible to use curated results of this pipeline to train a neural-network- or similar machine-learning-based approach to predict the presence of transcripts from FISH images in a single pass or with much less pre-processing. Such a generalized learning approach could integrate additional sources of information, such as cellular location and neighborhood, to make more accurate predictions, or could simultaneously predict cell segmentation, and it could potentially be made computationally efficient through hardware acceleration. We therefore view this work as an important prerequisite to future, more generalized inference on this rich source of data.
Limitations of the study
PIPEFISH can be used to significantly simplify the process of identifying transcripts in multiplexed FISH images. However, there are non-trivial hard-drive and RAM requirements in order to successfully run the pipeline. As each step is run on all FOVs (fields of view) before proceeding to the next step, the output of each step must be saved. Each individual image tile will be duplicated up to three times in the pseudosort (optional), conversion to SpaceTx format, and pre-processing steps, resulting in a large cost in hard-drive space, as multiplexed FISH images can already be quite large. Each FOV could instead be run through the entire pipeline before moving on to the next FOV, saving only the final results, in order to reduce this storage requirement, though this would make it impossible to restart the pipeline at a specific step during troubleshooting. A future version of PIPEFISH could include an option to choose between saving the output of every step and discarding all non-final results so that users can choose the option that best suits their needs. In addition to requiring large amounts of hard-drive space, certain PIPEFISH operations can also be quite RAM intensive, such as the CheckAll decoder, which can require 30 GB+ of RAM to decode 2.5 million spots (Figure S5D). Also lacking from PIPEFISH is support for parallelization between FOVs. While some individual operations in PIPEFISH make use of CPU parallelization, it is currently not possible to process more than a single FOV simultaneously. This can result in long run times when there are a large number of FOVs. Toil28 is a pipeline management system that can execute CWL jobs, and a planned update to PIPEFISH will parallelize the decoding step when run with Toil.

STAR+METHODS
CheckAll decoder
Starting from a set of spots of shape (round, channel, z, y, x), a codebook of barcodes of shape (round, channel) with no off rounds in any barcode, and a search radius value, the spots are connected into a non-overlapping set of barcodes that each match a barcode in the codebook. Two slightly different algorithms are used to balance precision and recall. They share the same steps except that the order of two steps is switched between the versions. The following describes the "filter-first" version.
1. For each spot in each round, find all spot neighbors in other rounds that are within the search radius.
2. For each spot in each round, build all possible full-length barcodes based on the channel labels of the spot's neighbors and itself.
3. Choose the "best" barcode among each spot's possible barcodes by calculating a score based on minimizing the spatial variance and maximizing the intensities of the spots in the barcode (shown below). Each spot is assigned a "best" barcode in this way.
4. Drop "best" barcodes that don't have a matching target in the codebook.
5. Only keep barcodes/targets that were found as "best" using at least n of the spots that make each up (n is determined by the mode parameter).
6. Find an approximate maximum independent set of the spot combinations so that no two barcodes use the same spot.
   a. A graph is created where each node is a combination of spots that makes up a decoded barcode, and edges connect nodes that share at least one spot.
   b. Nodes are eliminated from the graph in order of highest number of edges to lowest, with ties broken by choosing the barcode with the higher score (described in step 3), until there are no longer any edges in the graph.
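Step 6 can be sketched as a greedy elimination over the conflict graph. This is an illustrative toy, not PIPEFISH's code, and it assumes ties are resolved by discarding the higher-scoring candidate (lower scores being better, per the scoring above):

```python
# Candidate barcodes: name -> (set of spot IDs used, score; lower is better).
candidates = {
    "A": (frozenset({1, 2, 3}), 0.5),
    "B": (frozenset({3, 4, 5}), 0.9),   # conflicts with A via spot 3
    "C": (frozenset({6, 7, 8}), 0.4),
}

def greedy_independent_set(candidates):
    """Greedily remove the most-conflicted candidates until none share a spot."""
    alive = dict(candidates)
    def degree(name):
        spots, _ = alive[name]
        return sum(1 for other, (s, _) in alive.items()
                   if other != name and spots & s)
    while True:
        degrees = {name: degree(name) for name in alive}
        if max(degrees.values()) == 0:
            return sorted(alive)
        # Remove the node with the most edges; break ties on the worse score.
        worst = max(alive, key=lambda n: (degrees[n], alive[n][1]))
        del alive[worst]

print(greedy_independent_set(candidates))  # -> ['A', 'C']
```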
The other method, called "decode-first," is the same except that steps 3 and 4 are switched, so the minimum-scoring barcode is chosen from the set of possible codes that have a match in the codebook. The filter-first method has lower recall but higher precision, while the decode-first method has higher recall at the cost of lower precision.
Decoding is run in multiple stages, with the parameters becoming less strict as it progresses. The high-accuracy algorithm (filter-first) is always run first, followed by the lower accuracy method (decode-first), each with slightly different parameters based on the choice of "mode" parameter. After each decoding, the spots found in decoded barcodes are removed from the original set of spots before the remaining spots are decoded again with a new set of parameters. To simplify parameter selection, we have sorted the parameters into three presets ("high," "medium," or "low" accuracy) determined by the "mode" decoding parameter.
Decoding is also done multiple times at different search radius values, starting at 0 and increasing incrementally until the specified search radius is reached. This allows high-confidence barcodes to be called first, which simplifies subsequent stages, as there are fewer spots to choose from.
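The staged decoding loop can be sketched as follows; `decode_at_radius` is a toy stand-in (treating spots as 1D positions) rather than the actual filter-first/decode-first pass:

```python
def decode_at_radius(spots, radius):
    """Toy stand-in: accept any run of 3 sorted spots spanning <= radius."""
    accepted = []
    pool = sorted(spots)
    while len(pool) >= 3 and pool[2] - pool[0] <= radius:
        accepted.append(tuple(pool[:3]))
        pool = pool[3:]
    return accepted, set(pool)

spots = {0, 1, 2, 10, 11, 14, 30}
results = []
remaining = set(spots)
# Decode repeatedly at an incrementally widened search radius, removing
# spots consumed by accepted barcodes before the next, looser pass.
for radius in (2, 4, 6):
    found, remaining = decode_at_radius(remaining, radius)
    results.extend(found)
print(results)  # -> [(0, 1, 2), (10, 11, 14)]
```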

Figure 1.
Figure 1. Overall schematic of the processing pipeline. The workflow divides processing into multiple discrete steps. (1) Raw images are optionally sorted into pseudorounds and pseudochannels and then converted to the standardized SpaceTx format (STAR Methods). (2) Images undergo pre-processing such as registration, white tophat filtering, high-pass and low-pass Gaussian filters, and histogram matching. (3) Transcripts are identified within each image and listed in a table with (x,y,z) coordinate information. (4) Cell boundaries are identified and transcripts are assigned a cell ID number. (5) QC metrics for the experiment results and processing are calculated from the decoded output.

Figure 2.
Figure 2. Quality control figures highlighting false positive off-target barcodes from pipeline output across all fields of view of each dataset. (A) Total counts of each barcode, colored by barcode type, with error-corrected counts shown above non-corrected counts (in orange or green) where applicable. The proposed minimum barcode count threshold is calculated as the upper end of the 95% confidence interval of a normal distribution with the mean and standard deviation of the observed off-target tallies. This is calculated for non-corrected barcodes (dashed) and the combination of non-corrected and corrected barcodes (solid). (B) Transcript count per cell, with the same color scheme as (A), showing the median transcript count per cell for ISS, seqFISH, and MERFISH. EC, error corrected; NC, non-corrected.

Figure 3.
Figure 3. External validation of pipeline results. (A) Comparison of gene expression predicted by Tangram with expression of the same gene measured by smFISH (top six correlating genes shown). Each column corresponds to a gene, with the Tangram-predicted expression above and the measured smFISH expression below; the Pearson correlation coefficient is printed below each. (B) Correlation of MERFISH counts with bulk RNA-seq FPKM values. (C) Comparison of the spatial distribution of four genes found in the targeted ISS dataset by the pipeline with reference expression of the same genes in a similar brain slice from the Allen Brain Atlas. In the top row, each yellow dot represents a transcript, with the DAPI stain of the sample underneath; in the bottom row, blue dots represent low expression and redder dots represent higher expression (no color map provided by the Allen Brain Atlas), with the Nissl stain of the sample underneath.
score = Sv + C × (RoundNum − QualSum)

RoundNum = number of rounds in the experiment
QualSum = sum of normalized intensity values of all spots in the barcode
Sv = −log(1 / (1 + Σ_{i=x,y,z} var(coords_i)))
coords_i = vector of spot coordinate values for all spots in the i-th dimension
C = constant (2 here)
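Assuming the score combines the spatial-variance term Sv with the intensity penalty C × (RoundNum − QualSum), it can be written out as a small function. This is a sketch, not PIPEFISH's code, and the example barcodes are invented:

```python
import math

def barcode_score(coords, qualities, round_num, c=2.0):
    """coords: list of (x, y, z) per spot; qualities: normalized intensities.
    Lower scores indicate spatially tighter, brighter barcodes."""
    def var(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / len(values)
    # Sum of per-dimension spatial variances across the barcode's spots.
    total_var = sum(var([p[i] for p in coords]) for i in range(3))
    sv = -math.log(1.0 / (1.0 + total_var))
    return sv + c * (round_num - sum(qualities))

# A tight, bright barcode scores lower than a loose, dim one.
tight = barcode_score([(0, 0, 0), (0, 1, 0), (1, 0, 0)], [0.9, 0.9, 0.9], 3)
loose = barcode_score([(0, 0, 0), (5, 9, 2), (8, 1, 7)], [0.4, 0.5, 0.3], 3)
print(tight < loose)  # -> True
```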

Table 1.
Summary of CheckAll decoder performance. The NN, CA-high, CA-medium, and CA-low columns show precision and total on-target counts for the starfish NearestNeighbor decoder, the CheckAll decoder in high-accuracy mode, the CheckAll decoder in medium-accuracy mode, and the CheckAll decoder in low-accuracy mode, respectively. Percentage-difference columns show the performance difference between the starfish NearestNeighbor decoder and the CheckAll decoder for each accuracy mode. The starfish NearestNeighbor decoder is not capable of detecting error-corrected targets, so its NC and NC + EC values are identical. NN, NearestNeighbor; CA, CheckAll; NC, not corrected; EC, error corrected.