Spectacle: an interactive resource for ocular single-cell RNA sequencing data analysis


Several resources have been assembled to make previously generated gene expression datasets more approachable. A major example of such a resource is the Genotype-Tissue Expression (GTEx) Project, which provides a visualization platform to study the correlation between genetic variants and gene expression across a diverse collection of tissues (2013). However, ocular tissues were not included in this database, limiting its utility to vision researchers. To address this gap, the Ocular Tissue Database (OTDB) compiled microarray gene expression values from ten human donor eyes and summarized the results in a web interface (Wagner et al., 2013). Subsequently, the eyeIntegration database assembled hundreds of publicly available ocular bulk RNA sequencing datasets comprising over one thousand individual cornea, retina, and RPE-choroid samples. This database provides tools for querying gene expression and offers a web-based visualization system for comparing expression profiles between different ocular tissues (Bryan et al., 2018). Likewise, the Single Cell Portal (https://portals.broadinstitute.org/single_cell) allows users to survey single-cell gene expression across a diverse group of tissues. While this resource is impressively large (237 datasets at the time of this publication), the diverse sequencing and analytic technologies used in these studies preclude interactive analysis beyond visualizing which cluster(s) express a particular gene of interest.
Each of these existing interactive gene expression resources has strengths and limitations shaped by the number of included datasets, the sequencing technology used for the experiments, and the degree of interactivity the user has with the data. We set out to develop a highly interactive, web-based visualization interface for analyzing human ocular single-cell RNA sequencing data. Our platform, Spectacle, facilitates exploration of single-cell RNA sequencing expression data from the human retina and RPE/choroid. Spectacle allows users to visualize gene expression at both the cell and cluster level, analyze differentially expressed genes in interactively selected cell populations, and identify subsets of cells exhibiting distinct gene signatures. Spectacle is freely accessible at OcularGeneExpression.org/singlecell and is powered by cellcuratoR, an open-source R package that enables users to interact locally with their own single-cell datasets.

Data Processing with cellcuratoR:
In order to produce consistent interactive visualization features across different single-cell RNA sequencing experiments, we developed the R package cellcuratoR. cellcuratoR provides a framework to convert single-cell RNA sequencing data analyzed in Seurat (v3.0.0 through v3.1.5) (Butler et al., 2018) into a format interpretable by the R Shiny interface. For reference, an example pipeline for creating a Seurat object from mapped FASTQ files is available as Supplementary File 1. The processed Seurat object is then stripped of features that are not needed for downstream visualizations, which minimizes file size and increases the responsiveness of the reactive user interface. cellcuratoR is freely available as an R package from GitHub (www.github.com/drewvoigt10/cellcuratoR). The package can be installed on any computer and run in a local R session, allowing users to explore data from their own experiments without requiring a webserver. Five publicly available human single-cell RNA sequencing datasets from the retina (Voigt et al., 2020a; Voigt et al., 2019b) and RPE/choroid (Voigt et al., 2019a; Voigt et al., 2020b) have been pre-processed with cellcuratoR for interactive visualization with the web-hosted Spectacle resource (Table 1).

Site availability:
Spectacle is freely available at OcularGeneExpression.org/singlecell or https://singlecell.ivr.uiowa.edu. The user interface is documented in a detailed "how to" guide, which is accessible from the left-hand menu of Spectacle. This guide includes an overview video and details how to interactively load different datasets, visualize clusters of different cells, create expression heatmaps and violin plots, re-cluster cell populations of interest, and perform differential expression between cell types and across biological conditions, as described below. In addition, the GitHub page for cellcuratoR (www.github.com/drewvoigt10/cellcuratoR) includes several animations that demonstrate salient features of the user interface.

User-interface visualizations:
Interactive visualizations of single-cell data were implemented using the Shiny (v1.3.2) framework in R (Chang et al., 2020). Several visualization modalities are available to analyze and interpret gene expression within each dataset. Application Programming Interface (API) calls and parameters for the Seurat-based visualization functions are outlined in detail in Supplementary File 2.
Dimensionality reduction: Visualization of graph-based clusters is achieved with a dimensionality reduction plot. By default, the dimensions are based upon the pre-computed values using uniform manifold approximation and projection (UMAP), but other methods are also supported (e.g., t-distributed Stochastic Neighbor Embedding (tSNE) and principal component analysis (PCA)). The user can zoom in on subpopulations of cells and hover over cells to determine their cluster identity. In the dimensionality reduction visualization, cells can also be shaded according to cluster identity or the originating library.
Heatmaps: Gene expression can be visualized across clustered cells within a dimensionality reduction view in the form of heatmaps, in which the shading of each cell is proportional to the relative expression of the specified gene. A custom legend relates shading to expression levels in transcripts per 10,000 (TP10K).
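For readers unfamiliar with the TP10K scale, the normalization can be sketched as follows. This minimal example assumes one cell's raw counts are held in a dictionary keyed by gene (a hypothetical layout chosen for clarity; Seurat itself stores sparse matrices). Each count is divided by the cell's total count and multiplied by 10,000; Seurat's default LogNormalize method then applies a natural log(1 + x), which the second function mirrors.

```python
import math
from typing import Dict


def tp10k(counts: Dict[str, int], scale: float = 10_000.0) -> Dict[str, float]:
    """Scale one cell's raw counts to transcripts per 10,000 (TP10K)."""
    total = sum(counts.values())
    if total == 0:
        # A cell with no detected transcripts cannot be scaled; report zeros.
        return {gene: 0.0 for gene in counts}
    return {gene: n / total * scale for gene, n in counts.items()}


def log_normalize(counts: Dict[str, int], scale: float = 10_000.0) -> Dict[str, float]:
    """Natural log of (1 + TP10K), analogous to Seurat's LogNormalize."""
    return {gene: math.log1p(v) for gene, v in tp10k(counts, scale).items()}


# Hypothetical raw counts for a single cell.
cell = {"RHO": 150, "GNAT1": 40, "RLBP1": 10}
print(tp10k(cell))
```

Because every cell is rescaled to the same total, shading in the heatmap reflects relative rather than absolute transcript abundance.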
Violin Plots: Violin plots depicting the expression of genes of interest are presented per cluster. Expression distributions are only drawn if at least 25% of cells in a cluster express the gene of interest.
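The 25% rule amounts to a per-cluster check on the fraction of cells with detectable expression. The sketch below is illustrative only: the function name and the input layout (a list of cluster and expression pairs for one gene) are hypothetical rather than cellcuratoR's API, and treating "expresses" as "expression greater than zero" is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def clusters_to_draw(
    cells: Iterable[Tuple[str, float]], min_fraction: float = 0.25
) -> List[str]:
    """Return clusters in which at least `min_fraction` of cells express the gene.

    `cells` yields (cluster_id, expression) pairs for a single gene; a cell is
    counted as expressing when its value is greater than zero (an assumption).
    """
    total: Dict[str, int] = defaultdict(int)
    expressing: Dict[str, int] = defaultdict(int)
    for cluster, value in cells:
        total[cluster] += 1
        if value > 0:
            expressing[cluster] += 1
    return sorted(c for c in total if expressing[c] / total[c] >= min_fraction)


# Hypothetical expression values for one gene in three clusters of four cells.
cells = [("rods", 2.1), ("rods", 0.0), ("rods", 1.3), ("rods", 0.0),
         ("cones", 0.0), ("cones", 0.0), ("cones", 0.0), ("cones", 0.4),
         ("muller", 0.0), ("muller", 0.0), ("muller", 0.0), ("muller", 0.0)]
print(clusters_to_draw(cells))  # ['cones', 'rods']
```

Here "cones" sits exactly at the 25% boundary and is drawn, while "muller" has no expressing cells and is omitted.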
Differential expression: Differential expression can be performed between pre-defined clusters of cells or cell populations manually selected with the lasso tool. In addition, differential expression can be performed between cells that originate from different biological conditions (such as cells originating from healthy versus diseased libraries), so long as these biological conditions are dichotomous.
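Selecting cells with a lasso reduces to a point-in-polygon test on each cell's two-dimensional embedding coordinates. The sketch below uses the standard even-odd ray-casting rule; we do not claim this is how Spectacle performs the selection internally (in Shiny applications the plotting library typically reports the selected points), and all names and coordinates are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def inside(point: Point, polygon: Sequence[Point]) -> bool:
    """Even-odd rule: count crossings of a ray cast from `point` toward +x."""
    x, y = point
    crossings = 0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the lasso
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                crossings += 1
    return crossings % 2 == 1


def lasso_select(embedding: Dict[str, Point], polygon: Sequence[Point]) -> List[str]:
    """Return barcodes of cells whose UMAP coordinates fall inside the lasso."""
    return [barcode for barcode, xy in embedding.items() if inside(xy, polygon)]


umap = {"cell_A": (0.5, 0.5), "cell_B": (2.0, 2.0), "cell_C": (0.9, 0.1)}
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(lasso_select(umap, square))  # ['cell_A', 'cell_C']
```

The two selections made this way ("Group_1" and "Group_2") are then simply two lists of cell barcodes passed to the differential expression test.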
Reclustering: Cell populations can be explored in a lower dimensional feature space with re-clustering. This analysis renormalizes the selected cell population and performs a dimensionality reduction on that subset. Supported dimensionality reduction algorithms include UMAP, tSNE, and PCA.

Spectacle access and overview.
Spectacle is an interactive visualization system for ocular single-cell gene expression analysis that can be accessed at OcularGeneExpression.org/singlecell. Spectacle is a web-accessible instance of cellcuratoR, an R package available at GitHub (www.github.com/drewvoigt10/cellcuratoR). Spectacle currently hosts five pre-processed single-cell RNA sequencing datasets from human retina and RPE/choroid (Table 1), and additional datasets will be added in the future as they become available. A video tutorial covering salient features of Spectacle is accessible via the "How to guide" from the left-hand menu.
Visualizing cell clusters and gene expression: Spectacle provides several complementary visualizations for exploring gene expression across clusters. First, dimensionality reduction plots are used to visualize clustered cells (Figure 1). Second, the user can query cluster-level expression of one or more genes in the form of violin plots (Figure 1), which can be useful for classifying clusters into putative cell types or for exploring expression patterns of genes of interest. Lastly, users can color the cells in the dimensionality reduction plots by the expression of a gene of interest with heatmaps (SI Figure 1).
In addition to visualizing cells by cluster identity, it is often useful to analyze which libraries contribute cells to each cluster. Hence, Spectacle provides the ability to color cells by library of origin, which can help reveal clusters composed mostly of cells from a specific biological condition. For example, in analyzing populations isolated from the fovea and peripheral human retina, distinct subpopulations of glial cells belonging to foveal (red) and peripheral (blue) libraries become apparent (Figure 2). Such cellular groups can further be re-clustered in an attempt to identify subpopulations or to view relationships between cells in a more granular space (Figure 2). In this example, foveal and peripheral glial cells remain well segregated upon re-clustering, further suggesting that the multidimensional gene expression patterns of glial cells are influenced by the region of the retina from which they came.
Voigt et al. Page 4 Exp Eye Res. Author manuscript; available in PMC 2021 November 01.
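Library composition per cluster reduces to counting cells for each cluster and library pair and dividing by the cluster size. A minimal sketch, assuming each cell is annotated with a cluster label and an originating library; the labels and counts below are hypothetical and do not come from the datasets in Spectacle.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def library_composition(
    cells: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, float]]:
    """Fraction of each cluster's cells contributed by each library.

    `cells` yields (cluster_id, library_id) pairs, one per cell.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for cluster, library in cells:
        counts[cluster][library] += 1
    composition: Dict[str, Dict[str, float]] = {}
    for cluster, libs in counts.items():
        total = sum(libs.values())  # number of cells in this cluster
        composition[cluster] = {lib: n / total for lib, n in libs.items()}
    return composition


# Hypothetical glial clusters drawn from foveal and peripheral libraries.
cells = ([("glia_1", "fovea")] * 9 + [("glia_1", "periphery")] * 1
         + [("glia_2", "fovea")] * 2 + [("glia_2", "periphery")] * 6)
comp = library_composition(cells)
print(comp["glia_1"])  # {'fovea': 0.9, 'periphery': 0.1}
```

A cluster dominated by one library, such as "glia_1" here, is the kind of pattern that coloring by library makes visible at a glance.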
Differential expression for generating hypotheses: Differential expression analysis is often central to RNA-based experiments. Spectacle thus supports two modes of differential expression analysis. First, gene expression can be compared between any combination of pre-defined cellular clusters or flexibly selected cell populations drawn with the "lasso" tool (Figure 3A-B). Such functionality aids in identifying gene signatures specific to cell populations or subpopulations. Second, gene expression can be compared between biological conditions, such as anatomic region or disease state, which supports hypothesis-generating questions about normal ocular physiology and disease pathogenesis. Spectacle displays these results as graphs (Figure 3C-D) and tables (Figure 3E), which users can download as high-resolution image files and spreadsheets.
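The two axes of the differential expression graph described in the Figure 3 legend can be computed per gene as sketched below. Delta percent follows the legend's definition (fraction of Group_1 cells expressing the gene minus the fraction in Group_2). The log fold-change uses natural-log means of TP10K values with a pseudocount of 1; this is our reading of Seurat v3's default and should be treated as an assumption. The per-cell values are hypothetical, constructed only to reproduce the BCO2 percentages quoted in the legend.

```python
import math
from typing import Sequence


def delta_percent(group1: Sequence[float], group2: Sequence[float]) -> float:
    """Fraction of Group_1 cells expressing the gene minus that of Group_2."""
    pct1 = sum(v > 0 for v in group1) / len(group1)
    pct2 = sum(v > 0 for v in group2) / len(group2)
    return pct1 - pct2


def log_fold_change(group1: Sequence[float], group2: Sequence[float]) -> float:
    """Natural-log fold change of mean TP10K expression, pseudocount of 1."""
    mean1 = sum(group1) / len(group1)
    mean2 = sum(group2) / len(group2)
    return math.log(mean1 + 1) - math.log(mean2 + 1)


# Hypothetical values: 41 of 1,000 Group_1 cells (4.1%) and
# 676 of 1,000 Group_2 cells (67.6%) express the gene.
group1 = [1.0] * 41 + [0.0] * 959
group2 = [1.0] * 676 + [0.0] * 324
print(round(delta_percent(group1, group2), 3))  # -0.635
```

Plotting both quantities together separates genes that change in level from genes that change in how many cells express them.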

Case Study:
To further demonstrate the utility of Spectacle, we present a basic analysis of retina samples from five human donors: one donor with autoimmune retinopathy and four control donors (Voigt et al., 2020a). In this dataset, 23,429 cells were recovered from paired foveal and peripheral libraries. After clustering and dimensionality reduction, three distinct clusters of bipolar cells were identified (Figure 4A). These clusters were assigned to rod bipolar cell, cone OFF bipolar cell, and cone ON bipolar cell types by comparing expression of distinguishing marker genes in heatmaps (Figure 4B) and violin plots (Figure 4C). In the mouse retina, bipolar cells have been extensively characterized with single-cell RNA sequencing, which identified 15 morphologic and transcriptomic subsets (Shekhar et al., 2016). Using the Spectacle re-clustering tool, we re-clustered the three bipolar cell clusters detected in this experiment to analyze potential subpopulations (Figure 4D).
Next, we asked whether key genes enriched in bipolar cell subsets localized to subpopulations within the re-clustered bipolar cells. RELN, a gene enriched in the BC7 subclass of cone ON bipolar cells, did not localize to any subpopulation within the original (non-re-clustered) dataset (Figure 4E, inset) but localized to a subpopulation of cells within the re-clustered object (Figure 4E). Likewise, expression of ERBB4, a gene enriched in the cone OFF bipolar BC3A morphologic class, segregated to a small population of OFF bipolar cells within the re-clustered object (Figure 4F). Such re-clustering analysis highlights the utility of exploring gene expression in a reduced dimensional space for the identification and characterization of cellular subpopulations of interest.

Discussion
Visualizing single-cell RNA sequencing data can be complicated and usually requires a degree of bioinformatic expertise. With Spectacle, we have deployed an interactive single-cell RNA sequencing exploration resource for ocular datasets that can extend data interpretation to a broad range of vision researchers. In addition, cellcuratoR provides a flexible platform for bioinformaticians to share their own single-cell RNA sequencing analyses with interdisciplinary teams. We believe that cellcuratoR will increase the accessibility and interpretability of the many information-rich, publicly available single-cell datasets.
Other visualization tools exist for interactively exploring single-cell RNA sequencing data (Hillje et al., 2019; Innes and Bader, 2018; Patel, 2018; Pont et al., 2019). We believe that Spectacle offers at least two major advantages. First, ocular tissues are excluded from many popular gene expression resources, such as GTEx. This prevents researchers from quickly determining whether a gene of interest is expressed in the retina, RPE, or choroid, and instead forces them to embark on a bioinformatic exercise to reprocess data from previously published datasets. With Spectacle, one can identify the precise population(s) that express a given gene of interest within seconds, which we have found to be immensely helpful in preparing manuscripts and discussing clinical cases. Second, Spectacle is highly interactive. Many existing single-cell visualization platforms share the basic functionalities offered by Spectacle, such as displaying gene expression in heatmaps or violin plots, but Spectacle builds on these visualizations and adds several more advanced features to further hypothesis generation. For example, re-clustering of selected cell populations allows for discovery of cellular subgroups that are not discernible when analyzed together with other cell types, as illustrated by our analysis of bipolar subpopulations (Figure 4). In addition, Spectacle supports highly flexible differential expression analysis, not only to identify genes enriched in each cellular cluster but also to detect genes enriched in cells across biological conditions (supporting comparisons such as foveal versus peripheral, young versus aged, healthy versus autoimmune retinopathy, and healthy versus age-related macular degeneration). Thus, we believe that Spectacle dramatically lowers the analytical barrier for vision researchers to quickly access and interact with rich ocular single-cell expression datasets.
Spectacle contains five previously published ocular datasets available for interactive visualization. We plan to update Spectacle with future studies from our group as they are published. For other groups wishing to interactively explore their own datasets, we have made cellcuratoR, the R package that powers Spectacle, freely available (www.github.com/drewvoigt10/cellcuratoR). After preliminary analysis in Seurat, bioinformaticians can use cellcuratoR to privately distribute results to their interdisciplinary research groups. We have found that this interactive style of sharing results is immensely helpful in generating hypotheses and drafting manuscripts. In addition, cellcuratoR is released under the open-source GPL-3 license, allowing others to modify the codebase for their experimental needs and host their own public-facing webservers. We will maintain both Spectacle and cellcuratoR to ensure compatibility with future updates to the Seurat R package.
There are several limitations to this approach. First, the high degree of interactivity of the user interface requires consistent processing of a Seurat-analyzed S4 object in R, and this initial processing requires a degree of bioinformatic expertise. While Seurat is a popular single-cell RNA sequencing analysis tool, several other data analysis systems exist and are used in the field (Azizi et al., 2018; Klein et al., 2015; Parekh et al., 2018; Qiu et al., 2017).
In addition, the numerous visualization and analysis features of Spectacle require loading very large expression matrices into memory. While the file sizes of such objects have been minimized where possible, loading a dataset takes several seconds. Benchmarking experiments (SI Figure 2) suggest that dataset loading times increase linearly with the number of cells in each experiment. Computationally demanding functions, such as differential expression, are appreciably slower with larger datasets, and we recommend adjusting thresholds to accelerate these processes when exploring larger studies. Lastly, highly interactive differential expression is powerful for identifying enriched genes across different cell types and biological conditions; however, such interactivity may permit p-hacking in data analysis (Head et al., 2015). In particular, differential expression within Spectacle (and Seurat) treats all cells as independent observations, which yields artificially small p-values and overstates statistical significance, especially when comparing expression across biological conditions. Most single-cell experiments, including those provided in Spectacle, contain limited biological replicates, and hence the statistical significance of differential expression results should be interpreted with caution.
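Thresholds accelerate differential expression because genes are screened cheaply before any statistical test is run. Seurat's FindMarkers exposes such screens through its min.pct and logfc.threshold arguments; the sketch below mimics that idea in simplified form. The function name, data layout, and default values (0.1 and 0.25, our understanding of Seurat v3's defaults) are assumptions for illustration and are not Spectacle's actual settings; the gene values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def genes_to_test(
    group1: Dict[str, Sequence[float]],
    group2: Dict[str, Sequence[float]],
    min_pct: float = 0.1,
    logfc_threshold: float = 0.25,
) -> List[str]:
    """Keep genes detected in >= min_pct of cells in either group whose
    |log fold change| (natural log, pseudocount 1) reaches logfc_threshold."""
    kept = []
    for gene in group1:
        a, b = group1[gene], group2[gene]
        pct_a = sum(v > 0 for v in a) / len(a)
        pct_b = sum(v > 0 for v in b) / len(b)
        if max(pct_a, pct_b) < min_pct:
            continue  # too rarely detected in both groups to test
        lfc = math.log(sum(a) / len(a) + 1) - math.log(sum(b) / len(b) + 1)
        if abs(lfc) < logfc_threshold:
            continue  # effect too small to be worth testing
        kept.append(gene)
    return kept


# Hypothetical TP10K values for three genes in two groups of four cells.
g1 = {"RHO": [5.0, 4.0, 6.0, 5.0], "RARE": [0.0] * 4, "FLAT": [2.0] * 4}
g2 = {"RHO": [0.0, 0.0, 1.0, 0.0], "RARE": [0.0] * 4, "FLAT": [2.0] * 4}
print(genes_to_test(g1, g2))  # ['RHO']
```

Raising either threshold shrinks the list of genes passed to the (comparatively expensive) per-gene test, which is why stricter settings help most with large datasets.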
In summary, Spectacle aids in generating publication-ready visualizations, performing basic data analysis, and interpreting results from complex ocular single-cell RNA sequencing experiments. This expedites hypothesis generation and testing to improve understanding of visual diseases.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.
Figure 2. In addition to visualizing cluster labels, the dimensionality reduction plot can be colored by originating library (left). This is particularly useful when libraries correspond to different biological conditions, such as the region of isolated tissue (cells colored red originate from the fovea and cells colored blue originate from the periphery). Re-clustering of cells (right) can aid in identifying subpopulations of cells and viewing the relationships of cells in a more granular space. In this example, after glial cell populations (dotted line) have been re-clustered, the foveal and peripheral cells remain discrete, further suggesting that gene expression in glial cells is influenced by region.
Figure 3. A-B. Differential expression can be performed between pre-characterized clusters of cells or interactively selected populations with the lasso tool, with cells selected on the left belonging to "Group_1" (A) and cells selected on the right belonging to "Group_2" (B). In addition to comparing expression between selected populations, differential expression can be performed between cells in the same region originating from different biological conditions, such as disease status (not shown). C. Differential expression results are displayed graphically. The y-axis depicts the log of the fold-change between cells in the Group_1 and Group_2 selections. The x-axis depicts a variable called "delta percent," which represents the percentage of cells in Group_1 that express each gene minus the percentage of cells in Group_2 that express the gene. For example, the gene BCO2 is expressed by 4.1% of cells in Group_1 and 67.6% of cells in Group_2, resulting in a delta percent of 0.041 minus 0.676 = −0.635. This visualization allows the expression level (y-axis) and the proportion of expressing cells (x-axis) to be evaluated simultaneously. D. The cell selections are re-depicted on the standard dimensionality reduction space. E. In addition