Single cell RNA-sequencing data generated from human pluripotent stem cell-derived lens epithelial cells

Detailed transcriptomic analyses of differentiated cell populations derived from human pluripotent stem cells are routinely used to assess the identity and utility of the differentiated cells. Here we provide single cell RNA-sequencing data obtained from ROR1-expressing lens epithelial cells (ROR1e LECs), which were generated by directed differentiation of CA1 human embryonic stem cells. Analysis of the data using principal component analysis, heat maps and gene ontology assessments revealed phenotypes associated with lens epithelial cells. These data provide a resource for future characterisation of both normal and cataractous human lens biology. Corresponding morphological and functional data obtained from ROR1e LECs are reported in the associated research article “A simplified method for producing human lens epithelial cells and light-focusing micro-lenses from pluripotent stem cells” (Dewi et al., 2020).



Value of the Data
• These single cell RNA-sequencing profiles, obtained from lens epithelial cells derived from human embryonic stem cells, provide insights into the transcriptional heterogeneity of human lens cell cultures.
• Exploration of these data will provide molecular insights into the regulation of gene expression in human lens epithelial cells that will be useful to researchers investigating lens and cataract development.
• These data can be further analysed to better understand lens transcriptional regulators, and to generate molecular hypotheses to guide functional investigations of normal lens biology and cataract initiation.

Data Description
Here we report initial scRNA-seq data analyses for ROR1e LECs derived from human pluripotent stem cells. These LECs were generated from CA1 human embryonic stem cells as outlined in Fig. 1. The culture protocol comprised 16 days of guided differentiation of the stem cells to lens cells, followed by harvesting of ROR1e LECs and a further 7 days of culture in an optimised LEC medium [1]. Phenotypic characterisation of similar ROR1e LEC populations demonstrated that they are morphologically similar to antibody-purified ROR1+ LECs, that they express the expected LEC crystallin proteins, and that they can be aggregated to generate light-focusing micro-lenses with morphology and protein expression similar to those of primary human lenses [2].

Principal component analyses
The ROR1e LECs were dissociated immediately prior to processing for scRNA-seq. RNA was extracted from the cultured LECs and processed using a 10X Genomics single cell 3′ mRNA prep kit. Sequencing was performed on an Illumina NextSeq 500 sequencer. The scRNA-seq data can be accessed via ArrayExpress accession E-MTAB-9178. Analysis using the Seurat guided clustering suite yielded 1979 cells, with a total of 17,944 genes expressed. Principal component analysis (PCA) revealed three distinct cell clusters (Fig. 1B). Together, clusters 1 and 2 accounted for 98.3% of the total cell population (78% and 20.3%, respectively), and cluster 3 accounted for the remaining 1.7%. The genes contributing most to the variation between cells are shown in Fig. 1C-E.
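The cluster proportions above are per-cluster cell counts divided by the 1979 retained cells. A minimal Python sketch of that calculation; the function name and the cluster labels are hypothetical toy inputs, not values taken from E-MTAB-9178:

```python
from collections import Counter


def cluster_percentages(labels):
    """Return {cluster: % of cells}, rounded to one decimal place."""
    counts = Counter(labels)                 # cells per cluster
    total = sum(counts.values())             # all retained cells
    return {c: round(100 * n / total, 1) for c, n in sorted(counts.items())}


# Hypothetical toy labels for 10 cells (not the real data set).
toy_labels = [1] * 7 + [2] * 2 + [3]
print(cluster_percentages(toy_labels))  # → {1: 70.0, 2: 20.0, 3: 10.0}
```

Rounding each cluster independently means the reported percentages need not sum to exactly 100.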

Heat map analysis of critical lens genes
Heat map analyses were used to investigate expression of critical lens genes in each of the three cell clusters. The initial analysis examined expression of 65 genes known to be required for lens development, including growth factor signalling genes, crystallin genes and their regulators, proliferation and cell survival genes, and various transcription factors [3] (Fig. 2). Clusters 1 and 2 expressed similar levels of all 65 genes, including high expression of critical lens genes such as PAX6 and CRYAB, whereas some genes had reduced expression in cluster 3. Overall, these data suggest that the ROR1e cells have a lens phenotype.
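A per-cluster heat map compares each gene's expression across clusters. One common summary, assumed here because the exact plotting method is not stated (Seurat's own DoHeatmap plots scaled values per cell instead), is a gene × cluster matrix of mean expression. A minimal sketch with hypothetical gene names and values:

```python
from collections import defaultdict


def mean_expression_by_cluster(expr, labels):
    """expr: {gene: [normalised value per cell]}; labels: cluster per cell, same order.
    Returns {gene: {cluster: mean value}}, i.e. one heat-map row per gene."""
    cells_by_cluster = defaultdict(list)
    for i, cluster in enumerate(labels):
        cells_by_cluster[cluster].append(i)      # cell indices in each cluster
    return {
        gene: {
            cluster: sum(values[i] for i in idx) / len(idx)
            for cluster, idx in sorted(cells_by_cluster.items())
        }
        for gene, values in expr.items()
    }


# Hypothetical genes and values for five cells in three clusters.
toy_expr = {"GENE_A": [2.0, 2.0, 1.0, 1.0, 0.0],
            "GENE_B": [3.0, 1.0, 2.0, 2.0, 1.0]}
print(mean_expression_by_cluster(toy_expr, [1, 1, 2, 2, 3]))
```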

Analysis of the top 20 cluster marker genes
Assessment of the top 20 genes most differentially expressed across the three clusters (Fig. 3) showed that 70% of these genes (14/20) were more highly expressed in clusters 1 and/or 2, and 30% (6/20) were more highly expressed in cluster 3. In particular, cluster 1 had higher expression of crystallin genes associated with lens fibre cells (Fig. 3A); cluster 2 had higher expression of some genes associated with cell proliferation, such as CENPF, TOP2A, UBE2C and CCNB1 (Fig. 3B); and cluster 3 had higher expression of some genes associated with epithelial-to-mesenchymal transition (Fig. 3D).
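The 14/20 versus 6/20 split can be tallied by recording, for each top gene, the cluster with the highest mean expression. A minimal sketch, assuming per-gene cluster means are already available; the function name `count_peaking_in` and the toy values are hypothetical:

```python
def count_peaking_in(cluster_means, group):
    """cluster_means: {gene: {cluster: mean expression}}.
    Returns (genes whose highest mean falls in `group`, total genes).
    Ties go to whichever tied cluster appears first in the inner dict."""
    hits = sum(1 for means in cluster_means.values()
               if max(means, key=means.get) in group)
    return hits, len(cluster_means)


# Hypothetical means for four genes.
toy = {
    "GENE_A": {1: 3.0, 2: 1.0, 3: 0.5},
    "GENE_B": {1: 1.0, 2: 2.5, 3: 0.2},
    "GENE_C": {1: 2.0, 2: 2.0, 3: 0.1},
    "GENE_D": {1: 0.3, 2: 0.4, 3: 4.0},
}
hits, total = count_peaking_in(toy, group={1, 2})
print(f"{hits}/{total} = {100 * hits / total:.0f}%")  # → 3/4 = 75%
```

Passing `group={1, 2}` mirrors the "clusters 1 and/or 2" grouping used in the text.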

Analysis of cluster 1 marker genes
Heat maps and GO analyses were then used to assess the 17 genes identified as marker genes for cluster 1. The heat map analysis showed that cluster 1 marker genes (Fig. 4) were expressed more similarly in cluster 2 than in cluster 3, suggesting that cluster 1 is similar to cluster 2. No significant GO terms were associated with the cluster 1 marker genes. Comparison with published lens transcriptomic datasets revealed that all the cluster 1 marker genes are expressed in adult human lenses [4] and/or mouse lenses [5], including protein-coding and non-coding genes required for normal lenses (e.g., BEX2, TKT, SNHG8, MALAT1) [6].
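The cross-species check asks whether each marker appears in the union of the human and mouse reference gene lists. A minimal sketch; matching by upper-cased symbol is a simplifying assumption (a careful comparison would use an ortholog table), and the function name and gene lists are hypothetical:

```python
def markers_missing_from_references(markers, human_genes, mouse_genes):
    """Return markers found in neither reference list.
    Symbols are compared upper-cased, a shortcut that only works where the
    human and mouse symbols coincide apart from case (e.g. 'TKT' / 'Tkt')."""
    reference = {g.upper() for g in human_genes} | {g.upper() for g in mouse_genes}
    return sorted(m for m in markers if m.upper() not in reference)


# Hypothetical lists; an empty result means every marker is covered.
markers = ["GENE_A", "GENE_B", "GENE_C"]
human = ["GENE_A"]
mouse = ["Gene_b", "Gene_c"]
print(markers_missing_from_references(markers, human, mouse))  # → []
```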

Analysis of cluster 2 marker genes
Most of the 314 genes identified as markers of cluster 2 (Fig. 5) were expressed at much higher levels in cluster 2 than in clusters 1 and 3, particularly the lowly expressed cluster 2 marker genes. GO analysis of the cluster 2 marker genes identified terms relating to cell cycle and mitosis (Table 1). These data indicate that cluster 2 contains cells that had higher expression of some proliferation-related genes at the time the cell population was captured.

Table 1
Top 20 significant GO terms associated with Cluster 2 markers.

Analysis of cluster 3 marker genes
Most of the 245 cluster 3 marker genes (Fig. 6) were also detected in clusters 1 and 2, but at lower levels. GO analysis of the cluster 3 marker genes identified terms relating to collagen metabolism and extracellular matrix degradation (Table 2). As the cluster 3 marker gene list includes fibronectin, MMP9 and type I and type III collagen genes (Fig. 3D), it is possible that cluster 3 represents cells primed for epithelial-to-mesenchymal transition, although this cluster represents only 1.7% of the total ROR1e LEC population.
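The Fig. 6 caption states that the 245 cluster 3 marker genes were split into four heat maps by average expression per cell and then sorted from lowest to highest expression in cluster 3. The cut-points are not given, so this minimal sketch assumes four near-equal bins of genes ranked by average expression; the function name and inputs are hypothetical:

```python
def group_for_heatmaps(avg_expr, cluster3_expr, n_groups=4):
    """avg_expr: {gene: average expression per cell across all cells};
    cluster3_expr: {gene: mean expression in cluster 3}.
    Splits genes into n_groups near-equal bins by avg_expr (assumed rule),
    then orders each bin from lowest to highest cluster 3 expression."""
    ranked = sorted(avg_expr, key=avg_expr.get)        # low → high average
    n = len(ranked)
    bins = [ranked[k * n // n_groups:(k + 1) * n // n_groups]
            for k in range(n_groups)]                  # always n_groups bins
    return [sorted(b, key=cluster3_expr.get) for b in bins]


# Hypothetical toy input: eight genes, average expression 1..8.
avg = {f"g{i}": i for i in range(1, 9)}
c3 = {"g1": 2, "g2": 1, "g3": 4, "g4": 3, "g5": 6, "g6": 5, "g7": 8, "g8": 7}
print(group_for_heatmaps(avg, c3))
```

With 245 genes this rule yields bins of 61, 61, 61 and 62 genes; the published figure may have used different boundaries.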

Comparison of GO terms obtained from all expressed genes in each cluster
To gain more detailed insight into the relative differences in transcriptomes between the three ROR1e LEC clusters, additional GO analyses were performed using the expressed genes for each cluster. When the expressed genes from all three clusters were combined, the GO terms identified were consistent with known lens biology. These included GO terms relating to establishment of a polarised epithelium (a key feature of primary LECs), and terms relating to growth factor signalling pathways required by LECs (e.g., FGF and WNT) [7]. When the expressed genes from each cluster were analysed separately, 15 of the top 20 GO terms were shared across all the clusters [2]. The first 12 of these 15 shared GO terms were the most statistically significant GO terms for each of the three clusters (excluding generic terms). Strikingly, these shared GO terms related to establishment of a polarised epithelium, morphogenesis of an epithelium, FGF signalling and WNT signalling. A GO term relating to epithelial-to-mesenchymal transition was identified in the very small cluster 3. Overall, these GO analyses are consistent with the stem cell-derived ROR1e LECs being a highly purified population of human LEC-like cells. These findings are also consistent with morphological, lens function, proteomic and electron microscopy data obtained using ROR1e LECs [2].

Fig. 6. Heat maps of cluster 3 marker genes. The 245 marker genes that were significantly and positively associated with cluster 3 were grouped into four heat maps based on average gene expression level per cell, then sorted from lowest to highest expression in cluster 3.
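The finding that 15 of the top 20 GO terms were shared across clusters is a set intersection over the per-cluster top-20 lists. A minimal sketch; the function name and the placeholder terms are hypothetical, not the published lists:

```python
def shared_terms(top_terms_by_cluster):
    """top_terms_by_cluster: {cluster: ranked list of top GO terms}.
    Returns terms present in every cluster's list, in the first cluster's rank order."""
    lists = list(top_terms_by_cluster.values())
    common = set(lists[0]).intersection(*lists[1:])
    return [term for term in lists[0] if term in common]


# Hypothetical placeholder terms.
toy = {
    1: ["term_A", "term_B", "term_C", "term_X"],
    2: ["term_B", "term_A", "term_Y", "term_C"],
    3: ["term_C", "term_A", "term_B", "term_Z"],
}
print(shared_terms(toy))  # → ['term_A', 'term_B', 'term_C']
```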

Table 2
Top 20 significant GO terms associated with Cluster 3 markers.

Bioinformatic analyses
Processed scRNA-seq data were analysed using R and the Seurat guided clustering suite version 3 [11]. The data were filtered to include cells with between 200 and 6000 unique features (genes) and with < 10% mitochondrial counts. Global-scaling normalisation (LogNormalize) was used to normalise the feature expression measurements for each cell. A subset of features exhibiting high cell-to-cell variation was identified using the FindVariableFeatures function. A standard linear transformation (scaling) was then applied as a pre-processing step (ScaleData function). The data then underwent linear dimensional reduction via PCA using the previously determined variable genes. Marker genes were identified for each cell cluster using ROC analysis within FindAllMarkers, with the default log fold-change threshold of 0.25. Cell clusters were visualised using PCA plots, with the expression of individual genes overlaid on the plot. GO analyses were performed using the Functional Annotation tool of the DAVID Bioinformatics Resources, version 6.8 [12][13][14]. For each cluster, GO analysis was performed using both all expressed genes and genes with an average expression count of ≥1 per cell. The top 20 GO terms are reported for each cluster based on Benjamini p-values, excluding generic GO terms related to common cellular processes (e.g., cell cycle, macromolecule processing, respiration, protein processing, regulation of apoptosis).
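Three of the steps above reduce to short formulas. The following is a minimal stdlib Python sketch, not the authors' R code and not Seurat or DAVID themselves. It assumes the QC filter uses strict inequalities and that human mitochondrial genes carry the "MT-" prefix. It also assumes LogNormalize uses Seurat's default scale factor of 10,000, and that DAVID's Benjamini column is a Benjamini-Hochberg adjustment. All function names are hypothetical.

```python
import math


def passes_qc(counts, min_features=200, max_features=6000, max_pct_mt=10.0):
    """counts: {gene: raw count} for one cell.
    Keeps cells with min_features < detected genes < max_features and
    < max_pct_mt % of counts from mitochondrial genes (assumed 'MT-' prefix)."""
    n_features = sum(1 for v in counts.values() if v > 0)
    total = sum(counts.values())
    mt = sum(v for g, v in counts.items() if g.startswith("MT-"))
    pct_mt = 100.0 * mt / total if total else 0.0
    return min_features < n_features < max_features and pct_mt < max_pct_mt


def log_normalize(counts, scale_factor=10_000):
    """LogNormalize-style scaling: ln(1 + count / cell total * scale_factor).
    Assumes the cell has a non-zero total count."""
    total = sum(counts.values())
    return {g: math.log1p(v / total * scale_factor) for g, v in counts.items()}


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                          # largest p first
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min                         # enforce monotonicity
    return adjusted


print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.5])])
# → [0.04, 0.0533, 0.0533, 0.5]
```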

Ethics Statement
Approval for this study was provided by the Western Sydney University Human Research Ethics Committee (Australia).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.