Serum-Based Culture Conditions Provoke Gene Expression Variability in Mouse Embryonic Stem Cells as Revealed by Single-Cell Analysis

Variation in gene expression is an important feature of mouse embryonic stem cells (ESCs). However, the mechanisms responsible for global gene expression variation in ESCs are not fully understood. We performed single-cell mRNA-seq analysis of mouse ESCs and uncovered significant heterogeneity in ESCs cultured in serum. We define highly variable gene clusters with distinct chromatin states and show that bivalent genes are prone to expression variation. At the same time, we identify an ESC-priming pathway that initiates the exit from the naive ESC state. Finally, we provide evidence that a large proportion of intracellular network variability is due to the extracellular culture environment. Serum-free culture reduces cellular heterogeneity and transcriptome variation in ESCs.

In Brief Guo et al. investigate mechanisms of gene expression variation and cellular heterogeneity in mouse embryonic stem cells using single-cell mRNA-seq analysis. They find a differentiationpriming pathway in the cell culture system and show that serum provokes gene expression variability in mouse embryonic stem cells.

INTRODUCTION
Early mammalian development cells differentiate toward trophectoderm (TE) and inner cell mass (ICM). The ICM goes on to form the epiblast (EPI) and the primitive endoderm (PE). Embryonic stem cells (ESCs) can be derived from the ICM in the presence of leukemia inhibitory factor (LIF) and fetal calf serum (FCS) (Evans and Kaufman, 1981). ESCs have two important characteristics: the capacity for differentiation into all somatic cell types and the property of unlimited self-renewal in vitro.
Previous studies suggest that ESCs in culture are not homogeneous. Transcription factors associated with ESC identity may be expressed in a heterogeneous manner. For example, Nanog and Dppa3 are expressed in only a fraction of cells (Chambers et al., 2007;Hayashi et al., 2008). Variation in expression of these individual genes has been implicated in controlling the differentiation potential of different subpopulations. However, traditional methods are limited to the analysis of small number of genes. The mechanisms underlying genome-scale ESC variability are not fully characterized.
Using single-cell technologies, several studies have reported transcriptomic analysis of mouse ESCs and uncovered signaling and microRNA pathways that influence heterogeneity of ESCs in culture (Gr€ un et al., 2014;Kumar et al., 2014). More-recent studies have also examined transcriptional networks and cellcycle regulators that contribute to transcriptional variation (Kolodziejczyk et al., 2015;Papatsenko et al., 2015). Epigenetic regulation, which may also contribute to overall variability, has not been adequately explored. Moreover, the relevance of ESC culture heterogeneity to early embryonic development has yet to be analyzed.
In this study, we sought to combine the power of microfluidicbased single-cell mRNA-seq and single-cell qPCR to characterize in depth the molecular basis of heterogeneity among mouse ESCs in culture. We employ optimized computational strategies to reveal epigenetic mechanisms contributing to variation in gene expression and search for upstream pathways that induce network plasticity.

Single-Cell mRNA-Seq Analysis Reveals Heterogeneity among Mouse ESCs in Culture
We performed single-cell mRNA-seq analysis of undifferentiated ESCs in culture. Feeder-free J1 ESCs were grown in the presence of serum and LIF. Single ESCs were captured on a medium-sized (10-17 mm cell diameter) microfluidic RNA-seq chip (Fluidigm) using the Fluidigm C1 system ( Figure 1A). Whole-transcriptome-sequencing libraries were prepared using template switching-based amplification ( Figure 1B). We compared the abundance of selected markers from single-cell cDNA amplified with the template switching (SMART) method, as well as the sequence-specific amplification (SSA) method. qPCR results from different amplification products revealed comparable expression patterns for wild-type ESCs, namely high-level detection of EPI markers Pou5f1, Nanog, and Sox2, as well as low-level detection of TE markers, Cdx2 and Gata3. From amplified cDNA, we detected a bimodal distribution for Id2 and sharp unimodal distribution for endogenous controls, Actb and Gapdh ( Figure 1C).
Amplified single-cell libraries were barcoded, pooled, and sequenced to a depth of about 1.2 million reads per sample. For each gene in a sample, the median reads per kilobase of transcript per million reads mapped (RPKM) was approximately ten (Figures S1A and S1B). In order to filter out unreliable signals, we removed genes with low copy counts and ensured an average Pearson correlation of R = 0.8 between two sequencing duplicates for each single-cell library (Figures S1C-S1E; see Experimental Procedures for details). Using this strategy, we recover $9,000 genes per cell ( Figure 1D). The average of single-cell mRNA-seq profiles from ESCs showed high correlation with bulk mRNA-seq profiles from the same cell line ( Figure 1E). However, we observed that a fraction of the samples had distinct global signatures from the others, suggesting strong heterogeneity under the tested culture conditions ( Figure S1F). Although an endogenous control gene, Actb, and a pluripotency gene, Pou5f1, were homogenously expressed among single cells, we observed strong variation of other markers, including Lamb1, Clu, and Snai1 in both J1 and E14 cells (Figures S1G and S1H). By examining the expression correlation of key lineage regulators in the single-cell data, we defined different gene modules that correlate with this heterogeneity ( Figure 1F). The tightly correlated pluripotency markers, Pou5f1, Sox2, Nanog, and Fgf4, define a module for maintaining the undifferentiated ESC state. A Sox17, Gata6, and Gata4 cluster reflects a PE module that is indicative of PE differentiation.

Gene Expression Variability Is Associated with Distinct Chromatin States
In order to study variability of gene expression within the singlecell transcriptome data, we first tested different ways to quantify the level of variability. As variability measurements are easily influenced by mean level and amplification bias, we sought to decouple gene expression variation from the mean expression. We fitted a Lowess curve to log2 of the mean expression versus the log2 of the SD and then calculated the distance from this curve for each gene (Figure 2A). Because the distribution of this distance is approximately normal, we further rescaled the values by converting to Z scores. The resulting value, which we term the Lowess coefficient of dispersion (LCOD), is used to quantify the variation of gene expression. We show that LCOD is the least correlated or anti-correlated with the mean expression level as compared to other measurements (Figure 2B). We then selected the most-(LCOD > 1.5) and the least-variable genes (LCOD < À1.5). Gene Ontology (GO) enrichment analysis indicated that the most-variable genes are related predominantly with developmental processes, whereas the least-variable genes are enriched for translation, mRNA processing, and splicing ( Figure S2A).
To investigate the mechanism underlying variability at the single-cell level, we integrated our single-cell mRNA-seq data with the genome-wide transcription factor binding and chromatin state information obtained from publicly available bulk-level ChIP-seq data sets. We mapped both transcription factor occupancy and key chromatin marks in a 10-kb window at the transcriptional start site of the most-and least-variable genes. Of note, the master pluripotency regulators, including Oct4, Sox2, and Nanog, displayed similar binding patterns between the most-and least-variable genes. However, we observed a distinct chromatin state signature associated with the most-variable genes, including enrichment of the H3K27me3 mark and Ezh2 occupancy, as well as depletion of H3K36me3. Our analysis suggests that chromatin regulators may play an important role in mediating gene expression variability at the single-cell level ( Figure 2C).
We then aimed to further discriminate the list of most-variable genes using the chromatin marks found to correlate for gene expression variation. As shown in Figure 2D, we observed three distinct patterns, suggesting multiple pathways leading to fluctuations in gene expression. Cluster 1 genes were strongly enriched for H3K27me3 and Ezh2 binding and moderately enriched for H3K4me3, suggesting a role of polycomb group proteins in mediating expression variability. Cluster 2 genes were moderately enriched for H3K27me3, H3K4me3, and H3K36me3. Cluster 3 genes were enriched for H3K4me3 and H3K36me3, indicating a possible role of Setd2.
Importantly, the most-variable genes were enriched for previously defined bivalent genes marked by both H3K27me3 and H3K4me3 in their promoters (22% of the most variable versus 6% for all the genes measured in our assay; Figures 2D and S2B-S2D; Bernstein et al., 2006). We also found that overall gene expression variability was significantly higher among bivalent genes (p = 1.0EÀ32; KS test; Figure S2E). Whereas bivalent genes have been commonly considered to be silent in ESCs, previous studies have been limited to population level analysis.
Here, using single-cell analysis, we observed that many bivalent genes are in fact actively transcribed in a subset of cells and that the overall distribution is bimodal, suggesting that the transcriptional activities in an ESC may be highly dynamic and that the bivalent domains may play a role in modulating the frequency of gene activation.

Computational Analysis Reveals ESC Early Priming Pathway
To better understand the complex structure driven by heterogeneity in gene expression, we used locally linear embedding (LLE) dimensionality reduction analysis. LLE is an unbiased approach that computes a low dimensional representation of the data, preserving the original distances between neighborhoods points (Roweis and Saul, 2000). As seen from the LLE projection, a culture of morphologically ''undifferentiated'' ESCs was comprised of different subgroups ( Figures S3A and  S3B). The distribution of cell states suggests a defined pathway exiting pluripotency. In order to delineate this pathway more accurately, we applied a principal curve analysis and reconstructed a smooth path that passes through the cells at all stages (Figures 3A and S3C). By mapping the individual cells onto the principal curve, we identified three distinct cellular states ( Figures 3A  and S3C). On the left, the closely clustered population corresponds to the naive ESC state. The cells within this population express pluripotency markers (e.g., Nanog, Sox2, and Klf2) at high level ( Figures 3A and S3C). In addition, expression of differentiation markers was not detected in this group. On the top of the curve, we defined a previously unrecognized population, consisting of ''primed'' cells. These ESCs simultaneously express pluripotent markers (e.g., Sox2 and Nanog) and differentiation markers (e.g., Gata4,Gata6,and Lamb1). This population appears to represent a transcriptionally primed cellular state in which cells are exiting the naive ESC state and under transition to a differentiated state. The third cluster of cells express Gata6 and Gata4 at high level and pluripotent markers at low level. Cells of this population are predominantly representative of PE lineage cells, which are considered to be the default differentiation state for wild-type ESCs in culture. Of note, expression of Tet1, Ezh2, and Suz12 was high in the naive state, reduced in the primed state, and then repressed in differentiated cells ( Figure S3C), whereas the endogenous control markers, Actb and Gapdh, were robustly expressed in all cells. To examine in a systematic fashion the contribution of each gene to the pluripotency exit pathway, we calculated the Pearson correlation between its expression level in a cell and the mapped position on the principal curve. For the most-variable genes, we identified a subset whose expression levels were highly correlated with the differentiation path, including Lama1 and Lamb1 ( Figure S3D). These genes are likely to play an important role in initiating cell differentiation. Similarly, we also identified another subset whose expression levels are anticorrelated with the differentiation path, such as Tet1 and Tet2. These genes are likely to play an important role in maintenance of pluripotency. Figure 3B depicts an expression heatmap of highlighted genes along the early ESC differentiation path. In the heatmap, we show that primed cells co-express pluripotent and differentiation modules.
The primed ESC state maintains a distinct gene expression signature (Figure S3E). Hierarchical clustering of single-cell data also distinguishes this state as a unique cell-type cluster that co-expresses pluripotency markers and differentiation markers ( Figure 3C). In order to link the state with in vivo developmental processes, we reanalyzed previously published single-cell data from blastocyst stage ICM cells (Guo et al., 2010). We found a corresponding primed cell-type cluster that is distinct from known PE and EPI cell clusters in the blastocyst ICM ( Figure 3D). The special cluster of blastocyst cells also co-expresses Sox2, Gata4, and Gata6. The identification of a primed state adds to the complexity of seemingly homogenous pluripotent cells and suggests stepwise exit from the naive pluripotent state both in vitro and in vivo.

External Culture System Affects Network Variability
A central question regarding cellular heterogeneity is whether variability in gene expression is derived from internal transcriptional ''noise'' or results from fluctuation in response to external signals. To address this question, we searched for upstream regulators of the variably expressed genes defined by our analysis. We used the Haystack pipeline (Pinello et al., 2014) to identify enriched transcription factor motifs upstream of different groups of highly variable genes ( Figure 4A). For cluster 1 variable genes, we observed enrichment for a motif recognized by Zbtb33. For cluster 2 genes, the motif for TCF factors was enriched. For cluster 3 genes, the most-enriched motif corresponded to that for serum response factor, SRF. TCF factors lie downstream of glycogen synthase kinase 3 (GSK3) pathways in ESCs (Martello et al., 2012). SRF is a critical transcription factor that binds to the c-fos serum response element (Norman et al., 1988) that lies downstream of serum response and the MAPK pathways (Hill et al., 1993). These clues suggest that the serum-based culture conditions generally employed for ESCs might be a major contributor to variable gene expression observed in single-cell analysis. Indeed, downstream effectors for these signaling pathways are highly variable in cultured ESCs ( Figures S4A and S4B).
Besides the classical serum-based culture conditions, a serum-free 2i culture system targeting both the MAPK and GSK3 pathways has been found to maintain mouse ESC pluripotency (Ying et al., 2008). To ascertain the contribution of serum-based culture conditions on expression variability, we assessed expression in J1 ESCs at the single-cell level in medium containing normal serum, knockout serum replacement, or 2i chemicals (PD184352 and CHIR99021). We analyzed these three ESC cultures using a more-cost-effective singlecell qPCR protocol that we previously described . We selected 96 genes for analysis, including known pluripotency regulators and differentiation markers, as well genes that displayed strong variability under standard culture conditions.
On examination of the single-cell data from cells under the three culture conditions, we found that ESCs cultured in serum expressed more markers of differentiation (e.g., Id2, Lamb1, and Snai1) than ESCs in 2i medium ( Figure S4C). On LLE projection of single-cell data, global expression of 2i ESCs was more tightly distributed than that of ESCs cells cultured in serum or serum replacement ( Figure 4B). Specifically, by focusing on distribution over the first principal component and distribution of expression SD, we confirmed that ESCs cultured in 2i medium exhibit the least variation. Medium with serum led to the greatest heterogeneity in expression ( Figures  4C, 4D, and S4D). As revealed by violin plots in Figure 4D, the expression distribution of key regulators suggested a more-homogenous transcriptional network in 2i ESCs. For example, Tbx3, which is a highly variable pluripotency marker, showed clear bimodal distribution in serum-cultured ESCs. However, under 2i conditions, the percentage of Tbx3-expressing cells was significantly increased, whereas the differentiation priming marker Snai1 was repressed. These findings were confirmed in E14 ESCs, as well as an independent clone of J1 ESCs ( Figure S4E). We next asked whether the reduced heterogeneity under 2i conditions was accompanied with altered epigenetic status. We searched for an effect of 2i culture on the bivalent marks of highly variable genes using available epigenomic data (Marks et al., 2012). The overall number of bivalent genes was reduced dramatically in 2i-cultured ESCs, as compared with ESCs cultured in serum-containing medium ( Figure S4F). We also found that, among the most-variable genes defined in serumcultured ESCs, two-thirds of bivalent markers lost their bivalency in 2i conditions ( Figure 4E).
We then utilized published single-cell DNA methylation data (Smallwood et al., 2014) to interrogate the link between gene expression variation and DNA methylation variation. For each gene, we considered the region [TSS-2kb, TES+2kb] and calculated the difference in methylation variance between serum and 2i conditions. Interestingly, we found a moderate correlation (r = 0.33; p value = 0.0016) between the difference in methylation variance and the difference in gene expression variance between serum and 2i conditions ( Figure 4F). When cells were cultured with serum, Tbx3 and Snai1, two variable markers in ESCs, showed strong variation of DNA methylation level in gene bodies. However, when cultured with 2i medium, such epigenetic variation was significantly reduced (Figure S4G), suggesting that reduction of DNA methylation variability may in part contribute to reduction of gene expression variability.
In summary, the nature of the culture conditions represents an important contributor to bivalency, gene expression variation, and DNA methylation variation in mouse ESCs. With replacement of serum and proper targeting of the related signaling pathways, variability among ESCs is largely controllable without hampering pluripotency and self-renewal.

DISCUSSION
Cellular heterogeneity has been accepted as a hallmark of both embryonic and adult stem cells (Graf and Stadtfeld, 2008;Chambers et al., 2007). It has been proposed that variation in gene expression arises from transcriptional noise and network fluctuation and that associated heterogeneity accounts for stochasticity of cell fate decisions in stem and progenitor cells (Chang et al., 2008). Using mouse ESCs as a model, we have investigated global gene expression variability at single-cell resolution.
In agreement with recently published single-cell analyses of mouse ESCs (Gr€ un et al., 2014;Kolodziejczyk et al., 2015;Kumar et al., 2014;Papatsenko et al., 2015), we observed significant heterogeneity in gene expression in the serumcultured mouse ESCs. Using LLE analysis, we showed that heterogeneity does not appear to be stochastic but rather follows a defined differentiation pathway toward PE-like cells. Importantly, we defined a primed ESC state that reflects transition from a naive to differentiated state. ESCs in the primed state co-express pluripotency and differentiation modules. We have also provided evidence that the primed state is developmentally relevant, as the same signature is found in the developing mouse blastocyst during PE and EPI lineage specification. Prior studies emphasized transcriptional networks and microRNA pathways that lead to gene expression variation (Kolodziejczyk et al., 2015;Kumar et al., 2014). In the current work, we associate gene expression variation with epigenetic characteristics. We used LCOD analysis to extract true variability from mean expression level and describe the unique epigenetic status that distinguishes the highly variable genes. We propose that a proportion of previously defined bivalent marked genes are actually highly variable in their expression in cultured ESCs, suggesting a possible role for bivalent domains in modulating the frequency of transcription activation. One caveat is that ChIPseq data are obtained from population-level studies. As a result, it remains unclear whether bivalent domains are established in all cells or only a fraction of cells. Future developments of methods for the mapping of epigenetic marks at single-cell resolution are needed to resolve these issues.
Importantly, we demonstrated that the culture environment contributes strongly to observed gene expression variability. Upon replacement of serum and targeting the MAPK and GSK3 pathways by 2i conditions, ESCs in culture exhibit greater homogeneity in gene expression. Our results confirm findings from other recent studies (Gr€ un et al., 2014;Kumar et al., 2014). The FGF and MAPK pathway are closely related with EPI cell differentiation (Ying et al., 2008;Guo et al., 2010). WNT and GSK3 signaling has been implicated in control of gene expression noise during development (Arias and Hayward, 2006). Using motif analysis, we have connected gene expression variation with these important signaling pathways. Interestingly, replacing serum with knockout serum replacement alone also reduces gene expression variation, suggesting that other serum-responsive pathways contribute to ESC culture heterogeneity. Moreover, we show that 2i-cultured ESCs exhibit reduced bivalency and altered single-cell-level gene expression variation correlates with single-cell-level methylation status. We provide an example that gene expression variation is controllable through proper perturbation of key signaling pathways.
The plasticity of mammalian cellular states complicates an understanding of cell fate decision mechanisms. Comprehensive characterization of dynamic stem cell differentiation pathways requires single-cell gene expression analysis. Acquisition of similar analyses from different cellular systems should eventually allow for the mapping of the cell fate decision landscape and the modeling of dynamic network configurations during mammalian development.
Single-Cell mRNA-Seq Feeder-free J1 ESCs were grown in the presence of serum and LIF. ESCs were dispersed via trypsin-EDTA treatment. Single-cell whole-transcriptome amplification was performed using the FluidigmC1 Single-Cell Auto Prep System (C1 System) as per the manufacturer's recommendations (full details available at http://www.fluidigm.com). Amplified cDNAs were diluted with C1 DNA dilution reagent, quantified using Quant-it HS system, and validated by qPCR with selected primers. Successfully amplified single-cell cDNA samples were selected and diluted to the same concentration. Single-cell libraries were constructed using the Nextera XT DNA Sample Preparation kit (Illumina), pooled using Nextera XT DNA Sample Preparation Index Kit (Illumina), and then sequenced using Hiseq 2500 (Illumina). Each library was sequenced twice on two lanes.

Single-Cell qPCR
Individual primer sets (total of 96) were pooled to a final concentration of 0.1 mM for each primer. Individual cells were sorted directly into 96-well PCR plates loaded with 5 ml RT-PCR master mix (2.5 ml CellsDirect reaction mix, Invitrogen; 0.5 ml primer pool; 0.1 ml RT/Taq enzyme, Invitrogen; 1.9 ml nuclease-free water) in each well. Sorted plates were immediately frozen on dry ice. After brief centrifugation at 4 C, the plates were placed immediately on PCR machine. Cell lyses and sequence-specific reverse transcription were performed at 50 C for 60 min. Then, reverse transcriptase inactivation and Taq polymerase activation was achieved by heating to 95 C for 3 min. Subsequently, in the same tube, cDNA was subjected to 20 cycles of sequence-specific amplification by denaturing at 95 C for 15 s, annealing, and elongation at 60 C for 15 min. After preamplification, PCR plates were stored at À80 C to avoid evaporation. Pre-amplified products were diluted 5-fold prior to analysis. Amplified singlecell samples were analyzed with Universal PCR Master Mix (Applied Biosystems), EvaGreen Binding Dye (Biotium), and individual qPCR primers using 96.96 Dynamic Arrays on a BioMark system (Fluidigm). Ct values were calculated using the BioMark Real-Time PCR Analysis software (Fluidigm).

ACCESSION NUMBERS
The accession number for all RNA-seq data reported in this paper is GEO: GSE75804.