Mass cytometry-based single-cell analysis of human stem cell reprogramming uncovers differential regulation of specific pluripotency markers

Human induced pluripotent stem cells (hiPSCs) are reprogrammed from somatic cells and are regarded as promising cell sources for regenerative medicine and disease research. Recently, techniques for analyses of individual cells, such as single-cell RNA-sequencing and mass cytometry, have been used to understand the stem-cell reprogramming process in the mouse. However, the reprogramming process in hiPSCs remains poorly understood. Here, we used mass cytometry to analyze the expression of pluripotency and cell cycle markers in the reprogramming of human stem cells. We confirmed that during the reprogramming, the main cell population was shifted to intermediate population consisting of neither fibroblasts nor hiPSCs. Detailed population analyses, using computational approaches, including dimensional reduction by spanning-tree progression analysis of density-normalized events (SPADE), PhenoGraph, and diffusion mapping, revealed several distinct cell clusters representing the cells along the reprogramming route. Interestingly, correlation analysis of various markers in hiPSCs revealed that the pluripotency marker TRA-1-60 behaves in different pattern from other pluripotency markers. Furthermore, we found that the expression pattern of another pluripotency marker, octamer-binding protein 4 (OCT4), was distinctive in the pHistone-H3 high population (M phase) of the cell cycle. To the best of our knowledge, this is the first mass cytometry-based investigation into human reprogramming and pluripotency. Our analysis elucidates several aspects of hiPSC reprogramming, including several intermediate cell clusters active during the process of reprogramming and distinctive marker expression patterns in hiPSCs.

that the expression pattern of another pluripotency marker, octamer-binding protein 4 (OCT4), was distinctive in the pHistone-H3 high population (M phase) of the cell cycle. To the best of our knowledge, this is the first mass cytometrybased investigation into human reprogramming and pluripotency. Our analysis elucidates several aspects of hiPSC reprogramming, including several intermediate cell clusters active during the process of reprogramming and distinctive marker expression patterns in hiPSCs.
Since Yamanaka and others developed induced pluripotent stem cells (iPSCs) in mouse and human (1)(2)(3)(4), there have been many efforts to understand the reprogramming process induced and coordinated by defined transcription factors, such as OCT4, SOX2, c-MYC, and KLF4 (5)(6)(7)(8)(9)(10)(11)(12)(13)(14). However, research tools from traditional molecular biology and biochemistry are not sufficient for simultaneously examining the complex dynamics of the many factors involved in the reprogramming process. Recently, mass cytometry, a unique technique incorporating mass spectrometry and flow cytometry using element isotope-conjugated antibodies, has provided an opportunity to monitor more than forty factors at once with minimal spectral overlap compared to that of fluorescence-based cytometry (15). This powerful tool is now at the forefront of several fields, such as research on the hematopoietic system (16)(17)(18)(19)(20), cancer (21- 24), and stem cells (23,24).
A need for analytical tools that can be applied to the complex and large data sets acquired by mass cytometry has led to the development of several computational tools able to organize, analyze, and visualize such data. Dimensional reduction is the most competent concept for simplifying and visualizing multidimensional parameters in a two-or threedimensional field. Spanning-tree progression analysis of density-normalized events (SPADE) was the first clustering algorithm developed for mass cytometry (25). SPADE visualizes the distribution of populations in a tree-like structure according to their marker expression patterns. SPADE also has the ability to detect small populations by applying down-sampling prior to clustering and to group cell populations according to their characteristics. viSNE is another algorithm for visualizing multi-dimensional data in a two-dimensional field based on t-distributed stochastic neighbor embedding (t-SNE) (26) dimensional reduction, developed in 2013 by Amir et al (27). PhenoGraph, another clustering tool, partitions the cells in a sample into several clusters on a two-dimensional plane to identify phenotypic subpopulations easily (18). Metaclustering combined with PhenoGraph enables the phenotypic clustering of several samples into a small number of representative groups. Recently, Cytofkit, based on R, has been developed to support the analysis of mass cytometry data in various ways with a graphical user interface (28). It supports clustering based on various methods, such as ClusterX (28), PhenoGraph, and DensVM (29), followed by visualization based on several methods, such as principal component analysis (PCA), t-SNE, and Isomap (30), as well as diffusion map (31), which can track differentiating cells. These methods show the high-dimensional complex heterogeneity of each population on a comprehensible two-dimensional plane.
In the mouse reprogramming system, a few studies have used mass cytometry to investigate the reprogramming process. First, mass cytometry was used for the identification of early reprogramming regulators and markers by comparison of fully and partially reprogrammed murine iPSC lines (23). The behavior of multiple markers during the reprogramming process was analyzed simultaneously using SPADE. By applying time-resolved progression analysis to mass cytometry, Zunder et al showed the dynamics that occur during reprogramming, including the time-dependent emergence of various populations and cell cycle changes in three different fibroblast reprogramming systems (24). These studies on the murine reprogramming system using mass cytometry greatly improved our understanding of the phases of the reprogramming process at the single-cell level. Thus, this promising technology is expected to provide the opportunity to elucidate human reprogramming and pluripotency at the singlecell level.
A distinctive cell cycle progression is one of the well-known properties of pluripotent stem cells. They exhibit a faster cell cycle with a shortened G1 and elongated S phase compared with those in somatic cells (24,32). This phenomenon is caused by the involvement of pluripotency markers, which aid in the rapid progression through the G1/S checkpoint. To date, the contribution of many components expressed by pluripotent stem cells, such as OCT4, SOX2, NANOG, and c-MYC, to progression through the G1/S checkpoint is already known (33). However, multi-dimensional study of the representative pluripotency and cell cycle markers and examination of their relationships have not been reported yet.
In this study, we analyzed the expression of pluripotency and cell cycle markers in the reprogramming process of human iPSCs (hiPSCs) using mass cytometry. Subsequent computational analyses, such as statistical correlation and dimensional reduction by SPADE, PhenoGraph, and diffusion map, were applied to investigate the reprogramming process and the relationship among the various pluripotency and cell cycle markers.

Population changes during reprogramming
Since the viral infection is known to affect the cell cycle checkpoints of the host cells (34)(35)(36), we utilized episomal vector system for hiPSC reprogramming to investigate the cell cycle and pluripotency during reprogramming process to minimize the interference of viral infection, despite of its relatively lower efficiency than those of methods using viral vectors (37,38). To select sampling timepoints, we first analyzed the reprogramming process at day 20 and 30. Although we observed few individual cells expressing OCT4 and NANOG at day 20, fully reprogrammed hiPSC colonies were found at around day 30 (Fig. S1a). The number of alkaline phosphatase (AP)-positive colonies were increased along reprogramming process, and reached about fifty (~0.05% efficiency) at day 30 ( Fig. S1b), which is higher than previous studies with episomal reprogramming (39,40). These results indicate that reprogramming entered late stage at day 20 and completed at day 30.
To investigate the early and late stage of reprogramming, we decided to analyze day 10 and 20 of reprogramming process as well as human fibroblasts and hiPSCs as negative and positive controls, respectively, using mass cytometry (Fig. S1c). Samples were analyzed by Helios™ mass cytometer for 10 markers: OCT4, SOX2, NANOG, TRA-1-60, c-MYC, CD44, phosphor-pRB (pRB), Cyclin B1, Ki67, and phosphor-histone H3 (pHistone H3). Archyperbolic sine transformation and sequential gating isolated viable singlet cells for subsequent analysis (Fig. S1d). We obtained a total of 1,153,971 cells (299,520 fibroblasts, 462,978 cells on reprogramming day 10, 147,030 cells on reprogramming day 20, and 244,442 hiPSCs) for analysis. SPADE analysis clustered by pluripotency markers (OCT4, SOX2, NANOG, and TRA-1-60) and fibroblast marker CD44 revealed relatively concentrated populations in stage-specific clusters at each stage (Fig. 1a). Many reports have suggested the presence of intermediate populations that appear during the reprogramming process and do not share the characteristics of the origin cells or iPSCs (4,23,24,41,42). Therefore, we grouped the cells into three clusters: a fibroblast-like population, which was mainly found in the human fibroblast sample; an iPSC-like population, which was enriched in hiPSCs; and an intermediate-like population, which was not included in the other two populations. The intermediate-like population increased over the course of the reprogramming process (50-60%), while the fibroblast-like and iPSC-like populations accounted for about 82% and 99% of fibroblasts and hiPSCs, respectively, implying a population shift from fibroblasts to hiPSCs during reprogramming (Fig. 1b, Table S1). This phenomenon was also observed in the PCA plot (Fig. 1c). As reprogramming proceeded, the populations shifted toward the lower right region of the PCA plot, which encompassed the hiPSC population (Fig. 1d). This reflected a decrease in CD44 expression and increase in the expression of pluripotency markers and some cell cycle signatures, such as pRB, Cyclin B1, and Ki67 (Fig. S1e).

Phenotypic clustering of cells to identify distinctive populations in reprogramming
To examine which populations emerged during the course of reprogramming from fibroblasts to hiPSCs, we employed PhenoGraph, which is an advanced t-SNE algorithm that clusters single cells according to their marker expression patterns. Using PhenoGraph, each sample was clustered and visualized on a 2D plane with two bh-SNE axes ( Fig. S2a-d, upper panel) (43) based on its marker expression patterns ( Fig. S2a-d, lower panel). Since the expression levels of the markers in PhenoGraph heatmaps are each shown as a relative value from 0 to 1 separately, a total of 44 clusters representing expression changes occurring during the reprogramming process was presented in a single heatmap (Fig. S2e). Based on this, metaclustering was performed to investigate the clusters by their phenotypes throughout the reprogramming process. As a result, the 44 clusters were clustered again into 5 metaclusters (Fig. 2a) and shown on the t-SNE map with colors indicating the samples and metaclusters ( Fig. 2b  and 2c).
Metacluster 1 showed a fibroblast-like phenotype with high expression of CD44 that was observed in fibroblasts and reprogramming cells on days 10 and 20 ( Fig. 2b and 2c). Metacluster 2 exhibited a pluripotent cell-like phenotype with high expression of pluripotency markers, such as TRA-1-60, SOX2, OCT4, and NANOG, and no expression of fibroblast marker CD44. Consistent with this, most clusters classified into metacluster 2 were found in the hiPSC sample ( Fig. 2b and  2c). Interestingly, metaclusters 3 and 5 were mostly found in cells during the reprogramming process ( Fig. 2b and 2c). Metacluster 3 exhibited low expression of most of markers except CD44, implying that the cells in this metacluster were in a non-proliferating state based on their lack of cell cycle marker expression. Some cells in the fibroblast sample were also included in this cluster, indicating that these cells were unlikely to achieve pluripotency. Metacluster 5 exhibited further reduction in CD44 expression and a slight increase in SOX2 expression ( Fig. 2b and 2c, Fig.  S2f and S2g). In particular, the three clusters (clusters 12, 23, and 24) in metacluster 5 consisted of cells from reprogramming days 10 and 20 only ( Fig. 2b and 2c), which may imply the establishment of reprogramming-prone populations by successful expression of reprogramming factors. It has been reported that SOX2 is the first factor expressed during the hierarchical phase of reprogramming after the stochastic phase and that it plays an indispensable role in the activation of the pluripotency circuit (7). By contrast, metacluster 4 contained small number of cells expressing various markers such as SOX2, Cyclin B1, OCT4, pRB, NANOG, CD44, pHistone H3, and c-MYC. These cells may be mixture of reprogrammed cells; however, fibroblasts that were not transfected with reprogramming factors were also contained within this population ( Fig. 2b and 2c, Fig. S2a and S3e). This suggests that the clusters contained in metacluster 4 should be considered outliers as a result of non-specific antibody binding. Based on these results, we could identify the appearance of various cell populations during the reprogramming process and classify them into several groups, including cells on the reprogramming process.

Reprogramming aspects identified by diffusion map analysis
For more detailed analysis of reprogramming, we employed Cytofkit, a multifunctional clustering and visualization tool for mass cytometry (28). All samples were clustered into 30 sub-clusters on the same t-SNE plane using the PhenoGraph algorithm (Fig. 3a). Similar to the earlier results, samples from reprogramming days 10 and 20 occupied positions on the plane between those of fibroblasts and hiPSCs (Fig. 3b). For simplified examination of the reprogramming process based on clustering, clusters were classified into three groups according to their expression changes during reprogramming: fibroblast-enriched, reprogramming-enriched, and hiPSC-enriched clusters (Fig. 3c, Fig. S3a, Table S2). As a result, 9, 14, and 6 clusters were found to belong to each group, respectively (Fig.  S3b). The fibroblast-enriched clusters clearly showed high expression of CD44, while the reprogramming-enriched clusters showed reduced expression of CD44 and heterogeneous expression of other markers (Fig. S3c).
Using these clusters, we examined the reprogramming process using diffusion map, which was developed for single-cell analysis of differentiation data (31), using Cytofkit. Each cluster from Fig. 3a was displayed with the same color in the diffusion map and highlighted separately by each population-enriched cluster (Fig. 3d). The fibroblast-enriched population was located in the upper area of the diffusion map, and as reprogramming proceeded, the placement of each cluster moved downwards. In reprogramming-enriched populations, two major populations were observed, in addition to a few cells falling near the hiPSC-enriched population indicated by red circle in left lower panel of Fig.  3D. According to the expression pattern of each marker along the diffusion map axes (Fig. S3d), the two reprogramming-enriched populations, separated to distal side of diffusion map component 1, showed slight differences in the expression of markers. However, positions of most cells in those two populations on the diffusion map plane were relatively distant from that of the hiPSCs. Thus, we focused on the few cells observed near the hiPSC-enriched population. Although these cells were derived from various clusters, most belonged to clusters 18 and 20 (Fig. S3e, left panel). These two clusters occupied the position nearest the hiPSCs in both the PCA plot and t-SNE map (Fig. S3e, middle and right panels, respectively), which implies that they exhibit the most similar phenotypes to hiPSCs among the cells in the reprogramming process. Given the expression pattern of metacluster 5 in the previous analysis (Fig. 2a), cluster 18 likely contains reprogrammed cells. Based on these results, we speculate that at least SOX2 and perhaps pHistone-H3 expression is necessary for reprogramming

Relationship of pluripotency and cell cycle markers in hiPSCs
After necessary and in-depth classification by marker expressions, we sought to examine the relationships among markers. viSNE was performed with CD44 and the pluripotency markers OCT4, SOX2, NANOG, and TRA-1-60 (Fig. S4a). The distributions of cells indicated that fibroblasts and cells in the reprogramming process shared a high proportion of the population while iPSCs were totally distinct from other samples due to their high expression of pluripotency markers, as expected. In terms of cell cycle markers, most of the markers showed similar gradient pattern, implying high correlations among those markers. To quantify the correlation between each pair of markers, Spearman's rank correlations were calculated (Table S3). Hierarchical clustering of correlations among markers within each sample revealed a highly conserved correlation between c-MYC and pHistone H3 (Fig. 4a). This result is consistent with previous reports showing that pHistone H3 is a functional downstream target of c-MYC (44). In addition, hiPSCs exhibited strong correlations among pRB, Ki67, and OCT4 as well (Fig. 4a).
Since strong correlations among the markers were observed only in hiPSCs (likely because of the lack of pluripotency markers in other samples), we focused on hiPSCs for further analysis. For a more accurate calculation of correlations between markers, we sought to exclude cells that spontaneously differentiated during culture and those abnormally unmarked by antibodies; we therefore identified 82,103 undifferentiated hiPSCs that expressed the pluripotency markers OCT4, SOX2, NANOG, and TRA-1-60 at levels higher than 95% of fibroblasts (OSNT-gated population). Gated cells showed similar correlations between each pair of markers as the total hiPSC sample ( Fig. 4a and Fig. S4b). Gating was also used to remove nonproliferating cells expressing low levels of cell cycle markers, such as Ki67, pRB, and pHistone-H3 (Fig. S4c). In the resulting sample, we revisited the relationships among pluripotency markers. Unexpectedly, TRA-1-60 appeared to behave differently from the other pluripotency markers, with weak correlations with other markers in both the viable and OSNT-gated populations ( Fig. 4a and Fig. S4b). Since TRA-1-60 expression in the OSNT-gated population revealed two distinct populations, TRA-1-60 low and TRA-1-60 high (Fig. S4d-e), we analyzed the expression level of each marker in each of these populations (Fig. 4b). Interestingly, pluripotency and cell cycle markers showed similar expression pattern between the TRA-1-60 low and the TRA-1-60 high populations. This result suggests the possibility that TRA-1-60, known as a critical marker for fully reprogrammed state (45), is independent of the core pluripotency circuit consisting of OCT4, SOX2, and NANOG in hiPSCs.
Next, we classified OSNT-gated cells into nine subgroups according to the expression levels of OCT4 and NANOG (Fig. 4c). SOX2 and TRA-1-60 were excluded due to their uniform expression in hiPSCs and weak correlations with other markers, respectively ( Fig. 4a-b). OCT4 and NANOG expression was divided into high, medium, and low (Fig. S4d-e). After removing subgroups accounting for less than 1% of cells (Fig. 4c, gray text), six subgroups were further analyzed. The expression of most markers decreased as OCT4 and NANOG expression decreased, with the exception of cell cycle markers in the OCT4 low and NANOG mid (LM) population (Fig. 4d). It is generally known that pluripotency and cell cycle progression are tightly linked and regulate each other (24,32,33,46). In this analysis, however, the LM population exhibited high expression of cell cycle markers, especially pHistone-H3. Since pHistone-H3 is a well-known marker expressed during mitosis or meiosis (47), this result implies a distinctive behavior of pluripotency markers during the M phase of the cell cycle. We next compared marker expression levels in M phase (pHistone-H3 high population) with those in other populations (pHistone-H3 low population) in hiPSCs (Fig. S4f). Interestingly, OCT4 showed the opposite expression pattern from most markers, including cell cycle markers and other pluripotency markers such as SOX2 and NANOG, which were highly expressed in the pHistone-H3 high population (Fig.  4e). OCT4 expression level in pHistone-H3 low population was about 2-fold higher than that in pHistone-H3 high population (Fig. S4g). This result implies the distinct regulation of OCT4 during cell cycle progression in the pluripotent state. Taken together, our results indicate that using the in-depth single cell analysis with various pluripotency and cell cycle markers, we were able to precisely look into the relationships between markers and discover unexpected regulatory aspects of TRA-1-60 and OCT4 in regard to pluripotency and the cell cycle, respectively, in hiPSCs.

Discussion
Compared to traditional omics studies, single-cell analyses provide ultimate resolution for investigations into complex processes with heterogeneous populations such as iPSC reprograming. Many tools have been developed for single-cell analyses of cellular component expression, with single-cell RNA-sequencing and mass cytometry among the most popular tools for the analysis of mRNA and epitope expression levels, respectively. Single-cell RNA-sequencing enables the acquisition of the entire transcriptome of each cell. However, the number of cells that can be analyzed is restricted by cost and technical limitations. Although mass cytometry allows the examination of fewer markers than single-cell RNA-sequencing, the number of markers included is greater than for any other fluorescent dye-labeled antibody-based cytometry. Mass cytometry also has the advantages of allowing the analysis of an unlimited number of cells and the detection of epitopes that cannot be analyzed using mRNA sequences, such as analysis of proteins that undergo post-translational modification, polysaccharides, or lipids. Here, we analyzed human somatic reprogramming using mass cytometry, as we needed to acquire a large number of cells due to the low efficiency of human reprogramming. As this new technology has not yet been employed to examine the human reprogramming process, we analyzed reprogramming from the macroscopic to microscopic level by investigating population shifts, the emergence of various cell populations, the reprogramming route, and the relationships among pluripotency and cell cycle markers. Based on these results, we provide new insights into pluripotency and cell cycle regulation regarding TRA-1-60 and pHistone-H3, which cannot be analyzed in the conventional RNAsequencing.
In our analysis, pluripotency and cell cycle markers such as pRB, Ki67, c-MYC, OCT4, NANOG, and pHistone-H3 were clustered based on their strong correlations in hiPSCs. Strong correlations between several pluripotency and cell cycle markers in hiPSCs implies that the proliferation of iPSCs may be strongly associated with the expression of pluripotency markers as known to date (24,32,33,46,48). Interestingly, two pluripotency markers, SOX2 and TRA-1-60, did not exhibit strong correlations, even with OCT4 and NANOG. SOX2 was similarly expressed in all cells, while TRA-1-60 showed differential expression and was not correlated to any other pluripotency marker expression in hiPSCs. TRA-1-60 epitope is a sialylated keratan sulfate proteoglycan that is expressed on podocalyxin in some pluripotent cells including embryonic carcinoma cells as well as pluripotent stem cells (49)(50)(51). However, its regulatory mechanism and relationship with other pluripotency network components remain unclear. It is known that molecular mass of podocalyxin is changed and the epitope is no longer detected by TRA-1-60 antibody along with differentiation (51). Knock-out of podocalyxin in human embryonic stem cells results in decreases of TRA-1-60, but does not disrupt maintenance of pluripotency (52). As other pluripotent stem cellspecific glycosphingolipid surface antigens, i.e., stage-specific embryonic antigen (SSEA)-3 and -4, are dispensable for maintaining pluripotency and differentiation in vitro and in vivo (53), we assume that TRA-1-60 also may be a marker for human pluripotent stem cells and independent of core pluripotency network.
The expression levels of OCT4 and NANOG tended to be correlated with those of cell cycle markers. Intriguingly, we found that the OCT4 low NANOG mid (LM) population showed high levels of cell cycle markers, especially the M-phase marker pHistone-H3. In addition, OCT4 showed the opposite trend compared to the other markers. This result implies that the expression of some pluripotency markers may not be constant but oscillate during cell cycle progression. It was reported that the expression of pluripotency marker genes such as NANOG, TERF1, PRDM14, and TDGF1 may increase during the cell cycle, mainly at S-G2 phase, to maintain pluripotency (46). It makes sense for pluripotency markers to be observed at high levels in S-G2 phase because many pluripotency markers are involved in the regulation of the G1/S checkpoint for rapid cell cycle progression in hiPSCs (33). Another study reported that differentiation potential of hiPSCs varies at each stage of cell cycle, and is especially restricted at M-phase (54). This could be a clue for the relationship between OCT4 decrease and pHistone-H3 increases in terms of restricted differentiation and chromosome condensation during mitotic phases. Thus, further study is needed to clarify this possibility on the nature of pluripotency markers during cell cycle progression.
In this study, we first analyzed the human reprogramming process and pluripotency in terms of pluripotency and cell cycle markers using mass cytometry. Our results showed that mass cytometry can reproduce previous results as well as produce new findings, demonstrating its reliability and effectiveness, respectively. Mass cytometry and its analytical tools are helpful for understanding multidimensional aspects of biological process by analyzing multiple combinations of various markers in which posttranscriptional or post-translational markers are included. This enables laborious and timeconsuming analyses with numerous markers to be replaced with a single experiment. Our research therefore suggests that mass cytometry is an invaluable tool for complex research involving the reprogramming process, pluripotency, and other stem cell phenomena, as well as the phenotyping of heterogeneity in cancer or other diseases.

Cell culture and reprogramming human fibroblasts into hiPSCs
Human CRL-2097 fibroblasts were cultured in Dulbecco modified Eagle's medium (DMEM) supplemented with 10% fetal bovine serum, 1% non-essential amino acids, and 1% sodium pyruvate (Thermo Fisher Scientific, Waltham, MA, USA). A hiPSC line derived from CRL-2097 (ATCC, VA, USA) was maintained in DMEM/F12 supplemented with 20% KnockOut™ Serum Replacement (Thermo Fisher Scientific) and 20 ng/ml basic fibroblast growth factor (bFGF; R&D Systems, Minneapolis, MN, USA). For cellular reprogramming, CRL-2097 cells were transfected with three episomal expression vectors, pCXLE-OCT3/4-shP53, pCXLE-L-myc-Lin28, and pCXLE-SOX2-KLF4 (Addgene, Cambridge, MA, USA), with the Neon™ transfection system and transfection kit (Thermo Fisher Scientific) according to the manufacturer's instructions. Electroporation conditions were as follows: pulse voltage: 1650, pulse width: 10, and pulse number: 3. For reprogramming, transfected cells were plated on the culture dish at 10,000 cells/cm 2 density and cultured with hiPSC medium-based conditioned medium from a culture of gammairradiated mouse embryonic fibroblast feeder cells in hiPSC medium supplemented with 20 ng/ml bFGF, 1 mM nicotinamide, and 10 mM Y-27632 (Tocris, Bristol, UK). The medium was changed every two days. Samples were collected on days 10 and 20 after transfection. As a negative and positive control, CRL-2097 fibroblasts and their derivative iPSC lines, respectively, were also collected.

Alkaline phosphatase analysis
As previously described (55), samples were washed once with phosphate-buffered saline (PBS; Welgene, Daegu, Korea) and fixed with 4% formaldehyde solution (Sigma-Aldrich, St. Louis, MO, USA) for 30 sec, followed by staining with AP staining solution for 15 min in the dark. The number of AP-stained colonies was counted at day 20 and 30 of reprogramming, respectively, to check the reprogramming efficiency.

Immunofluorescence analysis
Samples were washed once with PBS (Welgene) and fixed with 4% formaldehyde solution (Sigma-Aldrich) for 10 min, followed by three washes with PBS. Blocking and permeabilization were done with 3% BSA and 0.3% Triton X-100 solution in PBS for 1 h. Primary antibodies were diluted in 1% BSA and incubated overnight at 4 °C. After 1 h of washing, samples were incubated with secondary antibodies conjugated with Alexa-488 or Alexa-594 (Invitrogen, Carlsbad, CA, USA) for 1 h. Nuclei were stained with Hoechst 33342 diluted in 0.1% BSA. All procedures were performed at room temperature unless indicated otherwise.

Computational analysis of mass cytometry data
For data analysis, normalized raw FCS files were normalized and debarcoded using Helios™ pre-installed software prior to analysis using the web-based tool Cytobank, R-based Cytofkit (28), and Matlab-based open source analytic tool CYT (27,56). The FCS files were transformed using an inverse hyperbolic sine (arcsinh) function with a cofactor of 5 and pregated manually to exclude EQ beads, cell debris, cell doublets, and dead cells (Fig. S1d). SPADE and viSNE analyses were performed via Cytobank. SPADE analysis was performed with 50 nodes and 1% target down-sampled events. PhenoGraph and metaclustering were done on gated cells (3,500 cells) (18). The number of neighbors (k parameter) used in PhenoGraph clustering was set to 30, the default number in the CYT program. For metaclustering, the k parameter was set to 8. All data representing clusters in each sample were expressed as heatmaps and Barnes-Hut stochastic neighbor embedding (bh-SNE) fields. Cytofkit was used for cell clustering with the PhenoGraph method, PCA, and analysis of the reprogramming route by diffusion map performed in Fig. 3. The default settings of Cytofkit were used for the analyses.

Statistical analysis
Statistical analysis and visualization of graphs were performed with Microsoft Excel (Microsoft, Redmond, WA, USA) and R. The calculation of Spearman's correlation coefficients and visualization of hierarchy clustering were performed using R.