Label-Free Imaging to Track Reprogramming of Human Somatic Cells

The process of reprogramming patient samples to human-induced pluripotent stem cells (iPSCs) is stochastic, asynchronous, and inefficient, leading to a heterogeneous population of cells. In this study, we track the reprogramming status of patient-derived erythroid progenitor cells (EPCs) at the single-cell level during reprogramming with label-free live-cell imaging of cellular metabolism and nuclear morphometry to identify high-quality iPSCs. EPCs isolated from human peripheral blood of three donors were used for our proof-of-principle study. We found distinct patterns of autofluorescence lifetime for the reduced form of nicotinamide adenine dinucleotide (phosphate) and flavin adenine dinucleotide during reprogramming. Random forest models classified iPSCs with ∼95% accuracy, which enabled the successful isolation of iPSC lines from reprogramming cultures. Reprogramming trajectories resolved at the single-cell level indicated significant reprogramming heterogeneity along different branches of cell states. This combination of micropatterning, autofluorescence imaging, and machine learning provides a unique, real-time, and nondestructive method to assess the quality of iPSCs in a biomanufacturing process, which could have downstream impacts in regenerative medicine, cell/gene therapy, and disease modeling.

T he derivation of donor-specific induced pluripotent stem cells (iPSCs) from somatic cells through reprogramming generates a unique self-renewing cell source for disease modeling, drug discovery, toxicology, and personalized cell therapies. [1][2][3] These cells carry the donor's genome, facilitating elucidation of the genetic causes of disease, and are immunologically matched to the donor, facilitating the engraftment of cells derived from iPSCs. [4][5][6] With several clinical trials underway, 7 there has been significant progress in developing iPSC-based cell therapies in recent years. However, several challenges remain in the field. 8 First, the derivation of high-quality iPSCs must be efficient, rapid, and costeffective to ensure that patients receive their treatments in a timely manner with autologous iPSC-derived products. Second, reprogramming to make iPSCs for both allogeneic and autologous cell therapy could benefit from standardized manufacturing processes to overcome the inconsistencies arising from variability in human material sources, reagents, delivery of reprogramming factors, microenvironmental fluctuations, or inherent stochasticity in epigenetic processes underpinning reprogramming. 9 Typical assays currently used for quality control of good manufacturing practices (GMPs)-grade iPSCs include testing for cell line identity (short tandem repeat analysis, single nucleotide polymorphism analysis, genomic sequencing), genomic instability (G-banding, chromosomal microarray, NanoString technology), pluripotency (marker expression analysis through flow cytometry or immunocytochemistry, embryoid body analysis, teratoma assays, Pluritestä, TaqMan Scorecardä Assay), and residual expression of reprogramming factors (polymerase chain reaction or immunocytochemistry). [10][11][12][13] Each of these methods can be low throughput, labor intensive, time consuming, and require destructive processing.
Nondestructive strategies such as automated machine learning can be used to identify various cell types and structures in cell cultures from bright-field images. [14][15][16][17][18] However, such automated methods to identify iPSCs, in particular, have had limited success in the field. [19][20][21] Deep learning has recently been employed to analyze monoclonal cell cultures of established iPSC lines, 22 but reprogramming cultures involve a higher number of cell fate transitions that have yet to be analyzed through deep learning pipelines. Hence, in complex cultures, like those in reprogramming, new standardized platforms with robust analytical methods for identifying high-quality iPSCs are still needed.
Our strategy to identify iPSCs exploits metabolic and nuclear changes during reprogramming. Somatic cells primarily utilize mitochondrial oxidative phosphorylation (OXPHOS) to support cell proliferation, 23 while pluripotent stem cells favor glycolysis in a manner reminiscent of the Warburg effect in cancer cells. 23,24 During reprogramming, somatic cells thus undergo a metabolic shift from OXPHOS to glycolysis, 25,26 accompanied by a transient OXPHOS burst, resulting in the initiation and progression of reprogramming. [27][28][29] Recent evidence also indicates that this metabolic shift occurs before changes in gene expression and that the modulation of glycolytic metabolism or OXPHOS alters reprogramming efficiency. 24,30,31 High-resolution imaging of reprogramming cells has also identified that nuclear geometry is dramatically altered during reprogramming. [32][33][34] Therefore, simultaneous monitoring metabolic and nuclear changes during reprogramming could identify various cell states within reprogramming cultures.
Optical metabolic imaging (OMI) is a noninvasive and labelfree two-photon microscopy technique that provides dynamic measurements of cellular metabolism at a single-cell level. OMI is based on the endogenous fluorescence of metabolic coenzymes, reduced form of nicotinamide adenine dinucleotide (NADH) and flavin adenine dinucleotide (FAD), 35 that are both used across several cellular metabolic processes. NADH and reduced form of nicotinamide adenine dinucleotide phosphate (NADPH) have overlapping fluorescence properties and are collectively referred to as reduced form of nicotinamide adenine dinucleotide (phosphate) (NAD(P)H). 36 The optical redox ratio, defined as the ratio of NAD(P)H intensity to the sum of NAD(P)H and FAD intensity, provides a measure of the relative oxidation-reduction state of the cell [I NAD(P)H /(I NAD(P)H + I FAD )]. 37,38 Fluorescence lifetime imaging microscopy of NAD(P)H and FAD provides additional information specific to protein binding activity. The two-component decays of NAD(P)H and FAD measure the short (s 1 ) and long (s 2 ) fluorescence lifetimes that correspond to the free or bound states of these coenzymes, [39][40][41] along with fractional contributions of short (a 1 ) and long (a 2 ) lifetimes. Since NAD(P)H and FAD are found predominantly in the cytoplasm, the lack of fluorescence signal in images can also be used to identify cell borders and nuclei. 42 Thus, OMI provides multiple readouts for cell metabolism and nuclear morphometry to track metabolic and nuclear changes of cells undergoing reprogramming.
In this study, we address some of the challenges associated with the biomanufacturing of iPSCs by developing a microcontact printed (lCP) platform 34,43,44 to noninvasively monitor metabolic and nuclear changes over 22 days of reprogramming of human erythroid progenitor cells (EPCs) to iPSCs. We demonstrate that OMI is sensitive to the metabolic and nuclear differences during reprogramming, provide accurate identification of reprogramming status of cells using machine learning models, and subsequently build reprogramming trajectories at the single-cell level. 45 Our label-free, nondestructive, rapid, and scalable method to track reprogramming provides novel insights into human cell reprogramming and could enable the development of new technologies for biomanufacturing high-quality iPSCs.

Reprogramming on patterned substrates
We first designed a lCP substrate to spatially control the adhesion of EPCs undergoing reprogramming. 34,43,46 The lCP substrate is formed by coating circular regions of 300 lm radius, referred to as lFeatures, with Matrigel on a 35-mm ibiTreat dish that allows for cell adhesion. The remaining regions of the dish are then backfilled with polycationic graft copolymer, poly(L-lysine)-graftpoly(ethylene glycol) (PLL-g-PEG), that resists protein adsorption and prevents cell adhesion in these regions ( Supplementary  Fig. S1A). 47,48 The ibiTreat dishes are made of gas-permeable material, enabling maintenance of carbon dioxide or oxygen exchange during cell culture and have high optical quality.
These properties make the dishes suitable for two-photon microscopy during reprogramming. To verify proper coating of the circular lFeature regions, we immunostained for laminin, a Single-cell quantitative analysis of (D) metabolic parameters: optical redox ratio, NAD(P)H s m , FAD s m ; (E) nuclear parameters: area, perimeter, and mean radius (n = 561, 990, and 586 for EPC, IM, and iPSC, respectively). Data are presented as median with interquartile range for each cell type. Statistical significance was determined by one-way ANOVA using the Kruskal-Wallis test for multiple comparisons; ns for p ‡ 0.05, * for p < 0.05, ** for p < 0.01, *** for p < 0.001, **** for p < 0.0001. ANOVA, analysis of variance; EPC, erythroid progenitor cells; FAD, flavin adenine dinucleotide; IM, intermediate; iPSC, induced pluripotent stem cell; NADH, reduced form of nicotinamide adenine dinucleotide; NAD(P)H, reduced form of nicotinamide adenine dinucleotide (phosphate); ns, nonsignificant.

LABEL-FREE IMAGING TO TRACK REPROGRAMMING 177
major component of Matrigel. 49 Fluorescence imaging showed laminin consistently within the circular lFeatures indicating uniform patterning of Matrigel ( Fig. 1A; left). We next assessed the ability of the lCP substrates to enable cell attachment by seeding two different cell types: human dermal fibroblasts (HDFs) and human embryonic stem cells (hESCs). We observed that both HDFs and hESCs remained viable, attached, and confined to the circular lFeatures indicating that the lCP substrates enable spatial control of cell adhesion ( Supplementary Fig. S1B). Next, peripheral blood mononuclear cells (PBMCs) were isolated from healthy human donors and further enriched for EPCs before the delivery of reprogramming factors. We examined the enrichment of EPCs by flow cytometry with erythroid cell surface marker CD71. 50 Flow cytometry confirmed the presence of enriched EPCs showing that >98% of the cells expressed CD71 on day 10 of PBMC culture ( Supplementary Fig. S1C).
To initiate reprogramming, we electroporated EPCs with four episomal reprogramming plasmids 51,52 -encoding OCT4, shRNA knockdown of p53, SOX2, KLF4, L-MYC, LIN28, and miR302-367 cluster-and seeded them onto lCP substrates. We assessed the ability of the lCP substrates to sustain longterm reprogramming studies by performing high-content imaging to track individual lFeatures (>30 lFeatures per 35-mm dish) longitudinally at multiple time points over the *3 weeks of reprogramming time course. Day 22 was chosen as the reprogramming endpoint because there were several lFeatures with at least one iPSC colony at this time point without significant outgrowth beyond the boundaries of the lFeatures.
Although the starting EPCs (day 0) are nonadherent, EPCs undergoing reprogramming start adhering to the lCP substrates on day 8. Cells in the middle of reprogramming (day 8-day 17) that lack EPC markers or iPSC markers are broadly termed as intermediates (IMs), consistent with prior nomenclature in the field. 53 Moreover, endpoint iPSCs on day 22 remain adhered to the lCP substrates within each circular lFeature ( Fig. 1A; right), indicating that lCP substrates can support the full reprogramming of EPCs. Overall, the lCP platform provides unique spatial control over cells and enables high-content quantitative imaging of reprogramming.

OMI reveals metabolic states during reprogramming
Cellular metabolism plays an important role in regulating reprogramming and pluripotency of iPSCs, [54][55][56][57][58] and can be noninvasively monitored through OMI. NAD(P)H is an electron donor and FAD is an electron acceptor. Both are present in all cells as coenzymes and provide energy for metabolic reactions. For example, glycolysis in the cytoplasm generates NADH and pyruvate, whereas OXPHOS consumes NADH and produces FAD ( Fig. 1B; right). Autofluorescence imaging of NAD(P)H and FAD can thus detect the oxidation-reduction state of a cell and is influenced by many biochemical reactions. 35,59 We tracked the autofluorescence dynamics of NAD(P)H and FAD by performing OMI on cells attached to lCP substrates at different time points during EPC reprogramming. In these images, the nucleus remains dark as NAD(P)H is primarily located in cytosol and mitochondria, and FAD is primarily located in mitochondria. The NAD(P)H images were used as inputs for an image analysis software, ilastik, 60 to identify the nuclei. The identified nuclei were then used as an input for a high-content image analysis pipeline in CellProfiler software 61 to segment the cytoplasm, and measure various metabolic and nuclear parameters ( Fig. 1B; left).
We observed a significant increase in the optical redox ratio (iPSC>IM>EPC) during reprogramming (Fig. 1C), indicating that individual EPCs are more oxidized than individual IMs and iPSCs (Fig. 1D). In addition, we noted that patterned IMs and iPSCs have significantly higher optical redox ratios than their nonpatterned counterparts ( Supplementary Fig. S2K). This observation is consistent with previous studies that show that mechanical cues can regulate their relative use of glycolysis. [62][63][64][65] Next, we observed that NAD(P)H and FAD lifetime components undergo biphasic changes during the progress of reprogramming. FAD lifetime components undergo a more significant and pronounced change relative to the NAD(P)H components ( Fig. 1D and Supplementary Fig. S2A-J). On average, the fraction of protein-bound FAD (FAD a 1 ) first decreases from its levels in EPCs to those in IMs and then increases during the IM to iPSC transition, which could be reflective of the OXPHOS burst (Supplementary Fig. S2H). 27-29 FAD s m (Fig. 1D) is inversely related to FAD a 1 and, therefore, undergoes a biphasic change that is opposite to that of FAD a 1 ( Supplementary Fig. S2H). Similar biphasic changes occur in nuclear parameters during reprogramming, which is consistent with our previous study ( Fig. 1E and Supplementary Fig. S3). 34 We compared these measurements on cells undergoing reprogramming to established cell lines and primary cell populations. Both pluripotent stem cell lines-established hESCs and iPSC lines-have similar values for most metabolic and nuclear parameters, as expected. The significant differences observed between hESCs and iPSCs in some metabolic and nuclear parameters (Supplementary Figs. S2 and S3) may be because iPSCs have not undergone any passages in contrast to the hESC line. 24 Fibroblasts from human donors (HDFs) had metabolic parameters significantly different from those of EPCs ( Supplementary  Fig. S2). This difference could be because (1) fibroblasts are adherent whereas starting EPCs are nonadherent and (2) fibroblasts and EPCs have different proliferation rates and energy needs. Taken together, autofluorescence imaging of NAD(P)H and FAD revealed significant dynamic changes for various cell populations during reprogramming. UMAP dimensionality reduction was performed on (A) all 11 metabolic and 8 nuclear parameters, (B) only 11 metabolic parameters, and (C) only 8 nuclear parameters for each cell, projected onto two-dimensional space and enables separation of different cell types (EPC, IM, and iPSC). Each color corresponds to a different cell type. Data are from three different donors. Each dot represents a single cell, and n = 561, 990, and 586 cells for EPC, IM, and iPSC, respectively. (D) Model performance of the different classifiers (random forest, simple logistic, k-nearest neighbor [IBk], naive Bayes) for iPSCs was evaluated by ROC curves using all 11 metabolic and 8 nuclear parameters. The AUC is provided for each classifier as indicated in the legend. (E) Parameter weights for random forest classification of EPCs, IMs, and iPSCs using the gain ratio method. Analysis was performed at a single-cell level using three different donors. (F) Classification accuracy with respect to number of parameters was evaluated based on the gain ratio parameter selection with the random forest model (parameters added from highest to lowest gain ratio in (E). The number of parameters included in the random forest model is indicated on the x-axis. (G) Model performance of the random forest classifier for iPSCs was evaluated by ROC curves using different metabolic and nuclear parameter combinations as labeled. AUC is provided for each parameter combination as indicated in the legend.
(H) Imaging time (left y-axis) and accuracy score (right y-axis) evaluation of the random forest classifier for different metabolic and nuclear parameter combinations as labeled. AUC, area under the curve; OMI, optical metabolic imaging; ROC, receiver operating characteristic; UMAP, Uniform Manifold Approximation and Projection.

OMI enables the identification of iPSCs with high accuracy
To visualize cell states within the entire metabolic and nuclear morphometry data set, Uniform Manifold Approximation and Projection (UMAP) 66 was employed on the multidimensional measurements already described. Neighbors were defined through the Jaccard similarity coefficient computed across the metabolic and nuclear parameters. UMAP was chosen over t-distributed stochastic neighbor embedding (t-SNE) since UMAP ( Fig. 2A) yielded more distinct clusters for two different known cell types-EPCs and iPSCs-than t-SNE ( Supplementary  Fig. S4A). Furthermore, the UMAP algorithm took less time than the t-SNE algorithm to implement with our data set. In addition, UMAP can include nonmetric distance functions while preserving the global structure of the data.
UMAP was next used to visualize subsets of the entire data set to investigate which measurements were associated with different cell states. Distinct cell populations could be derived from data sets built exclusively from the 11 metabolic parameters ( Fig. 2B) and data sets built exclusively from the 8 nuclear parameters ( Fig. 2C). Although these UMAP representations revealed some distinct clusters of EPCs, IMs, and iPSCs, the UMAP generated using both metabolic and nuclear parameters displayed less overlap of cell clusters among EPCs, IMs, and iPSCs ( Fig. 2A).
We also plotted a heatmap representation of the z-score of metabolic and nuclear parameters at the donor level (each row is the mean of a single donor and cell type) to examine heterogeneity arising from individual donors ( Supplementary  Fig. S4B). Despite some donor-to-donor heterogeneity, EPCs and iPSCs could be distinguished visually ( Fig. 2A) based on a combination of 11 metabolic and 8 nuclear parameters.
Next, classification models were developed based on 11 metabolic and 8 nuclear parameters to predict the reprogramming status of cells, that is, as either EPCs, IMs, or iPSCs. Supervised machine learning classification (naive Bayes, K-nearest neighbor) and regression algorithms (logistic regression and random forest) 67 were implemented to test the prediction accuracy for iPSCs properly when all the metabolic and nuclear parameters are used. To protect against overfitting, various classification methods were trained using 15-fold cross-validation on singlecell data from three different donors with reprogramming status assigned based on morphological characteristics. Furthermore, we tested the various classification methods on data collected from CD71 and Nanog immunofluorescence staining with the same cells from three donors (completely independent and nonoverlapping observations). Receiver operating characteristic (ROC; one-vs-rest) curves of the test data revealed highest classification accuracy for predicting iPSCs (area under the curve, AUC = 0.993), IMs (AUC = 0.993), and EPCs (AUC = 0.999) when a random forest model was used (Fig. 2D and Supplementary Fig. S4C, D). We thus used the random forest classification model for further analysis in this study.
Gain ratio analysis on the decision tree within this random forest model revealed that FAD lifetime components, FAD a 1 , FAD s 1 , and FAD s m , are the most important parameters for classifying the reprogramming status of cells (Fig. 2E). This result is consistent with the observation that FAD lifetime components are significantly different among EPCs, IMs, and iPSCs ( Fig. 1 and Supplementary Fig. S2). We then plotted the accuracy score as a function of the number of parameters, wherein the parameters with highest gain ratio values (Fig. 2E) were chosen (one parameter means FADa 1 ; two parameters means FADa 1 , FADs 1 , and so on).
Imaging using only FAD lifetime parameters requires a minimal time of 2.5 min per lFeature (Fig. 2H) without relying on intensity parameters that are associated with higher variability. Such variability can arise from confounding factors associated with intensity levels, such as laser power and detector gain. Hence, FAD lifetime parameters alone are sufficient to classify accurately the reprogramming status of cells.

Pseudotemporal ordering of single cells resolves cellular transitions
By sampling a process over a time course, profiles at a single-cell level can be used to order cells along a ''pseudotemporal'' continuum. Such ordering can resolve cellular transitions during

LABEL-FREE IMAGING TO TRACK REPROGRAMMING 181
complex processes such as organismal development. 45,68 Here we used 11 metabolic and 8 nuclear parameters to construct pseudotime trajectories of cellular reprogramming at a singlecell level using the Monocle3 algorithm. 45,69 Monocle3 is a trajectory inference method that learns combinatorial changes that each cell must go through as a part of a process and subsequently places each cell at its inferred location in the trajectory.
The inferred pseudotime trajectories built on our entire data set consisted of EPCs, IMs, and iPSCs distributed across 10 clusters, 4 branching events, and 1 disconnected branch (Fig. 3A-C). The primary trajectory-colored by pseudotime and actual reprogramming time points-exhibited transitions from EPCs to IMs to iPSCs as expected ( Fig. 3B and Supplementary Fig. S5A).
Trajectory inference indicated that the starting EPCs were heterogeneous and occupied three clusters ( Fig. 3C; clusters: 1, 2, and 3). Although cluster 2 consists of starting EPCs that undergo reprogramming, clusters 1 and 3 constituted the disconnected branch with EPCs that failed to progress through reprogramming. iPSCs predominantly occupied two clusters (clusters 7 and 10) irrespective of the reprogramming time point, whereas IMs belonged to several clusters (clusters 4, 5, 6, 8, and 9) with clusters 6 and 8 concentrated at the unsuccessful reprogramming branches (Fig. 3C).
Overall, these various trajectories provide a detailed map of several cases of reprogramming heterogeneity within human cells. For example, cells that advance right at branch points 1, 2, and 3 (Fig. 3A, B) completely reprogram to iPSCs within 25 days while cells that proceed left at branch points 1 and 3 (Fig. 3A, B) remain as IMs.
Cellular clusters that are adjacent to each other on the reprogramming pseudotime axis exhibit high correlation in their parameter values: that is, EPCs (cluster: 2) have a high correlation with early IMs (cluster: 4), whereas late IMs (clusters: 5, 6, 8, and 9) demonstrate high correlation with iPSCs (cluster: 7) (Fig. 3D). Moderate correlation of cells (cluster: 10) with the starting EPC cluster 1 could indicate more incompletely or partially reprogrammed cells in cluster 10 relative to the iPSCs in cluster 7. When we compared IMs that undergo reprogramming (clus-ter: 9) and the IMs that do not reprogram to iPSCs (cluster: 6), we noted differences in their NAD(P)H lifetime components, indicating that these parameters might play a role in identifying reprogramming cell fate.
To further examine the parameters that distinguished the cell clusters, we performed another correlation analysis using Moran's I, 70 which is a statistic that reports whether cells at nearby positions on a trajectory will have similar (or dissimilar) expression levels for a given parameter (Supplementary Fig. S5B). When the parameters were ranked by Moran's I, FAD lifetime parameters (FAD s 1 , s 2 , s m ) were most important in distinguishing clusters followed by NAD(P)H lifetime parameters (NAD(P)H s 2 , a 1, s 1 ), in agreement with expression level maps ( Supplementary Fig. S5C-H).
This result is consistent with high gain ratio values for FAD lifetime parameters (Fig. 2E) and the observation FAD lifetime parameters are significantly different among EPCs, IMs, and iPSCs (Fig. 1D). When we plotted the identified important metabolic parameters as a function of pseudotime, we observed that they undergo biphasic changes during reprogramming that could be representative of the OXPHOS burst ( Fig. 3E-J). These pseudotime trajectories complement the UMAP visualizations ( Fig. 2A-C) by providing higher temporal resolution of changes occurring during reprogramming.

Isolation of high-quality iPSCs
Although visualizing reprogramming heterogeneity at a high temporal resolution and single-cell resolution with our methods can be insightful, the terminal goal of any reprogramming platform is to successfully isolate iPSCs that can be used for downstream applications. As proof of concept, we used a combination of OMI, lCP platform, and machine learning models developed in this study to isolate high-quality iPSCs (Fig. 4). First, we tracked the metabolic and nuclear parameters of lFeatures throughout the reprogramming time course using OMI (Fig. 4A). Second, we employed our random forest classification model to predict the reprogramming status of the tracked lFeatures (Fig. 4B).

LABEL-FREE IMAGING TO TRACK REPROGRAMMING 183
Third, we inferred the pseudotimes during the reprogramming time course to monitor the progress of the lFeatures along the reprogramming trajectory (Fig. 4C). Finally, we performed immunostaining on the lFeatures. The reprogramming status predictions made by the machine learning models correlated well with the cell markers detected by immunostaining (Fig. 4D).
Finally, we isolated iPSCs from the lCP culture platform based on the predictions made by the random forest classification model. The physical separation of lFeatures from one another, combined with a high fraction of predicted iPSCs, even up to 100% throughout the lFeature, resulted in easy picking and isolation of iPSCs. We further confirmed that the isolated iPSC lines expressed pluripotency markers (Fig. 4E) and showed no genomic abnormalities (Fig. 4F), indicating that our approach supports the generation of genetically stable iPSC lines.

Discussion
In this study, we report a noninvasive, high-throughput, quantitative, and label-free imaging platform to predict the reprogramming outcome of EPCs by combining micropatterning, live-cell autofluorescence imaging, and automated machine learning. We can predict the reprogramming status of EPCs at any time point during reprogramming with an accuracy of *95% and model performance of *0.99 (AUC of ROC) using a random forest classification model with 11 metabolic parameters and 8 nuclear parameters (Fig. 2G and Supplementary  Fig. S4C-F). In addition, we provide a single-cell roadmap of EPC reprogramming, which reveals diverse cell fate trajectories of individual reprogramming cells (Fig. 3).
Recent evidence indicates that metabolic changes during reprogramming include decreasing OXPHOS and increasing glycolysis, 26,71 along with a transient hyperenergetic metabolic state, called OXPHOS burst. This OXPHOS burst occurs at an early stage of reprogramming and shows characteristics of both high OXPHOS and high glycolysis, which could be a regulatory cue for the overall shift of reprogramming. 28,29,72,73 These changes are accompanied by alterations in the amounts of corresponding metabolites and have been confirmed by genomewide analyses of gene expression, protein levels, and metabolomic profiling. 53,[74][75][76] The shifts in cellular metabolism affect enzymes that control epigenetic configuration, 77 which can impact chromatin reorganization and provide a basis for changes in nuclear morphology as well as gene expression during reprogramming. 34,78-81 Consistent with these studies, the redox ratio increases during reprogramming (Fig. 1D), which could be indicative of increased glycolysis during reprogramming. 82 The changes in NAD(P)H and FAD lifetime parameters that occur during reprogramming ( Fig. 1 and Supplementary  Fig. S2) could reflect changes in quencher concentrations, such as oxygen, tyrosine, or tryptophan, or changes in local temperature and pH. 35,83,84 Specifically, the biphasic changes in the metabolic and nuclear parameters could be due to the increased production of reactive oxygen species (ROS) by mito-chondria 35,59,85,86 during the OXPHOS burst. The generated ROS could further serve as a signal to activate nuclear factor (erythroid derived 2)-like-2 (NRF-2), which then induces hypoxiainducible factors (HIFs) that promote glycolysis during reprogramming by increasing the expression levels of the glycolysisrelated genes. 25,73,75 Moreover, the importance of FAD parameters for distinguishing various reprogramming cell types (Figs. 1D and 2E) could point to the significant changes in the mitochondrial environment during reprogramming. The differences in NAD(P)H lifetime parameters among IMs that successfully undergo reprogramming and those that do not (Fig. 3D) may suggest a role for molecular pathways involving NAD(P)H in impacting reprogramming barriers and thus the end cell fate of reprogramming cells.
The classification models trained on all 11 metabolic and 8 nuclear parameters had the highest accuracy. Random forest models using only FAD lifetime parameters, in particular, yielded comparatively high ROC AUC values ( Fig. 2G and Supplementary  Fig. S4E, F). In addition, FAD lifetime parameters were more accurate for predicting reprogramming status than using nuclear parameters alone, which can be obtained using wide-field or confocal fluorescence microscopy.
Imaging only FAD lifetime parameters instead of imaging all the parameters significantly reduced the time of imaging from 7 to 2.5 min per lFeature (Fig. 2H). This reduction in imaging time is especially helpful when assessing multiple lFeatures for iPSC quality at larger scales, and lifetime measurements benefit from fewer confounding factors and less variability compared with intensity measurements.
Prior studies on the heterogeneity of reprogramming relied on bulk analysis [87][88][89][90] or single-cell analysis [91][92][93][94][95][96][97][98][99] techniques. Prior bulk analyses encountered challenges in characterizing the variability in both the starting cell population and during fate conversion, owing to the variable kinetics and low efficiency of reprogramming, and single-cell techniques disrupted the cellular microenvironment, resulting in significant changes in the biophysical properties of cells undergoing reprogramming. Our methods overcome these challenges with the combination of a lCP culture platform, OMI, and machine learning.
First, the lCP platform dissects cell cultures undergoing reprogramming into hundreds of cell subpopulations that enables easy tracking of cells undergoing reprogramming with imaging time as low as 2.5 min per lFeature. Circular and isolated lFeatures, however, are not necessary to implement OMI. Moreover, the lCP platform ensures an intact microenvironment for reprogramming cells while enabling single-cell analysis. Second, OMI provides single-cell measurements nondestructively to assess the influence of neighboring cells and provides high temporal resolution for time-course studies of reprogramming. Finally, machine learning with trajectory inference is applied here to a new type of cellular measurement, single cell metabolism. These methods excel in analyzing time course data containing asynchronous processes within cellsas seen in prior studies with flow cytometry and gene expression data. 45,74,97,98,100 Machine learning here overcomes the problems of reprogramming trajectories built based on absolute LABEL-FREE IMAGING TO TRACK REPROGRAMMING time points. Overall, these methods could aid in the identification of somatic cells or early reprogramming cells that are refractory toward reprogramming and thus increase the success rate of iPSC generation from patient-derived primary cells.
Our methods could be adapted for industrial-scale GMPcompliant manufacturing system. First, the lCP platform involves direct extracellular matrix printing onto optically clear substrates (Supplementary Fig. S1A) and does not involve any gold coating, unlike traditional microcontact printing methods. 101,102 Therefore, the lCP platform is cost-effective and relatively simple because it does not require cleanroom access. Plus, our lCP platform utilizes prior advances in the field with xeno-free and feeder-free medium, thus preventing reprogramming inconsistencies arising from the undefined nature of xeno components. Second, we used EPCs isolated from peripheral blood as the starting cell type for reprogramming due to their lack of genomic rearrangements and demonstrated reprogramming ability. 103,104 Moreover, peripheral blood collection is a routine laboratory procedure. The use of EPCs with episomal reprogramming plasmids is likely to generate genetically stable iPSCs devoid of reprogramming factors, as seen in prior studies with EPCs 103,104 and episomal plasmids. 51,52,105,106 Third, the autofluorescence imaging technique is label-free, unlike other common methods to study metabolism such as electron microscopy, immunocytochemistry, and colorimetric metabolic assays. Autofluorescence imaging provides nondestructive real-time monitoring of live cells with lower sample phototoxicity compared with single-photon excitation. 107 Taken together, the processes of lCP platform fabrication, reprogramming, cell culture, autofluorescence imaging, iPSC identification based on machine learning models, and iPSC isolation can all be automated and extended to different reprogramming methods 108 (e.g., mRNA, Sendai virus), to other starting cell types (e.g., fibroblasts, keratinocytes), to other parameters (e.g., cell morphology, 19-21 mitochondrial structure 71,76,109 ), and to other processes (e.g., differentiation [110][111][112][113]. Limitations of our current approach include imaging resolution, culture duration, and per-lFeature image analysis. First, comprehensive three-dimensional imaging of each lFeature could provide maps at higher resolution to further dissect the metabolic and nuclear changes occurring throughout the entire depth of the reprogramming cultures. Second, any discrepancy in the z-plane during OMI and confocal fluorescence image acquisition on the same sample could lead discrepancies in the data that lead to poor classification of cell types by the random forest models. We mitigated this issue by programming a fixed z-plane for both OMI and confocal fluorescence image acquisition. Third, there is a limited duration of culture before cells overgrow within the lFeature. This could also result in cell detachment from the lFeature, which is difficult to image with OMI. For the reprogramming experiments described here, circular features with 300 lm radius have been used for *22 days of culture, although the cell seeding density or micropatterned geometry 34,43,101 could be easily changed. Fourth, because our primary focus was on identifying iPSCs that could be isolated to establish cell lines, this study did not distinguish the diverse cell states of IMs between day 8 and day 17 of reprogramming and subtle changes in iPSC state. For instance, the intensity levels of NANOG and other key pluripotency transcription factors were not accounted for in our analyses, making it difficult to distinguish between partially reprogrammed cells, early and late stage iPSCs, and competing cells in culture. 100,[114][115][116] Higher resolution staining for NANOG along with additional stem cell markers (e.g., TRA-1-60, MYC, OCT4, E-CADHERIN) could provide further insights into the differences between cell clusters identified in our analysis. Finally, imaging analysis was performed at different reprogramming time points on a per-lFeature basis. Tracking single cells within the lFeature during reprogramming using cell tracking algorithms 117 could provide insights into metabolic and nuclear changes during reprogramming at a higher resolution.
Overall, we developed a high-throughput, noninvasive, rapid, and quantitative method to predict the reprogramming status of cells and study reprogramming heterogeneity. Our studies indicate that OMI can predict the reprogramming status of cells, which could enable real-time monitoring during iPSC manufacturing, thereby aiding in the identification of high-quality iPSCs in a timely and cost-effective manner. Similar technologies could impact other areas of cell manufacturing such as direct reprogramming, differentiation, 110 and cell line development.

The Bigger Picture
Cell cultures undergoing epigenetic reprogramming are complex, as cells span a broad spectrum of cell states. In this study, we use two-photon microscopy to acquire high-dimensional data on the nuclear and metabolic characteristics of cells and develop machine learning models to predict reprogramming outcomes. This analysis can offer an orthogonal approach to investigating human somatic cell reprogramming that is complementary to the current studies of transcript and protein expression within reprogramming cells.
With further development, strategies using these imaging techniques could enable a rapid, automated, and standardized method to isolate iPSCs. This proof-of-principle study indicates that an image-based approach in conjunction with trajectory inference methods can elucidate cellular and subcellular changes that accompany human cell fate transitions.

EPC isolation and cell culture
EPCs were isolated from fresh peripheral human blood that was obtained from healthy donors (Interstate Blood Bank, Memphis, TN). Research using purchased de-identified blood specimens are not considered human subjects research under the US Common Rule. Blood was processed within 24 h of collection, where hematopoietic progenitor cells were extracted from whole blood using negative selection (RosetteSep; STEMCELL Technologies) and cultured in polystyrene tissue culture plates in erythroid expansion medium (STEMCELL Technologies) for 10 days to enrich for EPCs.
Enriched EPCs from day 10 were examined by staining with APC antihuman CD71 antibody (1:100; 334107; Biolegend) and incubating for 1 h at room temperature (RT). Data were collected on Attune Nxt flow cytometer and analyzed with FlowJo.

Micropattern design and polydimethylsiloxane stamp production
First, a template with the feature designs was created in AutoCAD (Autodesk). The template was then sent to the Advance Reproductions Corporation, MA, for the fabrication of a photomask, and a 6-inch (0.15 m) patterned silicon (Si) wafer was fabricated by the Microtechnology Core, University of Wisconsin-Madison, WI. 118 Using soft photolithography techniques, the Si wafer was spin coated with an SU-8 negative photoresist (MICRO CHEM) and exposed to UV light. The Si mold was then developed for 45 min in SU-8 developer (Sigma) that yielded features with a height of 150 lm. The Si mold was then washed with acetone and isopropyl alcohol. Elastomeric stamps used for microcontact printing were generated by standard soft lithographic techniques. The Si mold was rendered inert by overnight exposure in vapors of (tridecafluoro-1, 1, 2, 2tetrahydrooctyl) trichlorosilane. Polydimethylsiloxane (PDMS; Sylgard 184 silicone elastomer base, 3097366-1004; Dow Corning) was prepared at a ratio of 1:10 curing agent (Sylgard 184 silicone elastomer curing agent, 3097358-1004; Dow Corning) and degassed in a vacuum for 30 min. The PDMS was then poured over the SU-8 Si mold on a hot plate and baked at 60°C overnight to create the PDMS stamp.
lCP well plate construction lCP substrates were constructed based on previous studies. 47,48,119 In brief, PDMS stamps with 300 lm radius circular features were coated with Matrigel (WiCell Research Institute) for 24 h. After 24 h, the Matrigel-coated PDMS stamp was dried with nitrogen and placed onto 35 mm cell culture-treated ibiTreat dishes (81156; Ibidi). A 50 g weight was added on top of the PDMS stamps to ensure even pattern transfer from the Matrigel-coated PDMS stamp to the ibiTreat dish.
This setup was incubated for 2 h at 37°C. The 35 mm ibiTreat dish was then backfilled with PLL (20 kDa)-g-(3.5)-PEG (2 kDa) (Susos), a graft polymer solution with a 20 kDa PLL backbone with 2 kDa PEG side chains, and a grafting ratio of 3.5 (mean PLL monomer units per PEG side chain), by using 0.1 mg/mL solution in 10 mM HEPES buffer for 30 min at RT. The ibiTreat dish was then washed with phosphatebuffered saline (PBS) and exposed to UV light for 15 min for sterilization to yield the micropatterned substrate.

Isolation of iPSCs
To isolate high-quality iPSC lines, candidate colonies were picked from micropatterns using a 200 lL micropipette tip and transferred to Matrigel-coated polystyrene tissue culture plates in mTeSR1 medium (WiCell Research Institute). If additional purification was required, one additional manual picking step with a 200 lL micropipette tip was performed. During picking and subsequent passaging, the culture medium was often supplemented with the Rho kinase inhibitor Y-27632 (Sigma-Aldrich) at a 10 lM concentration to encourage cell survival and establish clonal lines. iPSCs obtained from EPCs were maintained in mTeSR1 medium on Matrigel-coated polystyrene tissue culture plates and passaged with ReLeSR (STEMCELL Technologies) every 3-5 days. All cells were maintained at 37°C and 5% CO 2 .
Secondary antibodies were obtained from Thermo Fisher Scientific and applied in a blocking buffer of 5% donkey serum for 1 h at RT at concentrations of 1:400 to 1:800. A Nikon Eclipse Ti epifluorescence microscope was used to acquire single 10 · images of each micropattern, and a Nikon AR1 confocal microscope was used to acquire 60 · stitched images of each micropattern using the z-plane closest to the micropatterned substrate for reprogramming studies. In brief, EPCs are identified as CD71 + ; Nanog À , IMs are indicated as CD71 À ; Nanog À , and iPSCs are indicated as CD71 À ; Nanog + .

Autofluorescence imaging of NAD(P)H and FAD
Fluorescence lifetime imaging was performed at different time points during reprogramming by an Ultima two-photon microscope (Bruker) composed of an ultrafast tunable excitation laser source (Insight DS+; Spectra-Physics) coupled to a Nikon Ti-E inverted microscope with time-correlated single-photon counting electronics (SPC-150; Becker & Hickl). The laser source enables sequential excitation of NAD(P)H at 750 nm and FAD at 890 nm. NAD(P)H and FAD images were acquired through 440/80 and 550/100 nm bandpass filters (Chroma), respectively, using gallium arsenide phosphide photomultiplier tubes (H7422; Hamamatsu).
The laser power at the sample was *3.5 mW for NAD(P)H and 6 mW for FAD. Lifetime imaging using time-correlated single-photon counting electronics (SPC-150; Becker & Hickl) was performed within Prairie View Atlas Mosaic Imaging (Bruker Fluorescence Microscopy) to capture the entire lFeature. Fluorescence lifetime decays with 512-time bins were acquired across 512 · 512-pixel images with a pixel dwell time of 4.8 ls and an integration period of 60 s. Photon count rates were *1-5 · 10 5 and monitored during image acquisition to ensure that no photobleaching occurred.
All samples were placed on a stage-op incubator and illuminated through a 40 · /1.15 NA objective (Nikon). The short lifetime of red blood cell fluorescence at 890 nm was used as the instrument response function and had a full-width half maximum of 240 ps. A yellow green (YG) fluorescent bead (YG microspheres, Polysciences Inc.; s = 2.13 -0.03 ns, n = 6) was imaged daily as a fluorescence lifetime standard. 35,120 Image analysis Fluorescence lifetime decays were analyzed to extract fluorescence lifetime components through SPCImage software (Becker & Hickl). A threshold was used to exclude pixels with low fluorescence signals (i.e., background). A bin of 3 · 3 pixels was used to maintain spatial resolution, the fluorescence lifetime decay curve was convolved with the instrument response function and fit to a two-component exponential decay model, I(t) = a 1 e Àt/s 1 + a 2 e Àt/s 2 + C, where I(t) is the fluorescence intensity as a function of time t after the laser pulse, a 1 and a 2 are the fractional contributions of the short and long lifetime components, respectively (i.e., a 1 + a 2 = 1), s 1 and s 2 are the short and long lifetime components, respectively, and C accounts for background light.
Both NAD(P)H and FAD can exist in quenched (short lifetime) and unquenched (long lifetime) configurations 39,40 ; the fluorescence decays of NAD(P)H and FAD are, therefore, fit to two components. Fluorescence intensity images were generated by integrating photon counts over the per-pixel fluorescence decays.
Images were analyzed at the single-cell level to evaluate cellular heterogeneity. 121 A pixel classifier was trained on 15 images using ilastik 60 software to identify the pixels within the nuclei in NAD(P)H images. An object classifier was then used to identify the nuclei in NAD(P)H images using the pixel classifier along with the following parameters: Method = Simple, Threshold = 0.3, Smooth = 1, Size Filter Min = 15 pixels, Size Filter Max = 500 pixels. A customized CellProfiler 61 pipeline was then used to obtain metabolic and nuclear parameters. The CellProfiler pipeline applied the following steps: Primary objects (nuclei) were inputted from ilastik. Secondary objects (cells) were then identified in the NAD(P)H intensity image by outward propagation of the primary objects.
Cytoplasm masks were determined by subtracting the nucleus mask from the cell mask. Cytoplasm masks were applied to all images to determine single-cell redox ratio and NAD(P)H and FAD lifetime parameters. A total of 11 metabolic parameters were analyzed for each cell cytoplasm ( Supplementary Fig. S1D): NAD(P)H intensity (I NAD(P)H ), NAD(P)H a 1 , NAD(P)H s 1 , NAD(P)H s 2 , NAD(P)H mean lifetime (s m = a 1 s 1 + a 2 s 2 ), FAD intensity (I FAD ), FAD a 1 , FAD s 1 , FAD s 2 , FAD s m , optical redox ratio [I NAD(P)H /(I NAD(P)H + I FAD )]. A total of eight nuclear parameters were analyzed for each nucleus: area, perimeter, MeanRad, NSI, solidity, extent, number of neighbors (#Neigh), and distance to closest neighbor (1stNeigh).
Representative images of the optical redox ratio, NAD(P)H s m and FAD s m , were computed using the Fiji software.
z-Score hierarchical clustering z-Score of each metabolic and nuclear parameter for each cell was calculated. z-Score = (l observed À l row )/r row , where l observed is the mean value of each parameter for each cell; l row is the mean value of each parameter for all cells together, and r row is the standard deviation of each parameter across all cells. Heatmaps of z-scores for all OMI variables were generated to visualize differences in each parameter between different cells. Dendrograms show clustering based on the similarity of average Euclidean distances across all variable z-scores. Heatmaps and associated dendrograms were generated in Python.

Classification methods
Random forest, simple logistic, k-nearest neighbor (IBk), and naive Bayes classification methods were trained to classify reprogramming cells into EPCs, IMs, and iPSCs using Weka software. 122 All data were randomly partitioned into training and test data sets using 15-fold crossvalidation for training and test proportions of 93.3% (1994 cells) and 6.7% (143 cells), respectively. Cross-validation of EPCs, IMs, and iPSCs was performed using immunostaining data. Each model was replicated 100 times; new training and test data were generated before each iteration.
Parameter weights for metabolic and nuclear parameters were extracted using the GainRatioAttributeEval function in Weka to determine the contribution of each variable to the trained classification models. One-vs-rest ROC curves were generated to evaluate the classification model performance on the classification of test set data and are the average of 100 iterations of data that was randomly selected from training and test sets. All the ROC curves displayed were constructed from the test data sets using the model generated from the training data sets.

Karyotyping
Cells cultured for at least five passages were grown to 60-80% confluence and shipped for karyotype analysis to WiCell Research Institute, Madison, WI. G-banded karyotyping was performed using standard cytogenetic protocols. 123 Metaphase preparations were digitally captured with Applied Spectral Imaging software and hardware. For each cell line, 20 GTL-banded metaphases were counted, of which a minimum of 5 was analyzed and karyotyped. Results were reported in accordance with guidelines established by the International System for Cytogenetic Nomenclature 2016. 124

Pseudotime trajectories
To order cells by pseudotime, EPCs were designated the starting point. We used Monocle3 to define a pseudo-reprogramming time trajectory, termed pseudotime, where cells are linearly ordered relative to their progress or change in metabolic and nuclear parameters relative to the starting EPC population. The lengths of the trajectory between each branch point were used to define state by the Monocle3 algorithm implemented in Rstudio Version 1.3.1073 as described in CodeS1.R file.