The Distribution of Genomic Variations in Human iPSCs Is Related to Replication-Timing Reorganization during Reprogramming

reprogramming, indicative of overall genome reorganization. We found that early- and late-replicating domains in iPSCs are differentially affected by copy-number gains and losses and that in particular, CNV gains accumulate in regions of the genome that change to earlier replication during the reprogramming process. This differential relationship was present irrespective of reprogramming method. Overall, our ﬁndings reveal a functional association between reorganization of replication timing and the CNV landscape that emerges during reprogramming.


INTRODUCTION
The genome is topologically organized in three-dimensional space within the nucleus and is highly dynamic as cell fate changes during normal development, as well as in disease states, such as cancer (Ryba et al., 2012). The significant cell-fate changes that occur during embryonic stem cell (ESC) differentiation and reprogramming of somatic cells to pluripotency also involve genome-wide resetting of the higher-order chromatin structure (Watanabe et al., 2013). In both scenarios, genome reorganization precedes gene expression changes, suggesting a potential causal relationship (Apostolou et al., 2013;Phillips-Cremins et al., 2013;Wei et al., 2013;Zhang et al., 2013).
Higher-order genome organization has been shown to significantly influence the distribution of genomic aberrations in both immortalized somatic cells and cancer cells, but the underlying mechanism remains elusive (De and Michor, 2011;Fudenberg et al., 2011;Koren et al., 2012;Schuster-Bö ckler and Lehner, 2012). One method of mapping genomic organization is by segmenting the genome into domains based on replication timing, which occurs in a tightly regulated cell-type-specific manner . Domains with similar replication timing tend to colocalize within the nucleus, and as a result, genome organization defined by replication timing overlaps with other methods of defining chromatin topology, including DNase I-hypersensitivity profiles, various epigenetic markers, and genome-wide chromatin-interaction mapping (e.g., Hi-C) (Dixon et al., 2012;Hu et al., 2013;Ryba et al., 2010). Several studies have revealed that the cancer mutational landscape is closely associated with replication timing by demonstrating that copy-number variation (CNV) boundaries tend to have similar replication timing, and gains and losses distribute differentially with respect to replication timing (De and Michor, 2011;Liu et al., 2013;Schuster-Bö ckler and Lehner, 2012). These studies share one important limitation: the replication-timing maps were not generated from the same cell types in which the CNVs were analyzed. This limitation is significant because cellular identity is closely coupled to genome organization; in fact, only half of the genome has stable replication timing across all cell types, with the remainder being organized in a highly cell-type-specific manner (Hansen et al., 2010;Hiratani et al., 2008Hiratani et al., , 2010. Thus, when an unmatched replication-timing profile is used, up to 50% of the genome may not be accounted for during the analysis due to the cell-type specificity of genome organization. How cell fate change and its associated genome reorganization are related to genome variation has not been examined. We overcome the limitation of unmatched replication-timing maps by generating replication-timing profiles of both induced pluripotent stem cells (iPSCs) and their parental fibroblasts in order to explore the link between the CNV landscape and genome reorganization due to cell fate change. We show that nuclear reprogramming results in dramatic replication-timing reorganization that influences the observed CNV landscape in iPSCs.

CNVs Reside within Replication-Timing Domains
Primary dermal fibroblasts from one healthy volunteer (wild-type, WT) were reprogrammed using retroviral transduction of the standard Yamanaka factors to generate multiple iPSC lines (Takahashi et al., 2007) (Figure S1). Replication-timing profiles were generated from an iPSC line and the parental fibroblasts. Newly replicated DNA from early and late S phase was differentially labeled and hybridized to a whole-genome oligonucleotide microarray. The ratio of the abundance of each probe in the early versus late fractions generates a replication profile, which reveals clear demarcation between megabase-sized regions of coordinated replication called replication domains ( Figure 1A). Computational segmentation algorithms can be used to define the replication domains (Ryba et al., 2011). The iPSC profile we generated is identical to previously published profiles of both human ESCs and iPSCs, reflecting human pluripotent stem cell-specific genome organization ( Figure S1D) .
To map the corresponding genome-variation landscape of iPSCs, we analyzed the fibroblast and low-passage (p6) iPSC lines using high-resolution Affymetrix SNP Array 6.0 (Tables S1  and S2). CNV calls were validated using quantitative PCR with an overall validation rate of 91% (Supplemental Experimental Procedures). Raw data from the replication-timing profiles also  A) Newly replicated DNA from early and late S phase is differentially labeled and hybridized to a whole-genome oligonucleotide microarray to produce a replicationtiming profile. Loess-smoothed replication-timing ratio (filled gray) of the first 50 Mb of chromosome (Chr) 1 is shown as an example. Segmentation algorithms are used to define replication domains (black segments) as regions with similar replication-timing ratio. TTRs are defined as genomic regions with a significant rate of replication-timing ratio change (purple). TTRs cover approximately one-third of the genome. CNVs (green segments) are plotted above the replication profile, and they can fall either entirely within individual domains or cross boundaries and involve more than one domain. A CNV may or may not involve a TTR. (B) Permutation analysis demonstrates that CNVs are significantly more likely to reside within individual replication domains than expected by chance in both iPSCs and parental fibroblasts (fib). Boxes display the percentages obtained from 1,000 simulations, and red diamonds represent the actual percentages. (C) TTRs are not enriched within CNVs. The actual amount (black) of TTR and non-TTRs affected by CNVs, compared to expected numbers (gray) based on random distribution, is shown with p values (Fisher's exact test) above each plot. See also Figure S1.
independently validated all the homozygous losses detected by SNP arrays in the same cell lines ( Figure S1E) (Ryba et al., 2012). Most CNVs of iPSCs were not detected in the parental fibroblasts and are heretofore referred to as ''iPSC manifested'' (Tables S1 and S2). iPSC-manifested CNVs in low-passage iPSCs could represent clonal expansion of low-frequency genetic aberrations in the tissue of origin or de novo mutations during the reprogramming process (Liang and Zhang, 2013).
We found that, whereas two-thirds of CNVs detected in iPSCs involve genic regions (including exons and introns), the affected genes do not fall into any clear functional groups by Gene Ontology analysis (Table S3). The CNV-affected genes did not show any particular relationship to pluripotency, nor did we find any enrichment of tumor suppression genes or oncogenes. Our data are similar to other published reports that failed to demonstrate consistent functional consequences due to genomic aberrations in iPSCs (Amps et al., 2011;Hussein et al., 2011;Martins-Taylor et al., 2011).
We mapped the iPSC CNVs to the newly generated replication-timing profiles and found that more than 95% of CNVs are contained within individual replication domains, irrespective of whether the iPSC or the parental fibroblast profile was used. Permutation analysis using size-controlled random CNV lists demonstrates that the detected CNVs are significantly more likely to reside completely within individual replication domains and do not span domain boundaries than would be expected to occur by chance ( Figure 1B).
Timing transition regions (TTRs), defined by genomic regions with a significant rate of replication-timing ratio change, have different replication kinetics compared to the center of the replication domain. TTRs probably represent sequences vulnerable to DNA damage (Labib and Hodgson, 2007;Watanabe et al., 2002Watanabe et al., , 2004. However, we did not find an enrichment of TTR sequence in CNV-affected genomic regions in iPSCs ( Figure 1C).

Impact of Replication-Timing Dynamics on the CNV Landscape
Reprogramming and ESC differentiation involve global changes in chromosome structure that include temporal reorganization of replication domains and spatial reorganization of the genome; in fact, one-half of the genome changes organization at some stage during development (Hiratani et al., 2008;Orkin and Hochedlinger, 2011;Ryba et al., 2010). Comparison of fibroblast and iPSC replication-timing profiles revealed that during reprogramming, 40% of the genome changes replication timing, as defined by a greater than 0.5 difference in the early/late ratio between fibroblast and iPSC profiles ( Figure 2A). This cutoff corresponds to the amount of time (about 1 hr) required to complete a replication factory (Leonhardt et al., 2000) and is the same cutoff for significance for transcription (D.M.G., unpublished data). This change is nearly double in magnitude compared to the change observed when ESCs are differentiated into neural precursor cells (Hiratani et al., 2008), reflecting the extensive genome reorganization during nuclear reprogramming.
To test how genome reorganization influences the genomevariation landscape, we determined the fibroblast and iPSC replication-timing values of genomic loci affected by iPSCmanifested CNVs using cell-type-specific replication-timing profiles. The distribution of replication-timing values of CNV gain loci is statistically different in fibroblasts and iPSCs; they are predominantly late replicating in fibroblasts but early replicating in iPSCs. In contrast, loci affected by CNV losses are generally late replicating in both cell types. (Figure 2B).
To better understand the dynamics of the CNV landscape, a genome-wide representation of replication-timing dynamics during reprogramming was generated by plotting fibroblast versus iPSC replication-timing values for all 620,000 probes on the array ( Figure 2C). In this plot, the gray diagonal region represents genomic loci with no change in replication timing during reprogramming, defined as a less than 0.5 difference in replication-timing ratio between fibroblasts and iPSCs. The portions of the genome that replicate either earlier ( Figure 2C, red) or later (blue) in iPSCs than in fibroblasts fall outside the diagonal. When the replication-timing values of CNVs were plotted on this genome background, we show that CNV gains are enriched in the genome compartment that changes to earlier replication during reprogramming. This scatterplot presentation reveals the replication-timing dynamics underlying the distribution shift of gains seen in Figure 2B. In contrast, the majority of losses reside in late replicating regions in both fibroblasts and iPSCs ( Figure 2C), consistent with the absence of a shift in the CNV loss distribution ( Figure 2B). Together, these findings further support the strong relationship between replication timing and genome variation by demonstrating that the distribution of CNVs observed in low-passage iPSCs is influenced by replication-timing reorganization associated with cell fate change.
Human iPSC genomic variation has been studied using a variety of approaches. Hussein et al. and Laurent et al. generated iPSC using the same retroviral factors as we did here, but whereas Hussein et al. also used the Affymetrix 6.0 SNP array, Laurent et al. used the Illumina Omniquad v.1 SNP array (Hussein et al., 2011;Laurent et al., 2011). More recently, Abyzov et al. used whole-genome sequencing to identify CNVs in retroviral reprogrammed iPSC lines (Abyzov et al., 2012). None of these studies considered the role of higher-order chromatin organization on genomic variation, and their data sets serve as de facto controls to exclude potential lab-specific, CNV detection platform-specific, or parental fibroblast-specific effects in our experiments. Therefore, we analyzed these data sets relative to our newly generated replication-timing profiles. To maintain consistency with our experimental approach, we limited the analysis to data sets of low-passage iPSCs generated from dermal fibroblasts. An overall similar trend was seen using data sets from different laboratories: CNV distribution is nonrandom, with gains predominating in regions changing to earlier replication in iPSCs ( Figure S2B).

Nonrandom, Differential CNV Distribution Is a Generalized Feature of Reprogramming
To exclude the possibility that the nonrandom replication-timing distribution of the CNVs is due to a direct effect of the retroviral vectors (RVs) used for reprogramming, we applied our analysis to iPSCs generated using other reprogramming methods. To this end, we generated iPSCs using episomal (EP) factors that do not integrate into the genome. In low-passage EP iPSCs, like RV iPSCs, CNV gains are significantly enriched in regions (legend continued on next page) of the genome whose replication timing changed to earlier during reprogramming. Again, we found that the majority of CNV losses occur at regions that do not significantly change replication timing during reprogramming (Figures 3A and S3).
Next, we analyzed Hussein et al.'s data on sister iPSC lines derived from the same parental fibroblast line (human foreskin fibroblasts, HFFs) using either retroviruses or piggyBac transposons (PBs) (Hussein et al., 2011). Distribution plots similar to Figure 2B from multiple data sets were directly compared to each other ( Figures 3B and S3). Gains and losses from these PB and RV iPSCs had a similar nonrandom distribution with respect to replication-timing dynamics during reprogramming: gains were likely to have earlier replication in iPSCs compared to fibroblasts, whereas losses maintain late replication. Together, these findings demonstrate that the trend for nonrandom, differential CNV gain and loss accumulation as a function of change in replication timing is a generalized feature of reprogramming.

DISCUSSION
Improved mapping and understanding of the higher-order genome organization is providing a clearer picture of how genome functions, such as gene expression and genome integrity, are related to its spatial organization (Cavalli and Misteli, 2013). A static association between genome organization and genome aberration has been described, but how the mutagenic landscape is affected by changes in genome organization has not been studied. Here, by leveraging reprogramming to alter cell fate and analyzing the genome aberration landscape before and after using cell-type-matched genome-wide replication-timing profiles, we show that the CNV landscape of iPSCs is spatially constrained by chromatin organization, and the distribution of CNVs is influenced by changes in replication timing due to reprogramming.
Some studies have reported that reprogramming results in genetic aberrations such as point mutations and CNVs in human iPSCs (Gore et al., 2011;Hussein et al., 2011;Laurent et al., 2011;Martins-Taylor et al., 2011;Mayshar et al., 2010). However, more recently, others have demonstrated that the majority of iPSC CNVs preexisted in cells within the parental population (Abyzov et al., 2012;Cheng et al., 2012;Quinlan et al., 2011). Consistent with these reports, we found shared iPSCmanifested CNVs among the iPSC clones in each of our reprogramming experiments, albeit at low frequency. Interestingly, we also found that about 50% of iPSC-manifested iPSC CNVs have overlap with known CNVs in the Database of Genomic Variation (MacDonald et al., 2014), and these likely represent previously existing low-frequency CNVs in the starting cell population.
Ultimately, the accumulation of genomic changes in any genome will result from a combination of mutation rates, repair rates, and selection. In a completely stochastic process, akin to genetic drift, CNV gains and losses would be distributed randomly. Our results show that this is not the case because CNVs are nonrandomly distributed in iPSCs as a function of replication-timing change. iPSC-manifested gains predominate in loci that change to earlier replication during reprogramming, whereas losses occur in loci that remain late replicating. The mechanism and timing of this relationship are not immediately apparent, but it is conceivable that both preexisting and de novo CNVs are subject to the influence of the extensive genome reorganization (Figure 4).
Switching to earlier replication involves epigenetic resetting that results in increased active chromatin modifications, such as histone acetylation (Hiratani et al., 2008;Schwaiger et al., 2009) and an overall open chromatin state. These regions contain genes essential to the cell's new functions and are therefore more likely to be under increased selection during reprogramming. Although this functional selection is plausible, it is difficult to reconcile with the inability to link CNVs in iPSCs to gene expression changes. Neither our gene expression analysis nor the published literature has consistently identified gene expression changes that correlate with genomic aberration to prove that this mechanism is dominant (Amps et al., 2011;Hussein et al., 2011;Martins-Taylor et al., 2011). In fact, in a recent analysis of 125 human pluripotent stem cell lines, the most commonly identified aberration was a minimal amplicon at 20q11.21 (affecting >20% of lines), yet there was no correlation between the presence of this amplicon and the expression of the genes it contains (Amps et al., 2011).
Variable efficiency of DNA damage response (DDR) pathways in different chromatin compartments could contribute to the differential distribution of gains and losses. It is possible that DNA damage occurring in early S phase is more accurately repaired due to higher-fidelity repair mechanisms, including copying from the correct template (Chang and Cimprich, 2009). Not only is DDR more efficient in open chromatin, but also heterochromatin appears to retard repair processes and could potentially lead to accumulation of deletions in the late replicating compartment (Goodarzi et al., 2008;Murga et al., 2007;Rü be et al., 2011). Another mechanism to explain differential distribution of de novo mutations is that replication-timing reorganization could adversely affect the interaction between replication and transcription machineries, leading to replication and transcription collisions, and consequently to double-strand breaks and CNVs (Helmrich et al., 2013).
Interestingly, we noticed subtle differences in CNV occurrence among the three reprogramming methods. In contrast to RV iPSCs, PB and EP iPSCs overall had significantly more CNV gains compared to losses ( Figure 3B), suggesting that there (C) Replication-timing values of CNV gains (top) or losses (bottom) were determined using the profiles of both fibroblasts and iPSCs and were shown as a scatterplot with density-smoothed genome-wide replication-timing values (RTV) of over 620,000 probes serving as background. The gray central region represents $60% of the genome with no change in replication timing defined as iPSC RTV À fibroblast RTV <±0.5. Regions with greater than +0.5 change (iPSC RTV À fibroblast RTV) are classified as ''earlier in iPSC'' (red), and those with less than À0.5 change (iPSC RTV À fibroblast RTV) are classified as ''later in iPSCs'' (blue). The bar graphs depict the distribution of CNV gains or losses compared to the genome-wide distribution of RTVs among these regions. Upward-pointing arrowheads indicate statistically significant enrichment of CNVs in the region, whereas downward-pointing arrowheads indicate statistically significant depletion of CNVs compared to the genome distribution in corresponding regions based on binomial test. Number of arrowheads correlates with p value significance (see Figure S2). may be a differential impact of distinct reprogramming methods on different genomic compartments and the iPSC CNV landscape. This finding warrants further investigation and is likely intertwined in the complex interplay of various other internal and external genomic stresses, the cell's DDR network, and selective pressure exerted by the cell's environment.  Figure 2D. Matched BJ fibroblast replication-timing profile is used. p values are shown in Figure S3A. (B) Kernel density plots show the replication-timing distribution of CNV gains and losses detected in iPSCs derived using retroviral (purple), PB (blue), or EP (green) vectors. RV and PB iPSCs were derived from the same parental lines (Hussein et al., 2011). EP iPSCs were reprogrammed from HFF line BJ. Gains enrich in regions that change to earlier replication during reprogramming in all cases. (Numbers of CNVs in parentheses. p values from K-S tests are shown in Figure S3B.) In sum, genome-wide analysis of iPSC CNVs using the replication-timing profiles of both fibroblasts and iPSCs demonstrates an important functional association between genome variations and replication-timing reorganization resulting from reprogramming. We have focused on CNVs and replication timing during reprogramming, but in the future, genomic aberrations identified by other techniques (e.g., whole-genome sequencing) as well as epigenomic changes can be similarly mapped to dynamic genome organization landscapes. Understanding the link between genome organization and function will be enhanced by improved biochemical techniques and computational approaches to allow higher throughput and improved resolution (Misteli, 2012;Wang et al., 2012). Because iPSC technology readily allows the manipulation of cell fate change, it will serve as a powerful experimental platform to further study the relationship between genome organization and function.

EXPERIMENTAL PROCEDURES
Generation, Characterization, and Genome-wide Analyses of iPSCs WT fibroblasts hFib2 were reprogrammed with RVs as described by Park et al. (2008). For EP vector-mediated reprogramming, vectors were obtained from Addgene (pCXLE-hUL, pCXLE-hSK, and pCXLE-hOCT3/hOCT4) and introduced into BJ fibroblasts by electroporation, and reprogrammed colonies were picked and expanded. Immunofluorescent staining of pluripotency markers Oct4, Nanog, SSEA-4, Tra-1-60, and Tra-1-81 and teratoma formation assays demonstrated the pluripotency of all the established iPSC lines. Genomewide expression profiling with Affymetrix Human Genome U133 Plus 2.0 microarrays and CNV analysis with Affymetrix SNP 6.0 arrays were performed according to manufacturer's instructions at the Genotyping and Microarray Center of Coriell Institute for Medical Research. Replication-timing profiling was performed and analyzed as in Ryba et al. (2011).

Computational Analyses
To statistically analyze if CNVs fall within replication-timing domains ( Figure 1B), we performed permutation analysis. For each simulation, we generated a new list of CNVs by assuming uniform distribution of CNV start positions along chromosomes (omitting centromeric regions) while keeping their size distribution. For each new list, we mapped to the replication domain profile and obtained the percentage of new CNVs falling within domains. We repeated the permutation 1,000 times. The permutation p value is defined as the number of permutations yielding a higher percentage than the actual experimental result. In Figures  2 and 3, the Kolmogorov-Smirnov (K-S) test was used to detect differences between distributions. Binomial tests were used to assess the statistical significance of CNV enrichment in the genomic regions. The null hypothesis is random distribution, i.e., no CNV gain or loss enrichment in the regions of interest.

ACCESSION NUMBERS
The GEO accession number for the replication-timing microarray, SNP array, and gene expression array reported in this paper is GSE46848.

SUPPLEMENTAL INFORMATION
Supplemental Information includes Supplemental Experimental Procedures, three figures, and three tables and can be found with this article online at http://dx.doi.org/10.1016/j.celrep.2014.03.007.

ACKNOWLEDGMENTS
We thank R. Zhao, S. Agarwal, P. Cahan, and S. Sunyaev for critical reading and discussion of the manuscript and T. Ryba for computational help.  Individual cells in the starting population have a unique set of preexisting background CNVs (CNV gains are indicated with a circled +; CNV losses are indicated with a circled À), which can be captured by the clonal nature of reprogramming. IPSC-manifested mutations may also lead to de novo CNVs that occur during the reprogramming process (noncircled +, À). Preexisting and de novo mutations may contribute to reprogramming fitness. The extensive genomic and epigenomic changes associated with reprogramming can in part be reflected by changes in replication timing (gray indicates no change during reprogramming, red shows earlier in iPSCs, and blue indicates later in iPSCs). Early and late replicating domains in iPSCs are differentially affected by copy-number gains and losses, during reprogramming. Specifically, CNV gains accumulate in regions of the genome that change to early replication during the reprogramming process. In contrast, CNV losses tend to be late replicating in both fibroblasts and iPSCs. This specific CNV distribution, like individual mutations, contributes to reprogramming fitness.