From the chromatin interaction network to the organization of the human genome into replication N/U-domains

The three-dimensional (3D) architecture of the mammalian nucleus is now being unraveled thanks to the recent development of chromatin conformation capture (3C) technologies. Here we report the results of a combined multiscale analysis of genome-wide mean replication timing and chromatin conformation data that reveal some intimate relationships between chromatin folding and human DNA replication. We previously described megabase replication N/U-domains as mammalian multiorigin replication units, and showed that their borders are ‘master’ replication initiation zones that likely initiate cascades of origin firing responsible for the stereotypic replication of these domains. Here, we demonstrate that replication N/U-domains correspond to the structural domains of self-interacting chromatin, and that their borders act as insulating regions both in high-throughput 3C (Hi-C) data and high-resolution 3C (4C) experiments. Further analyses of Hi-C data using a graph-theoretical approach reveal that N/U-domain borders are long-distance, interconnected hubs of the chromatin interaction network. Overall, these results and the observation that a well-defined ordering of chromatin states exists from N/U-domain borders to centers suggest that ‘master’ replication initiation zones are at the heart of a high-order, epigenetically controlled 3D organization of the human genome.

As recently reviewed in [31], we described the megabase-sized replication 'U/N-domains' as multiorigin units of DNA replication of the mammalian genome [32][33][34][35][36][37][38]. When using a waveletbased multiscale analysis of MRT data [39,40] for seven human cell types, about half of the genome could be segmented in megabase-sized U-shaped MRT domains [37,[41][42][43], as illustrated in figures 1(a) and (c). Significant overlap was observed between the MRT U-domains of different cell types and also between U-domains and the germline replication domains previously defined as exhibiting an N-shaped nucleotide compositional skew profile [32][33][34][35][36]44]. From the demonstration that the average fork polarity is directly reflected by both the compositional skew and the derivative of the MRT profile [37,38,45,46], we argued that the Nshape of the latter in MRT U-domains sustains the existence of megabase-sized gradients of  [40] from early 0 to late 1 along a 12-Mb-long fragment of human chromosome 1 in the GM06990 cell line. The horizontal colored bars correspond to four N/U-domains, identified in this fragment as MRT U-domains [37,42] (red: 200-kb borders, blue: 400-kb center, green: interior). (b) Intrachromosome GM06990/NcoI Hi-C contact matrix [15] along the same fragment of human chromosome 1. Each pixel represents all interactions between pairs of 100-kb loci; intensity corresponding to the total number of reads is color-coded according to the color map. The dashed squares correspond to the four replication N/U-domains. (c) Average MRT profiles (±SEM) inside GM06990 MRT U-domains (blue); positions within each U-domain were rescaled between 0 and 1 so that the left (0) and right (1) borders are aligned. Numerical MRT (○, averaged over 6000 simulations) of the replication of a 1-Mb domain, obtained with the cascade model of origin firing [31,48].
profiles (±SEM) (blue) and mean replication fork polarity from simulations (○). replication fork polarity in somatic and germline human cells (figure 1(d)) [37,38,42]. Recent sequencing of Okazaki fragments provides experimental confirmation of this linear, N-shaped change of fork polarity across MRT U-domains [47]. We collectively refer to these replication domains as N/U-domains [31]. We further proposed a model for the spatiotemporal program of replication along U-domains where, in addition to a small rate of random origin firing, replication preferentially initiates from the 'master' initiation zones at the N/U-domain borders, followed by a stochastic cascade of secondary initiations propagating to the domain center, possibly by forkstimulated origin firing [31,38]. Numerical simulations of a model including these features show that these ingredients sufficiently explain U-shaped MRT domains that present a linear gradient of fork polarity (figures 1(c) and (d)) [31,48]. When investigating the large-scale organization of human genes inside these replication domains, a remarkable organization was revealed [34,36]. Specifically, highly expressed genes in a given cell type are over-represented close to the corresponding N/U-domain borders [49]. Mapping experimental and numerical chromatin mark data in these domains shows that the 'master' replication initiation zones at the N/U-domain borders are specified by a region (∼200 kb) of open and transcriptionally active chromatin [37,50]. The additional comparative analysis of N/U-domains and Hi-C data has revealed that these functional domains are intimately linked to genome 3D architecture. In our previous work, we mainly reported results in the human erythroid cell line K562, using Hi-C data obtained via the HindIII restriction enzyme [37]. To assess the robustness of this link between functional domains and genome 3D architecture, we reproduce here these genome-wide analyses in human lymphoblastoid cell lines using a replication timing profile in GM06990 [40], Hi-C data obtained with the NcoI restriction enzyme (GM06990/NcoI) [15], and chromatin marks data in the related GM12878 cell line, gathered by the ENCODE project [51].

Replication N/U-domains and the 3D organization of a chromosome into structural domains
Replication occurs at discrete foci in the nucleus according to a specific spatiotemporal organization [52,53]. Throughout the S phase, new replication foci de novo appear immediately next to the previously active foci [54][55][56][57], and a substantial spatial segregation between early-and late-replicated loci has been observed [58][59][60][61]. In addition, from the observation that thousands of replication forks are only found in hundreds of discrete BrdUlabeled foci, it has been proposed that DNA sequences replicating at the same time could gather [2,5,58,62,63], possibly through the anchoring of chromatin loops on the nuclear skeleton (reviewed in [41,64,65]). Moreover, other experiments showed that the spacing of anchoring sites on the nuclear skeleton increased with the replication fork velocity, and that the efficiency of an origin in a given S phase depended on its proximity to an anchorage point in the previous G1 phase [66]. Hence, there is a clear relationship between DNA replication and the pattern of the 3D chromatin interactions in the nucleus. The organization of the human spatiotemporal replication program into MRT U-domains [37] provided us with the opportunity to test this idea more systematically and to address the extent to which a tertiary chromatin-structure counterpart to these functional replication domains exists.
The original structural partitioning of the chromosomes derived from intrachromosomal Hi-C interactions (using a principal component analysis of the Hi-C data over each chromosome) and the normalization performed on the Hi-C data [15] gave as much weight to the very-distant interactions (over tens of megabases) as it gave to the more local contacts (below a few megabases), despite the fact that the latter are the most frequent contacts. Complementarily, we focused [37] on interactions between loci separated by short genomic distances (≲10 Mb), over which the contact probabilities are the highest [15]. As illustrated in figure 1(b), it appears that replication N/U-domains correspond to matrix square-blocks of enriched interactions. This clearly suggests that the segmentation of the genome into replication N/U-domains coincides, to some extent, with the segmentation into structural domains identified by the analysis of more recent chromatin interaction data [22,23]. Consistent with previous results, the matrix square-block structure underlines that the two early replicating zones that border a N/U-domain have high contact probability as the signature of 3D spatial proximity. However, the matrix square-block structure also signifies high contact probability of the two early replicating borders with the late replicating N/U-domain center, and sparse interactions between loci in separate N/U-domains. In other words, locally we do not observe a structural segregation between early-and late-replicating loci, and N/U-domain borders appear to correspond to structural insulating barriers. Further examination of the average behavior of intrachromosomal contact probability as a function of genomic distance for the complete genome corroborates these observations: the mean number of interactions between two 100-kb loci of the same N/U-domain decays when their genomic distance increases, as observed genome-wide (figure 2(a)). Importantly, the mean number of pairwise interactions is significantly higher inside the N/U-domains than genome-wide, and this seems to depend on the N/U-domain length. In particular, we found that the smaller the domain, the higher the mean number of interactions, which is probably a signature of a more open chromatin structure. When comparing the contact probability between loci inside an N/U-domain and loci lying in neighboring N/U-domains (figure 2(b)), we observed that their ratio increases with the distance separating the loci. Ratio values lower than one at short distances may be related to an excess of interactions in the open chromatin region at N/U-domain borders [37,50]. Ratio values significantly greater than one for long distances corroborate the correspondence between structural barriers and N/U-domain borders. The latter observation is strengthened by the fact that N/U-domain borders are significantly enriched in the insulator-binding protein CTCF (figure 2(c)) [37], which is known to be involved in chromatin-loop formation, conditioning communication between transcriptional regulatory elements [67][68][69][70].
The Hi-C data do not always allow the direct visualization of the chromatin contacts for chosen loci, principally because of insufficient sequencing depth to cover all the combinatorial possibilities. This led us to use 4C methodology to explore the structural interaction partners of 10 viewpoints distributed along a large region (20 Mb) of human chromosome 5 [25]. The MRT profile of the region of interest presents large fluctuations with well-defined U-domains that are very well conserved across the seven cell lines we analyzed [37] (data not shown). In figure 3, we present 4C profiles for a viewpoint located in the middle of an N/U-domain and a viewpoint located on a well-defined timing peak at the border between two successive N/U-domains that are representative of the results for the 10 viewpoints analyzed [25]. Interaction frequency profiles are maximum at the viewpoint location, and quickly decrease to a few percents of the maximum value over a few megabases distance from the considered viewpoint. Importantly, this decrease does not follow a simple pattern. For the viewpoint located in the center of an N/ U-domain, we observed that 4C interactions are mainly concentrated between the two timing peaks at the N/U-domain borders (figure 3, top). This very nicely confirms that contacts between late-replicating regions in N/U-domain central regions are somehow limited inside the replication N/U-domains, and thus that there are self-interacting chromosomal domains bordered by timing peaks. Interestingly, beyond this local regime, late-replicated loci preferentially interact with other late-replicating regions (data not shown) [25], as suggested by the previous Hi-C data analysis. In contrast, for the viewpoint located at a timing peak, we observed peaks in the interaction frequency profile that colocalize with nearly every neighboring timing peak (figure 3, bottom), even with those located 10 Mb away. At larger distances, profiles of long-range interactions from timing peak viewpoints faithfully reproduce the replication timing profile (data not shown), which demonstrates the astonishing persistence, all over the chromosome, of a timing-linked long-range conformation of chromatin [25]. Each line of the Hi-C interaction matrix is equivalent to the 4C profile for the viewpoint situated on the diagonal, allowing us to computationally reproduce these analyses for all N/U-domains. We computed the average 4C-like profiles for viewpoints located at N/U-domain borders (see caption of figure 2(d)). We observed that both ends of the N/U-domains present preferential interactions (figure 2(d)); on average, there exists 1.5 (resp. 2) times more interactions between the two GM06990 (resp. K562) N/U-domain borders than expected, given their genomic distance ( figure 2(d)). These results strongly suggest that N/U-domain borders loop out the intervening late-replicating region to contact each other. Note that all the results reported in figure 3 are robustly observed in cycling and G0-blocked cells, indicating that the observed organization likely reflects some permanent elements of the chromosomeʼs 3D structure.

'Master' replication initiation zones are long-range interconnected hubs in human chromatin interaction networks
These initial results prompted us to go further into the characterization of replication domains as fundamental units of the chromatin tertiary structure. We used graph theory to objectively quantify the importance of the 'master' replication initiation zones at N/U-domain borders in the genome-wide Hi-C chromatin network [71].

Graph theory tools to analyze interaction network complexity
Graphs [72] have become extremely useful as a representation of a wide variety of complex systems in social sciences, biology, computer sciences, and engineering [73][74][75][76]. An undirected graph consists of a set, V, of n vertices, v i , and a set, ⊆ × E V V, of m edges, e ij , with associated weights, > w 0 ij . Note that unweighted graphs correspond to = w 1 ij . It is often useful to consider a matrix representation of a graph. The adjacency matrix, , is an × n n square matrix whose entry, a ij = i j n ( , 1 ,.., ), is equal to the weight, w ij , and is zero otherwise. To identify and quantify the vertices that occupy critical positions in a network, centrality measures have been proposed, including the degree, betweenness, and eigenvector centralities [73,[76][77][78][79]. They can all be derived from the adjacency matrix. An obvious order of the vertices of a graph can be established by sorting them according to their degree, d. The corresponding degree centrality, C d , is defined as [80]: It is a local centrality measure, since it takes into account only the local structure (vertex neighborhood) of the graph. Betweenness centrality, C b , measures the extent to which a vertex lies between other vertices on their geodesic paths [80]: where σ jk is the number of shortest paths* connecting v j and v k , while σ v ( ) jk i is the number of shortest paths connecting v j and v k and passing through v i . A vertex with high betweenness centrality can potentially influence the spread of information through the graph by facilitating, hindering, or even altering the communication between other vertices. To distinguish vertices that are linked to well-connected vertices, and so influence many others in the graph either directly or indirectly through their connections, Bonacich [81] has proposed the following definition of the spectral centrality: Interaction maps are positively defined and symmetric, and so are prone to being represented and analyzed using graph theory [82]. A natural method involves the use of interaction matrices as the adjacency matrix, , of a graph, where the vertices, v i , are non overlapping DNA loci and the edges, e v v ( , ) i j , link interacting loci. Two somehow complementary graph descriptions can be used: a weighted network where the weights assigned to each edge are the number of corresponding binary interactions, and a non weighted network where the entry, a ij , of  is 1 if v i and v j are interacting, and 0 if they are not interacting. Because the number of interactions decreases quickly when increasing the seperation, s, between the loci (like − s 1 ) [15,82], the weighted network focuses on interactions between loci seperated by short genomic distances (⩽10 Mb) over which contact probabilities are the highest. Alternatively, the non weighted network equally takes into account short-range (⩽10 Mb) and very-long-range interactions within a chromosome (⩾40 Mb). In the latter case, we removed from the data all binary interactions that were present only once (t = 1) or twice (t = 2), as some of these data may well be attributed to experimental noise (t = 0 corresponds to no thresholding).

Application of graph theory to Hi-C data
We applied the above methodology to the GM06990/NcoI Hi-C interaction matrix at the 100kb resolution. For both the unweighted and weighted graph descriptions, replication domain borders are local maxima of the three centralities, confirming that they correspond to critical vertices in the Hi-C interaction network (figures 4(a)-(c)). The degree centrality, C d (equation (1)), for the unweighted graph decreases almost linearly when moving inside the replication domains ( figure 4(a)). As shown in figures 4(d) and (e), where the numbers of incident vertices to replication domain borders and centers are compared for different contact thresholds t = 0, 1, and 2, this decrease of C d mainly results from the fact that borders have more long-range connections than centers (up to two-fold for n = 2). When considering the weighted graph description, the relative decrease of C d from border to center is much more pronounced than before (figure 4(a)), and this occurs regardless of the intragenomic length of the incident edges (> or ⩽4 Mb). This is the demonstration that replication domain borders have a much higher intrachromosomal contact frequency with other short-distance or long-distance loci than domain centers [37]. In figure 4(b), the betweenness centrality, C b (equation (2)), is also shown to decrease faster for the weighted representation than the unweighted one. This quantifies the role of these replication domain borders as 'hubs' in the chromatin interaction network. These hubs are not only important actors in mediating long-range interactions (>4 Mb), they also play a fundamental role at short distances (⩽4 Mb), where the very sharp decrease of C b observed in figure 4(b) enlightens the insulator properties of the replication domain borders that prevent cross-talk between neighboring domains, likely establishing self-interacting, independent expression domains [37]. In figure 4(c), the eigenvector centrality, λ C (equation (3)), also decreases from replication domain borders to centers. The comparison of the relative decay obtained with the weighted and unweighted graph representations further strengthens the fundamental organizing role of the replication domain borders, which are hubs that predominantly interact at short distances (⩽4 Mb) with nearby hubs as a possible signature of 3D spatial proximity. The results reported in figure 4(c) also show that these hubs have preferential long-range interactions with other very distal hubs. To visualize these interaction graph properties, we present in figure 5 the results of a simple, two-dimensional (2D), particle-like model where genomic loci both attract each other like springs based on their observed interactions, and repel each other like magnets to avoid over-aggregation [71]. Genomic loci are arranged in a worm-like fashion following their ordering along the chromosome due to the quick decay in contact frequency as a function of genomic distance [15]. Replication domain borders are found closer to each other in the graph layout obtained from this particle model, unlike centers that mainly interact at short distances and are found looping out at the periphery  (1)), 〈 〉 C b (equation (2)) and 〈 〉 λ C (equation (3)) are represented relative to their maximal value. (d) Probability of interaction between 100-kb loci at replication domain borders (resp. centers) with other 100-kb loci on the same chromosome versus their intragenomic distance; interactions with the two 100-kb loci at the each N/U-domain border (red) and the four 100-kb loci at the domain centers (blue) are considered. (e) Ratio between the probabilities of interaction of replication domain borders and centers (shown in (d)) versus the intragenomic interaction distances for different contact thresholds t = 0 (◯), 1(□), and 2 (△). To better discriminate centers from borders, only replication domains longer than 2 Mb were taken into consideration in (d) and (e). of this stationary, worm-like final configuration, which illustrates the central role of the interaction network hubs in the specification of the 3D architecture of the human chromosome. Recent analyses of the network of interaction between human organ systems showed that there are different network topologies corresponding to different physiological states, and that pairs of systems can communicate and synchronize their function through simultaneously coexisting forms of coupling [83][84][85]. It will be interesting to see if the concepts introduced in network physiology apply when considering the chromatin-mediated coregulation of nuclear functions.

Organization of chromatin states along replication N/U-domains
There are tens of epigenetic marks that can be either present or absent on a chromatin locus, suggesting that the number of distinct chromatin states is huge. Several independent analyses [19,[87][88][89] have shown that most epigenetic mark combinations in human, Drosophila melanogaster, and Arabidopsis thaliana are not (or are only minimally) observed, so that the number of distinct epigenetic states is in fact very limited (⩽15). In order to go beyond our previous analysis of chromatin mark distribution in replication domains that consisted of computing the average profile of their presence/absence as a function of the distance to the domain borders [37,50], we recently followed these pioneering approaches to more precisely characterize the link between the mean replication timing and the epigenetic status. Using principal component analysis [90] and classical clustering [91] on a set of 13 epigenetic marks at a spatial resolution of 100 kb (corresponding to MRT profile resolution), we reduced the apparent complexity of the data set to four major groups of chromatin marks with shared features [92]. The small number of states found, compared to [88] who described 15 states in human, is likely due to the much larger resolution used in this study. These states have specific replication timing, an early transcriptionally active euchromatin state (C1), a mid-early  figure 1, when using Gephi software [86] to represent the chromosome 1 intrachromosomal interactions obtained using GM06990/NcoI Hi-C data at 100-kb resolution. Gephi force-directed algorithm called Force Atlas 2 amounts to simulate a physical system of particles (vertices) distributed in a plane (2D); vertices repulse each other like magnets, while edges attract the vertices they connect like springs. These forces create a dynamics that converges to a balanced final state, which is expected to highlight the connectivity of the edges. Vertices are colored according to their position relative to N/U-domains: 200-kb borders (red), 400-kb center (blue), interior (green), exterior (black) (see figure 1(a)). repressive type of chromatin (C2) associated with polycomb complexes, and two late replicating states: a silent state (C3) not enriched in any available marks, and a gene-poor HP1associated heterochromatin state (C4). When mapping these chromatin states inside the megabase-sized N/U-domains, it appears that the associated replication-fork polarity gradient [37,38] corresponds to a directional path across three chromatin states, from C1 at the N/Udomain borders followed by C2 and C3 at the centers ( figure 6). For the lymphoblastoid data analyzed here, state C4 is rather evenly distributed along the N/U-domains, which contrasts with the situation in the K562 cell line, where C4 is mainly localized in the central regions of the largest domains (≳2 Mb) [92]. Thus, in K562, the gradient of chromatin states involves the four states from C1 to C4. Interestingly, analysis of the half of the human genome not covered by N/U-domains is consistent with early and late replication loci occurring in separate compartments; the former correspond to gene-rich, high-GC domains of intermingled chromatin states C1 and C2, whereas the latter correspond to gene-poor, low-GC domains of alternating chromatin states C3 and C4, or long C4 domains. The relationship between the distribution of chromatin states, the spatiotemporal replication program, and the 3D genome architecture could be the key to understanding the chromatin complex organization [93] and its role in regulating nuclear functions [94,95].

Conclusion
To summarize, the organization of the human genome into replication domains covering about half of the genome provides us with an original point of view to unify the interplay between replication, transcription, epigenetic modifications, and nuclear organization. 'Master' replication initiation zones at N/U-domain borders are major actors in the spatiotemporal replication program [31,37,38,50], acting as insulating regions between chromatin structural domains of independent expression and duplication [25,37,71]. These regions of open and transcriptionally active chromatin also play a central role in the genomeʼs 3D architecture by mediating long-range interactions among distant DNA elements both within and in between chromosomes. This property is consistent with the observation that N/U-domain borders are significantly enriched in the insulator-binding protein CTCF (figure 2(c)), which may contribute to the formation of chromosomal hubs of interactions across chromosomes [82]. Analysis of the spatial proximity of evolutionary breakpoints between human and mouse suggests that some aspects of genome 3D architecture are conserved across very large evolutionary distances [96,97]. This suggests that replication/structural domains could be major determinants of genome evolution, which is consistent with the observation that some specific replication timing patterns are conserved between human and mouse syntenic regions of related cell types despite the length of evolutionary divergence [30], and with our previous finding that replication N/Udomain borders are significantly enriched in mammalian evolutionary breakpoints, which suggests that evolution typically shuffled these structural and functional domains rather than breaking and fusing them [96]. Replication/structural domains are also likely central to genome regulation on shorter time scales. Comparative analyses of replication timing profiles during cell differentiation in mammals underline that there are important dynamical changes of replication timing profiles during development, leading to cell-type specific patterns of replication [30,95,98,99]. Meta-analyses [100,101] of replication timing profiles [40], Hi-C data [15], and somatic copy-number alterations (SCNAs) observed in cancer samples from diverse cancer types [102] showed that SCNAs tend to fuse genomic regions that, prior to the rearrangement, spatially co-localized within the nucleus and have similar replication timing. Hence, replication N/U-domains likely correspond to fundamental structural and functional units underlying the plasticity of replication and structural domain organization, gene expression, and chromatin states both across different cell lines and across different organisms. They provide a framework for further studies in different cell types, in both health and disease.