Polycomb silencing: from linear chromatin domains to 3D chromosome folding

Polycomb group (PcG) proteins are conserved chromatin factors that regulate key developmental genes. Genome-wide studies have shown that PcG proteins and their associated H3K27me3 histone mark cover long genomic domains. PcG proteins and H3K27me3 accumulate in Pc nuclear foci, which are the cellular counterparts of genomic domains silenced by PcG proteins. One explanation for how large genomic domains form nuclear foci may rely on loops occurring between specific elements located within domains. However, recent improvements of the chromosome conformation capture (3C) technology, which allow monitoring of genome-wide contacts, depict a more complex picture in which chromosomes are composed of many topologically associating domains (TADs). Chromatin regions marked with H3K27me3 correspond to one class of TADs, and PcG proteins participate in long-range interactions of H3K27me3 TADs, whereas insulator proteins seem to be important for separating TADs and may also participate in the regulation of intra-TAD architecture. Recent data converge to suggest that this hierarchical organization of chromosome domains plays an important role in genome function during cell proliferation and differentiation.

In past decades, the question of 3D genome folding inside the cell nucleus was mainly studied using microscopy approaches. These analyses identified nuclear compartments and chromosome territories and showed that gene positioning is not random inside cell nuclei. The recent development of Chromosome Conformation Capture (3C) technologies has greatly improved our perception of chromatin fibre folding. In this review, we will focus on the regulation of chromosome domains and their three-dimensional organization by PcG proteins.

Chromatin regulation by Polycomb group proteins: from genomic domains to nuclear foci
Drosophila genome-wide studies show that PcG proteins bind to discrete genomic elements including the previously characterized PcG response elements (PREs), namely DNA regions that are necessary and sufficient to recruit PcG proteins and silence flanking genes. Moreover, individual discrete PREs cluster into large genomic domains, named Polycomb domains, that are covered with histone H3K27me3, a histone modification exquisitely specific to PcG silencing [1,2]. Although the relevance of discrete PREs has been previously demonstrated in Drosophila [3,4], the functional significance of large genomic domains remains puzzling. In microscopy, PcG proteins and histone H3K27me3 accumulate in discrete Polycomb (PC) foci that have also been named "Polycomb bodies" [5,6], although the appropriateness of this denomination has recently been called into question [7]. An important question is whether these PC foci are preformed structures that may store PC proteins or to which PcG target genes must migrate in order to be silenced or, in contrast, whether they self-assemble as a result of recruitment of PcG proteins to their target genes (Figure 1). A dynamic exchange between PcG proteins in the nucleoplasm and those located within PC foci has been shown by using fluorescence recovery after photobleaching in Drosophila and mammalian embryonic stem cells [8,9]. Of note, the SAM domain of one PcG protein, Phc2, is important for clustering through head-to-tail macromolecular polymerization and could favor PcG protein accumulation in discrete nuclear foci [10]. Immuno-FISH experiments demonstrate that PcG-mediated gene silencing occurs within PC foci [11].
For instance, Fab-7, a PRE-containing region controlling the expression of the gene Abd-B, is found within PC foci in the head of Drosophila embryos, where Abd-B is repressed, whereas in the posterior part, where Abd-B is expressed, Fab-7 is located outside PC foci [12,13]. Furthermore, the amount of PcG proteins within PC foci correlates with the size of the genomic domains forming them. Large genomic domains such as the Hox complexes form intense PC foci, whereas narrow genomic domains are found in weak PC foci. When genes located on homologous chromosomes pair, the underlying PC foci are more intense than in nuclei where the same genes do not pair [14]. Taken together, these data indicate that PC foci are not structures to which PcG target genes have to be directed for silencing. Instead, PcG proteins bound to chromatin marked with H3K27me3 form PC foci because their target chromatin fibres fold into small, discrete parcels of nuclear volume (Figure 1).
To study the folding of the chromatin fibre and explain how large genomic domains covered with histone H3K27me3 can form PC foci in the cell nucleus, 3C technology was used to monitor interactions between chromatin segments. PREs located in the Drosophila bithorax complex can contact other PREs of repressed Hox genes. These multiple loops within a genomic domain describe a repressive chromatin hub which is dependent on Polycomb [13]. In addition, the Drosophila gypsy insulator can prevent contacts between a PRE and a distal promoter. This insulator-dependent chromatin conformation confines H3K27me3 and PcG proteins within a specific domain, suggesting that endogenous insulators may confine chromatin loops within Polycomb domains without affecting adjacent genomic regions [15]. In mammalian embryonic stem cells, the GATA-4 locus has a multi-loop conformation which depends on PcG proteins. Multiple internal long-range contacts rely on silencing because they are completely lost after the differentiation signal inducing GATA-4 expression [16]. Taken together, these studies suggest that multiple loops in chromatin regions repressed by PcG proteins might cluster PREs and explain the generation of chromatin structures giving rise to discrete PC foci in microscopy. Nevertheless, one should be cautious about the interpretation of 3C data. Indeed, even if 3C identifies numerous loops between discrete genomic elements such as PREs, promoters, enhancers and insulators [17–19], the unknown frequency of these chromatin contacts, the ability to detect only bipartite and not multipartite chromatin interactions, and the lack of simultaneous information about the neighboring regions prevent an understanding of the exact 3D folding path of the chromatin fibre.

From loops to topological domains
A modification of the 3C technology that uses an unbiased approach to monitor all the contacts made by a genomic bait of interest (4C) has revealed a more complex conformation of PcG-bound chromatin. Two studies using 4C in Drosophila to map contacts established by PcG target loci revealed that most of the contacts made by the bait regions are precisely confined within the genomic region covered by H3K27me3 in which the bait is located. Other interactions are mostly limited to other PcG target loci located on the same chromosome arm. Therefore, baits located in PcG-regulated chromatin have interaction profiles similar to the genome-wide distribution of these proteins and histone H3K27me3. In contrast, baits located outside PcG genomic regions establish only a few contacts with PcG chromatin [12,20]. This is consistent with previous 4C results indicating spatial separation of active and inactive regions and suggests that the partition of the genome into physical domains, each characterized by high internal chromatin interactions and a lower degree of interactions with chromatin outside of the domain borders, is not restricted to PcG chromatin [18,21–23].
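The notion of a physical domain, with frequent internal contacts and fewer contacts across its borders, can be illustrated with a small computational sketch. The Python code below is not taken from any of the studies cited here. It assumes a hypothetical toy matrix of pairwise contact frequencies (the function names, domain sizes and contact values are all illustrative) and scores each junction between neighbouring bins by the mean contact frequency between the bins lying on either side of it. Junctions with the lowest score behave as domain borders.

```python
def cross_junction_contacts(matrix, window):
    """Mean contact frequency between the `window` bins on each side of
    every junction k (the junction between bin k-1 and bin k)."""
    n = len(matrix)
    scores = {}
    for k in range(window, n - window + 1):
        left = range(k - window, k)
        right = range(k, k + window)
        total = sum(matrix[i][j] for i in left for j in right)
        scores[k] = total / (window * window)
    return scores


def toy_matrix(domain_sizes, inside=10.0, outside=1.0):
    """Hypothetical contact matrix: bins in the same domain contact each
    other at frequency `inside`, bins in different domains at `outside`."""
    labels = [d for d, size in enumerate(domain_sizes) for _ in range(size)]
    return [[inside if a == b else outside for b in labels] for a in labels]


# Three illustrative domains of 6, 4 and 5 bins (assumed values).
matrix = toy_matrix([6, 4, 5])
scores = cross_junction_contacts(matrix, window=2)
lowest = min(scores.values())
boundaries = [k for k, value in scores.items() if value == lowest]
print(boundaries)
```

In this idealized case the two recovered junctions coincide with the domain borders built into the toy matrix. Real contact maps are noisy and distance-dependent, so a simple minimum such as this one would only be a starting point.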
This chromatin contact behaviour has been generalized by applying a global approach, called Hi-C, which maps genome-wide chromatin interaction frequencies [24]. Recent Hi-C analyses with increased sequencing depth in mammalian and Drosophila genomes identified large chromatin interaction domains (megabase-sized in mammals, about tenfold smaller in fruit flies). Although the mechanisms responsible for the formation of large chromatin domains are not understood, the Hi-C data also revealed that frequent contacts occur throughout the whole chromatin domain and are not restricted to loops between discrete genomic elements (Figure 2). These physical modules, named TADs, have been found to correlate with the epigenetic mark distribution along chromosomes. Two main kinds of TADs could be distinguished with this approach: active chromatin forms relatively short domains with a relatively extended configuration (as indicated by a rapid decrease in contact frequency with increasing genomic distance), whereas silenced chromatin forms larger and more compact domains, where the contact frequency decays more slowly with increasing distance. Strikingly, the boundaries of TADs match quite well the distribution of insulator proteins such as CTCF along the genome [25,26]. In Drosophila, specific combinations of insulator proteins are enriched at TAD borders. Moreover, active chromatin preferentially locates at borders, whereas silenced chromatin is found in the interior of TADs [27]. Chromatin interaction analysis by another high-throughput 3C variant approach, named ChIA-PET, identified the CTCF-chromatin interactome in pluripotent mammalian cells. CTCF-mediated interactions also underline the partition of the genome into chromatin domains and reveal extensive contacts between promoters and regulatory elements [28].
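The distinction between extended and compact domains rests on how quickly contact frequency falls off with genomic distance. As a minimal sketch of that reasoning, and not a reproduction of the analyses in the cited studies, the Python code below averages contact frequency at each bin separation in a toy matrix and estimates the decay exponent as the slope of a log-log fit. The matrices, the function names and the exponents -1.0 and -0.7 are hypothetical values chosen for illustration only.

```python
import math


def mean_contact_by_distance(matrix):
    """Average contact frequency for each bin separation s >= 1."""
    n = len(matrix)
    means = {}
    for s in range(1, n):
        values = [matrix[i][i + s] for i in range(n - s)]
        means[s] = sum(values) / len(values)
    return means


def decay_exponent(means):
    """Least-squares slope of log(frequency) against log(separation)."""
    points = [(math.log(s), math.log(f)) for s, f in means.items() if f > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den


def toy_domain(n_bins, exponent):
    """Hypothetical matrix in which frequency = separation ** exponent."""
    return [[1.0 if i == j else abs(i - j) ** exponent for j in range(n_bins)]
            for i in range(n_bins)]


# Assumed exponents: a steeper decay stands in for an extended domain,
# a shallower decay for a compact one.
extended = toy_domain(20, -1.0)
compact = toy_domain(20, -0.7)
slope_extended = decay_exponent(mean_contact_by_distance(extended))
slope_compact = decay_exponent(mean_contact_by_distance(compact))
print(slope_extended, slope_compact)
```

A more negative slope corresponds to the rapid decrease described for active chromatin, and a slope closer to zero to the slower decay described for silenced chromatin.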
One clear determinant of chromatin fibre folding into topological domains is the linear distribution of chromatin marks along the genome, since interaction maps and genomic distribution of chromatin marks give a similar view of a genome segmented in domains [25,27,29–31]. The function of insulators with regard to genome segmentation and formation of topological domains has recently been addressed in Drosophila. Indeed, insulators map to the borders of H3K27me3 domains in Drosophila. Surprisingly, one paper indicated that the knockdown of dCTCF (a major component of insulators) induces a decrease of H3K27me3 throughout H3K27me3 domains and no spread of H3K27me3 outside domain boundaries [32]. Another report showed that insulators restrict the spreading of this histone mark in only a few chromatin regions bound by PcG proteins, and no major change in genome expression was observed after knockdown of insulator proteins in cultured cells [33]. Although these knockdown data await confirmation by null mutations, they suggest that the inherent composition of chromatin domains may suffice to set up domain boundaries and that insulator proteins might consolidate them and increase the precision of boundary positions.
Similarly to the genomic distribution of chromatin marks, TADs are also related to the replication timing of the genome. It was well established that gene-rich, open transcribed chromatin replicates early in S-phase, whereas silent, gene-poor chromatin is replicated late. Notably, however, the mammalian replication timing profiles are well correlated with the Hi-C matrices [34,35]. Consistently, there are more inter-chromosomal interactions than expected between regions having similar replication timing [36]. Interestingly, long-range chromatin contacts are conserved between cycling and resting cells [35]. In Drosophila, replication timing programs mirror chromatin contact profiles in the BX-C PcG target locus, as well as PcG distribution and gene expression profiles in two cell lines having different BX-C gene expression [37]. This indicates that the relation between chromosome domain architecture and replication programs is a general feature in animal cells.

Topological domains form dynamic and functional genomic structures
4C technology has been previously used to map the topology of the active and inactive X chromosomes in female mammalian cells, where X chromosome dosage compensation entails inactivation of one of the two female X chromosomes. The active X forms multiple long-range interactions whereas the inactive X shows a random organization inside the inactive territory, which is dependent on the Xist non-coding RNA, which spreads from its site of synthesis to the whole chromosome territory in order to maintain silencing of the inactive X [38]. To study in detail the spatial conformation of the mouse X-inactivation centre, the locus which controls the expression of the non-coding Xist RNA and initiates X chromosome inactivation, chromosomal interactions across a 4.5 Mb region containing Xist have been mapped by chromosome conformation capture carbon copy (5C). The improved genomic resolution of this approach allows precise identification of discrete TADs from 200 kb to 1 Mb. Consistent with genome-wide studies, this region has also been shown to be organized in TADs and, intriguingly, one of the TAD boundaries separates the Xist locus from its flanking regulatory locus Tsix [30]. FISH observed with structured illumination microscopy shows that large DNA segments belonging to the same TAD co-localize more than DNA fragments located in adjacent domains, demonstrating that different TADs segregate spatially in the nucleus. Disruption of a boundary causes ectopic chromosomal contacts and long-range transcriptional misregulation, whereas topological domains are largely unaffected in the absence of H3K27me3 [30]. Moreover, another study showed that the 3D conformation of the X chromosome controls the initial transfer of the Xist RNA to distal X chromosome regions, which are not defined by specific DNA sequences [39].
On the other hand, chromosomal regions escaping X inactivation do not always localize outside the territory covered by Xist and, conversely, silencing can be maintained outside the Xist domain for a subset of the genes on the inactive X [40]. All these data suggest that sequence and gene specific cues cooperate with 3D chromatin organization in order to orchestrate the process of X inactivation.
Dynamic topological domains are also involved in the regulation of Hox gene expression, which controls the patterning of the vertebrate antero-posterior body axis. By probing loops established between the active part of the Hoxd cluster and elements dispersed throughout the nearby gene desert, it was possible to identify novel Hoxd enhancers, which are dispersed in the gene desert and form a regulatory archipelago that coordinately regulates Hoxd gene expression in digits [41]. The internal structure of Hox gene clusters was further investigated by a high-resolution 4C approach. Inactive Hox genes associate into a single topological domain delimited from flanking regions. During activation, Hox genes progressively cluster into another compartment. This structural switch matches the transition in chromatin marks, with the H3K27me3 repressive mark initially covering repressed Hox genes, whereas their transcriptional activation is associated with H3K4me3 deposition [29]. Further analysis of the HoxD cluster architecture reveals a functional switch between topological domains. During mouse limb development, a first wave of HoxD transcription specifies arm and forearm patterning and a late wave of transcription occurs when digits form. A subset of HoxD genes in the middle of the cluster initially interacts with the telomeric domain and later establishes new contacts with the centromeric domain [31]. Another work studying a long intergenic noncoding RNA, HOTTIP, transcribed from the 5′ tip of the HoxA locus also highlights the importance of the 3D architecture of Hox gene clusters. Chromosomal looping brings the noncoding RNA HOTTIP into close proximity to its target genes and this chromatin proximity is necessary and sufficient for HOTTIP-mediated transcriptional activation [42].
Similar to the dynamic change of chromatin marks, tissue-specific regulation of chromatin contacts and TAD identity in time and space seems to play a critical role in co-ordinating gene expression for regulation of cell differentiation during development. Understanding the molecular mechanisms regulating chromosome architecture will thus be crucial in future research in this field.

Long range interactions of H3K27me3 chromatin
A general rule emerges from 3C-based approaches: topological domains associated with open chromatin establish long-range contacts with other active domains, whereas repressed chromatin regions tend to cluster together (Figure 2) [18,22,23]. In particular, this has been well documented for H3K27me3-associated chromatin. For example, FISH studies show that the Drosophila Antp and Abd-B genes, which are separated by 10 Mb and located in the ANT-C and BX-C Hox clusters, co-localize inside a PC focus when repressed, but not when either of them is active. 4C analysis confirms this contact and shows that the BX-C locus can establish several other interactions, mainly with other H3K27me3 genomic domains located on the same chromosome [12]. Importantly, long-range interactions have been reported for transgenes containing regulatory regions of the BX-C including PREs associated with insulator activity, such as Fab-7 and Mcp [43,44], whereas another PRE devoid of insulators, bxd, does not induce long-range contacts. In keeping with these data, the insulator portions of Mcp and Fab-7 are necessary and sufficient to establish long-range interactions [45,46], suggesting that PcG proteins may stabilize long-range interactions rather than induce them. Conversely, contacts can also occur when both target genes are active. Those contacts are functionally regulated because they rely on Trithorax, enhancer specificity and CTCF proteins [12,46]. These studies thus confirm the segregation between active open chromatin and repressed compact chromatin, because a high frequency of interactions is never observed between active and silenced genes.
The same theme emerges from several recent studies that analyzed long-range interactions in pluripotent stem cells and found significant co-localization of chromatin regions characterized by high pluripotency factor occupancy in mammals [47–52]. Once again, long-range contacts involved either active genes or silent chromatin, and many long-range interactions involve domains enriched in Polycomb/H3K27me3 in embryonic stem cells. Importantly, loss of the Polycomb protein Eed decreases contacts between Polycomb-regulated regions without altering the overall chromosome conformation [52]. Long-range interaction of H3K27me3 chromatin domains has also been reported during vernalization in Arabidopsis, when cold induces silencing of FLOWERING LOCUS C (FLC). Live cell imaging shows that FLC alleles, tagged with the Lac operator system, cluster during cold. These contacts depend on the Polycomb trans-factors establishing the FLC silenced state, and FLC-LacO alleles stay clustered after plants are returned to warm conditions [53], raising the exciting hypothesis that part of the long-term cell memory that characterizes vernalization in plants may involve regulation of nuclear localization of the vernalization genes. Taken together, this evidence demonstrates that, although long-range interactions of chromatin regulated by PcG proteins were first shown in Drosophila, this phenomenon is evolutionarily conserved and probably deeply affects gene regulation processes in animal and plant cells.

TADs: distinct modules of gene regulation?
To summarize, genomes are locally organized in TADs matching genomic regions covered with a specific set of histone marks. Adjacent TADs are well separated from each other and long-range interactions only occur between TADs having the same chromatin signature (Figure 2). With regard to this interpretation, one should keep in mind that, although many long-range interactions have been identified at all scales with 3C-based technologies, microscopy approaches show that their frequency is mostly low in cell populations. Recently, single-cell Hi-C technology has allowed the comparison of single-cell measurements and Hi-C results relying on millions of cells. Single-cell Hi-C experiments highlight the cell-to-cell variability of chromosome structures at larger scale, whereas individual chromosomes maintain domain organization at the megabase scale [54]. Hence, at the local scale, chromosome folding in the cell nucleus seems to rely on TADs, which would form in every cell, whereas long-range interactions between them are probabilistic.
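The contrast between contacts present in every cell and contacts that are probabilistic can be made concrete with a small worked example. The Python sketch below is purely illustrative and does not use data from any cited study: the locus names (A1, A2, B1, B2), the four toy cells and the function name are all assumptions. It computes, for each pair of loci, the fraction of cells in which that contact is observed, which is the quantity that a population-averaged map approximates.

```python
from collections import Counter


def contact_frequency(cells):
    """Fraction of cells in which each pairwise contact is observed."""
    counts = Counter(pair for cell in cells for pair in cell)
    return {pair: counts[pair] / len(cells) for pair in counts}


# Hypothetical single-cell contact sets. A1 and A2 stand for loci in the
# same domain; B1 and B2 stand for loci in distant domains.
cells = [
    {("A1", "A2"), ("A1", "B1")},
    {("A1", "A2")},
    {("A1", "A2")},
    {("A1", "A2"), ("A2", "B2")},
]
freq = contact_frequency(cells)
print(freq[("A1", "A2")], freq[("A1", "B1")])
```

In this toy population the intra-domain contact is seen in all four cells, whereas each long-range contact is seen in only one, so it appears as a low-frequency signal once the cells are pooled.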
One could thus suggest that TADs form chromosomal modules that represent the key units of gene regulation. In this view, cis-regulatory elements belonging to one module would be dependent on one another, whereas separated TADs would have independent regulation. Consistently, integrations of a GFP reporter transgene in mammalian cell lines produced expression levels that correspond to the activity of the domains of insertion, rather than to that of the gene flanking the insertion point [55]. Similarly, insertion of a transposon-associated sensor at random genomic positions in mice identified long-range chromosomal regulatory activities, forming overlapping domains with tissue-specific expression [56]. Finally, long-range interactions between TADs of similar chromatin types suggest that, despite partial insulation of each TAD, each genomic locus may be affected by many others in its regulation, and thus that the genome is more than just a linear succession of discrete genomic elements.