Histone Code and Higher-Order Chromatin Folding: A Hypothesis

Histone modifications, alone or in combination, are thought to modulate chromatin structure and function, a concept termed the histone code. By combining evidence from several studies, we investigated whether the histone code can play a role in the higher-order folding of chromatin. First, using genomic data, we analyzed associations between histone modifications at the nucleosome level and could dissect the composition of individual nucleosomes into five predicted clusters of histone modifications. Second, by assembling the raw reads of histone modifications at various length scales, we noticed that the histone mark relationships that exist at the nucleosome level tend to be maintained at higher orders of chromatin folding. Recently, a high-resolution imaging study showed that histone marks belonging to three of the five predicted clusters form structurally distinct and anti-correlated chromatin domains at the level of chromosomes. These observations suggest that the histone code can have a significant impact on the overall compaction of DNA: at the level of nucleosomes, at the level of genes, and finally at the level of chromosomes. In this article, we therefore put forward a hypothesis in which the histone code drives not only the functionality but also the higher-order folding and compaction of chromatin.


INTRODUCTION
Chromatin is a complex polymer of DNA, histone proteins and other nonstructural proteins [1][2][3][4]. Modifications to DNA and histone proteins can affect both the structure and function of chromatin [5,6]. For instance, posttranslational modifications on the amino acid tails of histone proteins can determine the level of DNA compaction and, in turn, affect the binding of transcription factors to DNA [7][8][9]. Many of these modifications can occur in a concerted manner to modulate gene expression. A classic example is phosphorylation of histone H3 serine 10 (H3S10p), which has opposing roles during interphase and mitosis [10]. During interphase, phosphorylation at H3S10 is linked to acetylation at H3K9 and H3K14, relaxing chromatin and increasing its accessibility to various transcription factors [11,12]. In contrast, during mitosis, phosphorylation at H3S10 is linked to enhanced methylation at H3K9 to condense chromatin [13,14]. Overall, this crosstalk can modulate not only gene expression, through signalling pathways, but also chromatin accessibility. As a result, the combination of histone marks that can potentially regulate chromatin and gene expression is termed the "histone code" [15,16].
In this article, we hypothesize that higher-order chromatin folding can be built up from simple mathematical rules and that the histone code can provide a robust mechanism for this folding. To study the histone code, we defined a framework to correlate 39 histone marks at individual nucleosome positions. Our results show that histone modifications cluster into five distinct groups [17]. One of the clusters is associated with promoters, while others are related to gene bodies or enriched at repeat regions. This result suggests that the histone code is present not only at the nucleosome level but also at the level of genes and regulatory regions. We then relate these findings to a previous study of ours in which pachytene chromosomes showed distinct compartmentalization of histone modifications at the level of chromosomes [18,19]. In light of these observations, we propose that the histone code might play a role at various levels of chromatin folding (nucleosome, gene, and chromosome), hinting at a mechanistic, hierarchical folding of DNA.

HISTONE CODE AT NUCLEOSOME LEVEL
To study the associations between histone marks at the nucleosome level, we designed a strategy to assign histone modifications to a given nucleosome. First, we map the ChIP-seq reads for the different histone modifications to version hg18 of the human genome using Bowtie [20]. To obtain the best alignment, we set the seed length equal to the read length (36 bp) and allow only two mismatches. Next, we combine the mapped reads of the different histone modifications into one file and predict nucleosome positions using the Nucleosome Positioning from Sequencing (NPS) method [21]. In NPS, each read is extended to 150 nucleotides (nt) and the central 75 nt are taken to get a better estimate of the signal. Peaks with a significantly different distribution of reads on the two strands are removed, because the two strands of the DNA should be sequenced evenly. Furthermore, peaks narrower than 80 nt or wider than 250 nt are removed. Next, all the histone modifications are assigned to the predicted nucleosome positions. We count the total number of reads for a particular modification in the nucleosomal peak region and compare it with the expected number of reads for that chromosome using a Poisson distribution. We assign a histone modification to a nucleosome if the p-value for the read count is less than 10^-3. This computation results in a binary matrix of 0's and 1's, with one row per nucleosome and one column per modification. Finally, we compute the correlation coefficients between marks using this binary matrix to describe the associations between the histone modifications. The R functions corrplot and hclust (for hierarchical clustering) are used to plot the correlation matrix and group the histone modifications into distinct clusters.
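The assignment and correlation steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the function names, the toy read counts, the expected counts and the generic mark labels (markA, markB, markC) are all assumptions made for illustration, and only the Poisson test with a 10^-3 cutoff and the correlation on the binary matrix follow the text.

```python
import math

def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), summing the pmf in log space."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    total, i = 0.0, k
    while True:
        pmf = math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        total += pmf
        # Stop once we are past the mode and the terms are negligible.
        if i > lam and pmf < 1e-18:
            break
        i += 1
    return min(1.0, total)

def assign_marks(counts, expected, alpha=1e-3):
    """Binary matrix: 1 if a mark is significantly enriched on a nucleosome.

    counts[n][m] is the read count of mark m on nucleosome n;
    expected[m] is the expected count of mark m (the Poisson mean).
    """
    return [[1 if poisson_upper_tail(c, expected[m]) < alpha else 0
             for m, c in enumerate(row)]
            for row in counts]

def pearson(x, y):
    """Pearson correlation; on 0/1 vectors this is the phi coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when a mark is constant
    return cov / math.sqrt(vx * vy)

# Toy data (hypothetical): six nucleosomes, three marks.
marks = ["markA", "markB", "markC"]
expected = [1.0, 1.0, 1.0]
counts = [[9, 8, 0], [10, 7, 1], [0, 1, 9],
          [1, 0, 8], [8, 9, 0], [0, 0, 10]]

binary = assign_marks(counts, expected)
columns = list(zip(*binary))  # one 0/1 vector per mark
corr = [[pearson(a, b) for b in columns] for a in columns]
for name, row in zip(marks, corr):
    print(name, [round(r, 2) for r in row])
```

The resulting correlation matrix is the kind of input that a hierarchical clustering routine such as hclust would then group into clusters; the clustering step itself is not shown here.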

HISTONE CODE IS MAINTAINED AT INCREASING ORDERS OF CHROMATIN FOLDING
We applied our framework to histone modification data from [22][23][24][25], which is to date the largest collection of histone marks available for a single cell type (CD4 T cells). Our method generated correlation coefficients between histone modifications, depicted in Figure 1 using a coloring scheme. The hierarchical clustering identified five clusters of histone modifications. The largest group was found to be associated with active gene expression, comprising acetylation marks and H3K4me3 (Figure 1, top left cluster indicated by a square black box). Another group was associated with H3K9me3, a mark of inactive chromatin (constitutive heterochromatin) (Figure 1, the fourth cluster from the left). Lastly, we identified a group representative of repressive chromatin (H3K27me3) (Figure 1, the fifth cluster from the left). Using simple magnification of genome browser tracks, we noticed that the histone code which exists at the nucleosome level also tends to hold true at higher orders of chromatin folding (Figure 2). If the histone code did not exist at higher levels of folding, no clustering of histone marks would be visible at higher magnifications, which is not the case here. For instance, the first cluster on the left is observed at promoters and exons, while the second group is observed at gene bodies. This result implies that histone marks associated at the nucleosome level may also pack together at specific sites in the genome at different length scales.
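The observation above rests on visual inspection of genome browser tracks. One hypothetical way to quantify the same idea, sketched below in Python, is to sum per-nucleosome mark assignments over windows of increasing size and recompute the correlation between marks at each scale. The function names, window sizes and toy tracks are assumptions for illustration and do not reproduce the analysis behind Figure 2.

```python
import math

def bin_track(track, width):
    """Sum a per-nucleosome track over consecutive, non-overlapping windows."""
    n_bins = len(track) // width  # an incomplete final window is dropped
    return [sum(track[i * width:(i + 1) * width]) for i in range(n_bins)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined for a constant track
    return cov / math.sqrt(vx * vy)

def correlation_by_scale(track_a, track_b, widths):
    """Correlation between two marks after binning at each window width."""
    return {w: pearson(bin_track(track_a, w), bin_track(track_b, w))
            for w in widths}

# Toy 0/1 tracks (hypothetical), ordered along the genome.
# mark_a and mark_b tend to co-occur; mark_c occupies the other nucleosomes.
mark_a = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
mark_b = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0]
mark_c = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1]

widths = [1, 2, 4]
same_cluster = correlation_by_scale(mark_a, mark_b, widths)
other_cluster = correlation_by_scale(mark_a, mark_c, widths)
for w in widths:
    print(f"window={w}: a~b {same_cluster[w]:+.2f}, a~c {other_cluster[w]:+.2f}")
```

In this toy example the sign of each correlation is preserved as the window grows, which is the pattern one would expect if associations seen at the nucleosome level persist at larger length scales.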

HISTONE CODE AT THE HIGHEST ORDER OF CHROMATIN ORGANIZATION
Recently, compartmentalization of active and inactive chromatin marks has been shown in interphase chromosomes [26][27][28][29]. However, in these studies, the relative spatial positioning of chromatin with respect to a fixed reference is missing. Pachytene chromosomes provide an ideal platform to explore the basics of DNA folding, as they are very well defined in 3D space and have a highly reproducible structure. Recently, we published a description of the epigenetic landscape of pachytene chromosomes [18], in which we were able to study the relative spatial organization of three histone marks around the central axis of pachytene chromosomes (Figure 3). These histone marks also represent three different clusters in Figure 1. We found these marks to display very different patterns in terms of size, periodicity and position on the chromosomes (Figure 3). H3K9me3, associated with centromeric chromatin, forms helical spreads around 500 nm in length and is located at the end of the chromosome, very close to the central axis (in mouse chromosomes, centromeres localize at one of the telomeres [30]). H3K4me3, associated with active chromatin, forms 30-60 nm clusters and is located on lateral extensions of the chromosome, probably to facilitate transcription [31][32][33]. Finally, H3K27me3, a histone mark linked to repressed gene expression [34,35], shows a remarkably periodic and symmetric localization along the chromosome axis, hinting at a possible involvement in the recombination process [36,37]. H3K27me3 forms clusters of approximately 100-150 nm, which is in between the H3K4me3 and H3K9me3 cluster sizes. Overall, this distinct compartmentalization of histone marks hints that the histone code might have a structural impact on the overall shape of the chromosomes at the highest order of organization. We summarize our concept of the histone code at several levels of hierarchy in a final model (Figure 4).

DISCUSSION
Describing the histone code, i.e. the functional associations among histone modifications, at different levels of chromatin folding is of particular importance for understanding how chromatin structure affects its function. At low order, the histone code can predict whether a gene is going to be transcribed. At the highest order of chromatin folding, i.e. at the scale of the nucleus, the histone code predicts the general condensation level of chromatin. For instance, H3K9me3 forms large and highly condensed clusters which are located towards the inner side of chromosomes, while H3K4me3 forms small decondensed clusters at the periphery of a chromatin domain, most likely to facilitate transcription. Overall, these higher-order structures may be transcription units or topologically associating domains (TADs), and may affect the functionality of DNA. For instance, during interphase, regions of active transcription can be brought together in order for transcription to happen [38], while during meiosis, regions epigenetically repressed by H3K27me3 are close to the central axis where crossovers take place, possibly to silence transcription or prevent repeated regions from being recombined [39].
Nonetheless, the histone code is a complex concept and may not hold true all the time [40,41]. Histone modifications from different clusters can be associated at times, for instance H3K27me3 and H3K4me3 at poised genes, and validating the model will require co-staining of several histone modifications. Furthermore, further effort is needed to investigate the impact of the histone code on higher-order chromatin folding using recent genome-wide Hi-C data [42,43], and to validate these preliminary results.
What we do not yet know is how the three orders (nucleosome, gene and chromosome levels) interact and influence each other. Chromatin looping and folding into particular shapes, such as the so-called fractal globule, are currently the most advanced explanations of how chromatin can be organised inside the cell nucleus [44][45][46]. Nonetheless, how a 10-30 nm chromatin fibre transitions to the commonly observed X-shaped metaphase chromosomes is still an open question.
From our recent studies, we could observe 60 nm clusters of active chromatin (H3K4me3), 120 nm clusters of repressed chromatin (H3K27me3) and 500 nm clusters of inactive chromatin (H3K9me3). From other studies, we know about the existence of 10 nm nucleosomes and 30-100 nm chromatin fibers [47][48][49]. These 10, 30, 60, 120, 250 nm chromatin domain patterns are highly reminiscent of the 2^n power law (2, 4, 8, 16, 32, 64, 128, 256, 512, 1024). Using these different cluster sizes as a hint, we hypothesize that chromatin folding could follow a power law of two, which is slightly different from the traditional fractal globule model [46]; at this point, we do not yet have enough data to support this hypothesis. We propose that the active/inactive state of chromatin is determined by a hierarchical folding design (Figure 4). For instance, the active state of chromatin may only be folded up to the 6th power of 2, i.e. 64 nm, but not beyond; increasing levels of compaction will lead to transcriptional repression. This status will be maintained by the modifications from cluster 1 of our histone code table (Figure 1, top left cluster). As a consequence, each of the different levels of chromatin folding (the 10 nm nucleosome, the 30 nm chromatin fiber and the 60 nm nucleosomal domains of H3K4me3) will belong to the active chromatin state. The multiple acetylations and methylations will provide proofreading mechanisms to ensure that this status is maintained. In contrast, repressed chromatin will require an additional level of folding, requiring the recruitment of methyltransferases to methylate several sites such as lysine 9 and lysine 27 on histone 3. In this view, a combination of histone modifications is necessary to make sure that no gene is repressed accidentally.
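The comparison with powers of two can be made explicit with a few lines of arithmetic. The Python sketch below takes the size series quoted above and reports, for each size, its base-2 logarithm, the nearest power of two, and the ratio to the previous size. It is only an illustration of the arithmetic and does not test the hypothesis; the helper name nearest_power_of_two is ours.

```python
import math

# Size series (in nm) as quoted in the text.
sizes_nm = [10, 30, 60, 120, 250]

def nearest_power_of_two(x):
    """Return (n, 2**n), where n is the integer closest to log2(x)."""
    n = round(math.log2(x))
    return n, 2 ** n

# Fold change between successive sizes in the series.
ratios = [b / a for a, b in zip(sizes_nm, sizes_nm[1:])]

for size in sizes_nm:
    n, power = nearest_power_of_two(size)
    print(f"{size:>4} nm  log2 = {math.log2(size):.2f}  nearest 2^{n} = {power}")
print("successive ratios:", [round(r, 2) for r in ratios])
```

Under this rounding, the five sizes map to 8, 32, 64, 128 and 256. The steps from 30 nm upwards are close to twofold, whereas the step from 10 nm to 30 nm is threefold, so the series follows the 2^n sequence only approximately.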
For additional levels of folding, a coordinated trimethylation at more specific lysine residues, such as lysine 9 and lysine 79 on histone 3 and lysine 20 on histone 4, will be required to ensure that the inactive state of chromatin is maintained even after differentiation [50]. Relying on a single mark to induce or inhibit the transcription of a gene would not be robust. A synergy of modifications, that is, the histone code working in a concerted manner, will provide the proofreading mechanism needed to ensure proper on/off switching of a gene or hierarchical folding of chromatin.

Figure 1. Five clusters of histone modifications were identified based on 186,492 predicted nucleosome positions using hierarchical clustering. Correlation coefficients of histone marks at the nucleosome level are shown using a coloring scheme from red (negative correlation) to blue (positive correlation), while white represents an absence of correlation. Black boxes highlight the identified clusters.