A computational approach to map nucleosome positions and alternative chromatin states with base pair resolution

Version of Record

Accepted for publication after peer review and revision.

Download
Cite
Share
CommentOpen annotations (there are currently 0 annotations on this page).

Version of Record published: September 30, 2016 (This version)
Accepted: September 13, 2016
Accepted Manuscript published: September 13, 2016 (Go to version)
Received: April 15, 2016

1. Of interest
A logic-incorporated gene regulatory network deciphers principles in cell fate decisions

Gang Xue, Xiaoyi Zhang ... Zhiyuan Li

Research Article Apr 23, 2024
Further reading

Abstract
eLife digest
Introduction
Results
Discussion
Materials and methods
Data availability
References
Article and author information
Metrics

Abstract

Understanding chromatin function requires knowing the precise location of nucleosomes. MNase-seq methods have been widely applied to characterize nucleosome organization in vivo, but generally lack the accuracy to determine the precise nucleosome positions. Here we develop a computational approach leveraging digestion variability to determine nucleosome positions at a base-pair resolution from MNase-seq data. We generate a variability template as a simple error model for how MNase digestion affects the mapping of individual nucleosomes. Applied to both yeast and human cells, this analysis reveals that alternatively positioned nucleosomes are prevalent and create significant heterogeneity in a cell population. We show that the periodic occurrences of dinucleotide sequences relative to nucleosome dyads can be directly determined from genome-wide nucleosome positions from MNase-seq. Alternatively positioned nucleosomes near transcription start sites likely represent different states of promoter nucleosomes during transcription initiation. Our method can be applied to map nucleosome positions in diverse organisms at base-pair resolution.

https://doi.org/10.7554/eLife.16970.001

eLife digest

Plants, animals and other eukaryotes wrap their DNA around complexes of proteins called histones to form repeating units known as nucleosomes. The interaction between histones and DNA is strong, and so the DNA region inside a nucleosome has limited access to other proteins, including those that drive the expression of genes.

Moving a nucleosome slightly can change the access to its DNA and significantly impact how the genes in the region are regulated. Nevertheless, determining the position of nucleosomes accurately or testing how nucleosomes are different between individual cells are challenging tasks. Most methods for identifying nucleosomes use an enzyme called micrococcal nuclease (or MNase for short) to break down the DNA that isn’t protected in nucleosomes, followed by high-throughput DNA sequencing to identify the DNA fragments that remain. However, this technique, known as MNase-seq, is limited because it only measures an average location of the nucleosomes across millions of cells.

Now, Zhou, Blocker et al. have developed a new computational approach to identify nucleosome positions more accurately using MNase-seq data obtained from both yeast and human cells. This approach revealed that in more than half of the yeast genome, a given nucleosome is found at slightly different positions in different cells. Nucleosomes positioned near the beginning of a gene mark it open or closed for binding by the cell’s gene expression machinery. Zhou, Blocker et al. suggest that the nucleosomes’ positions influence how gene expression starts via a multi-step process.

Following on from this work, the next step is to use the newly developed method to study how nucleosome positions change when other regulators of gene activity bind and when genes are activated or repressed.

https://doi.org/10.7554/eLife.16970.002

Introduction

The eukaryotic genome is compacted into chromatin (Kornberg, 1974) which is comprised of nucleosomes, each consisting of approximately 147 base pairs (bp) of DNA wound around a histone protein octamer (Kornberg and Lorch, 1999). The helical DNA makes direct contact with the histones every 10 base pairs, with the major groove of DNA alternating between facing towards and away from the histone core (Luger et al., 1997). Shifting the histones relative to the DNA sequence by a few base pairs can change the accessibility of sequence elements to DNA binding proteins if they are located in the linker sequences between nucleosomes, or may switch these elements between facing towards and away from nucleosomes if they are located within nucleosomal DNA (Jiang and Pugh, 2009b; Segal and Widom, 2009b; Zhang and Pugh, 2011).

The location of nucleosomes with respect to DNA sequences influences many biological processes. Nucleosomes restrict the accessibility of DNA sequences to protein factors, such as transcriptional regulators and the transcription machinery (John et al., 2011; Li et al., 2007; Liu et al., 2006; Zhou and O'Shea, 2011). The positions and occupancy of nucleosomes can influence the interplay between transcription factors (Mirny, 2010) and the level (Carey et al., 2013; Kim and O'Shea, 2008), dynamics (Lam et al., 2008), and differences in gene expression between cells (Dadiani et al., 2013; Raser and O'Shea, 2004; Tirosh and Barkai, 2008). Recently, nucleosome organization has also been suggested to affect how gene promoters interpret dynamic signaling information at the single cell level (Hansen and O'Shea, 2013; Hao and O'Shea, 2012), and heterogeneity in promoter nucleosome positions has been linked to differences in gene expression (Small et al., 2014). Knowing the precise location that nucleosomes occupy with respect to DNA sequence is crucial for understanding how these biological processes are influenced by eukaryotic chromatin.

Genome-wide nucleosome positions are commonly mapped with micrococcal nuclease digestion based high-throughput sequencing (MNase-seq) (Hughes and Rando, 2014). In this method, histone-DNA interactions protect DNA from MNase digestion and the protected DNA fragments are sequenced and aligned to genome sequences to infer the location of nucleosomes (Clark, 2010; Kaplan et al., 2009; Rando, 2010; Zhang and Pugh, 2011). Although MNase exhibits sequence preference when digesting DNA devoid of histones (Horz and Altenburger, 1981), genome-wide analyses of nucleosomes with MNase-based methods are generally consistent with studies using MNase-independent methods (Hughes and Rando, 2014), such as DNase I chromatin digestion (Hesselberth et al., 2009) and chemical cleavage (Brogaard et al., 2012).

Studies that apply MNase-based methods typically report the position of a nucleosome as the average of the bulk nucleosome population (referred to as the 'consensus center', Figure 1A) (Struhl and Segal, 2013; Zhang and Pugh, 2011). However, if nucleosomes have overlapping positions in a significant portion of the population, the effect of averaging over heterogeneous nucleosome positions can lead to discrepancy between the consensus center of nucleosomes and the most representative nucleosome positions (Figure 1A). A variety of methods have been developed to improve the precision of nucleosome mapping from MNase-seq data, such as peak finding of nucleosome occupancy (Zhang and Pugh, 2011) and filtering of single-end digestion patterns (Weiner et al., 2010), but determining the precise locations of individual nucleosomes within a cell population remains a challenge due to substantial variability in the mapped locations of digested nucleosomes – the midpoints of paired-end sequenced nucleosomes or the endpoints of single-end sequenced nucleosomes (Figure 1B). This variability may arise from a cluster of overlapping and stably positioned nucleosomes, as well as from effects causing different degrees of digestion of the same nucleosome by MNase, such as nucleosome breathing and nuclease trimming – all of which influence the distribution of the aligned reads and are difficult to disentangle (Clark, 2010). Recently, a chemical cleavage approach that uses a genetically engineered histone H4 to chemically cleave DNA sequences in contact with the nucleosome dyad allowed direct measurement of nucleosome positions with unprecedented resolution (Brogaard et al., 2012; Moyle-Heyrman et al., 2013). However, the requirement for genetic engineering of essential histones limits its current application to genetically tractable organisms. Therefore, novel experimental or analytical approaches that are generally applicable in eukaryotes are still needed to determine the accurate positions of nucleosomes in vivo. Here, we report a computational approach to determine in vivo nucleosome positions from paired-end MNase-sequencing data.

Figure 1 with 2 supplements see all

Download asset Open asset

Illustration of the Template-Based Bayesian (TBB) approach for determining nucleosome positions.

(A) Diagram illustrating the heterogeneous nucleosome positions and the consensus centers of nucleosomes along a genomic region in a population of cells. Blue ovals illustrate individual nucleosomes and dotted lines mark all nucleosome positions. (B) Example of digested nucleosome reads, their nucleosome positions and the overall occupancy. (C) Illustration of the computational pipeline of the TBB approach. Occupancy of sequencing read midpoints indicates the number of midpoints at every base pair for yeast Chr 8, 204, 500–206,500 bp. Blue ovals illustrate overlapping TBB nucleosome positions and are colored according to the magnitude of their coefficients β. Two common presentations of nucleosome sequencing data are shown for comparison: the light gray area represents the nucleosome occupancy generated by smoothing sequencing read midpoints with a Parzen window approach (band size of 20 bp) (Albert et al., 2007; Tsankov et al., 2010); the dark gray area (Fragment extension) represents the nucleosome occupancy generated by extending 73 bp on both ends from the sequencing read midpoints. (D) Histogram showing the distance between adjacent TBB nucleosome positions in a combination of the T1 and T2 experiments.

https://doi.org/10.7554/eLife.16970.003

Applying template-based deconvolution to experimental data has many applications in biology. For example, in super-resolution fluorescence microscopy, the locations of individual fluorophores within the diffraction limit of light can be identified by deconvoluting light intensity information with a function describing the distribution of light intensity from individual light spots (Betzig et al., 2006; Huang et al., 2009; Rust et al., 2006). Inspired by this, we use the size distribution of MNase digested nucleosome fragments to infer a digestion variability template for nucleosomes, and report a Bayesian method that makes use of these templates to identify the individual positions of nucleosomes at a base-pair resolution, hereafter referred to as the template-based Bayesian (TBB) approach. This approach can be applied to data generated through both paired-end and single-end sequencing to map chromatin structure in diverse organisms. Here we demonstrate the template-based Bayesian approach with paired-end sequencing of MNase digested nucleosomes in yeast and human cells. We show that the periodic occurrences of dinucleotide sequence motifs relative to the nucleosome dyad can be directly determined from MNase based nucleosome positions and are conserved in vivo in both yeast and human cells. Leveraging this method, we find that alternative configurations of nucleosomes are a common feature in both yeast and human chromatin. The alternatively positioned nucleosomes around gene transcription start sites represent configurations that differ in their compatibility with the assembly of the pre-initiation complex. A 3-step model for transcription initiation can reconcile the competition between nucleosomes and the transcriptional machinery observed from genomic analysis.

Results

A Bayesian approach based on digestion variability templates can identify positions of nucleosomes at base-pair resolution

We reasoned that it might be possible to resolve the positions of multiple overlapping nucleosomes if we could estimate the degree to which MNase digestion contributes to the deviation of the midpoints from a true nucleosome center. Since the overall digestion of nucleosomes is reflected in the length of nucleosomal DNA fragments, we tested the idea that we might be able to estimate the variation in the midpoints from the length of digested nucleosomes (Figure 1—figure supplement 1), and use this information to infer the positions of individual nucleosomes through deconvolution. The variability of digested nucleosomes could come from two sources: the technical variation that is associated with nuclease cleavage, such as variable trimming at nucleosome ends, and biological variation that directly influences the length of DNA wound around histones, such as nucleosome breathing and remodeling (Polach and Widom, 1995). While the technical effects are likely to affect both ends of nucleosomal DNA equally, the biological effects may create bias for specific nucleosomes and/or a specific side of the nucleosomes. However, the biological effects are generally believed to be transient and rare at a given genomic location within a population of cells (Andrews and Luger, 2011); we thus assumed that the digestion variation was equivalent at both ends of the nucleosome when averaged over the genome and population. The biological variation at individual nucleosomes could generate large shifts in read midpoints due to length differences in nucleosomes (likely by multiples of 10 bp due to the unwrapping of each helical turn of DNA), and could be identified as alternative nucleosome positions if they were present in a significant fraction of the bulk nucleosome population. Nucleosomes with substantially smaller size, such as sub-nucleosomes (Rhee et al., 2014), can be identified based on the sequenced fragment size (<100 bp) and were excluded from our analysis.

The template-based Bayesian approach includes four major steps (Figure 1C, see Materials and methods). First, we estimated the distribution of MNase digestion variability from the length distribution of DNA sequencing reads, assuming that MNase digestion is not biased towards either end of the nucleosome on average. This produced a digestion variability template, describing the overall deviations in read midpoints relative to the true nucleosome center (Figure 1—figure supplement 1). Second, each chromosome was segmented into regions with similar sequencing coverage to increase detection sensitivity, with a requirement to maintain sufficient segment length for estimating statistical properties of nucleosome positions. Third, within each segment, the occupancy of read midpoints (the number of paired-end read midpoints at each base pair, Figure 1C, gray trace) was deconvoluted with the digestion variability template to estimate the expected read occupancy at every base pair if digestion errors had not occurred (coefficient β, Figure 1C, black trace). Finally, the locations of nucleosomes were summarized based on this expected read occupancy and statistical thresholds. To control for falsely identified nucleosome positions arising from the sequence preference of MNase, we set the statistical thresholds using a false discovery rate (FDR≤0.05) estimated based on a simulated MNase digestion dataset that matches the experimentally observed digestion sequence distribution. In our study, the length distribution of paired-end sequencing reads from the gene coding and promoter regions was indistinguishable from the rest of the genome (Figure 1—figure supplement 1–2). Therefore, a single digestion variability template was applied across the yeast genome.

We represented the locations of nucleosomes using two metrics: the consensus centers of nucleosomes and the TBB positions of nucleosomes (Supplementary file 1). The consensus centers of nucleosomes were defined to represent non-overlapping nucleosomes with a minimum size of 147 bp (Figure 1C, blue bars), by applying a Gaussian window smoothing method to the expected read occupancy (coefficient β) from our model – they are equivalent to the nucleosome positions commonly described in the literature (Albert et al., 2008; Jiang and Pugh, 2009b; Struhl and Segal, 2013) (hereafter termed ‘consensus centers’). The TBB nucleosome positions describe the center locations of individual nucleosomes across the genome without limitation on minimum spacing (Figure 1C, red dots and blue ovals) (hereafter termed ‘TBB nucleosome positions’), which represent individual stably positioned nucleosomes over a statistical threshold.

Accurate estimation of nucleosome positions at base-pair resolution

We titrated MNase to digest yeast chromatin and selected for analysis two nucleosome samples that were digested to different sizes (‘T1’ and ‘T2’, Figure 2—figure supplement 1). Both samples were digested to primarily mono-nucleosomes without generating over-represented sub-nucleosome fragments. We generated paired-end data sets of 'T1' and 'T2', each with a coverage of ~2.8 read midpoints per base pair; as an average between these two data sets, we identified ~61,800 consensus centers of nucleosomes and ~125,000 TBB nucleosome positions, covering at least 75.5% of the S. cerevisiae genome. The spacing between adjacent TBB nucleosome positions peaked at 11 and 21 bp (Figure 1D), consistent with the rotation of DNA sequence by one or two helical turns around histone cores (Albert et al., 2007; Brogaard et al., 2012), and suggesting that the same nucleosome commonly occupies overlapping configurations in different cells.

The consensus centers of nucleosomes identified by the template-based Bayesian approach were consistent with the published methods employing Gaussian smoothing algorithms (often referred to as 'Parzen window') (Albert et al., 2007, 2008; Clark, 2010; Tsankov et al., 2010) (Figure 2A), with more than 90% of the nucleosomes matching within a single base pair (Figure 2—figure supplement 2). These consensus centers also showed general agreement with the published nucleosome reference maps (Jiang and Pugh, 2009a) (Figure 2B and Figure 2—figure supplement 3), validating the overall biological relevance of our data sets. We then evaluated the accuracy of our identified TBB nucleosome positions by comparing them to individual nucleosome positions mapped through a recently reported chemical approach (Brogaard et al., 2012), which provides a reference that is independent of MNase cleavage specificity. The nucleosome positions identified by these two approaches were consistent (Figure 2C and Figure 2—figure supplement 4), demonstrating that the TBB approach can precisely map the positions of individual nucleosomes. Further, we assessed the absolute precision of the TBB approach by using it to map nucleosome positions from in silico generated MNase-seq data sets where the simulated nucleosome positions are known. We systematically tested the precision of the TBB approach with varying sequencing coverages, relative nucleosome population abundance and spacing among overlapping nucleosomes (Figure 3 and Material and Methods). The majority of the TBB nucleosome positions identified in these simulated data sets were within 2–3 bp of the originally simulated nucleosome positions, even for the cases where nucleosome positions were separated by only 10–15 bp or differ by more than 3-fold in their relative occupancy (Supplementary file 2).

Figure 2 with 6 supplements see all

Download asset Open asset

Genome-wide evaluation of the TBB approach and alternative nucleosome positions.

Histogram of the nearest distance: (A) between the consensus centers of nucleosomes determined by the TBB approach and by the Parzen window approach; (B) between the consensus centers determined by the TBB approach and the reference MNase nucleosome positions (Jiang and Pugh, 2009a); (C) between the TBB nucleosome positions and the nucleosome positions mapped by the chemical approach (Brogaard et al., 2012); (D,E) between the consensus centers of nucleosomes (D) and the TBB nucleosome positions (E) mapped in two independent experiments. The median of the distance between matched consensus centers or TBB nucleosome positions is reported for each comparison. (F) Example of stable TBB positions that are tolerant to MNase digestion, chr 8, 341651 – 341771. Randomly selected reads fragments are shown to represent the locations of sequenced tags.

https://doi.org/10.7554/eLife.16970.006

Figure 3

Download asset Open asset

Nucleosome detection from in silico MNase-seq datasets.

(A) Plots summarizing the distance between the detected TBB nucleosome positions in the in silico datasets and the nearest simulated primary and alternative nucleosome positions. ('C', total sequencing coverage of all overlapping nucleosomes; 'E', the effective magnitude (relative occupancy of neighboring nucleosomes); and 'O', the offset (spacing between nearby nucleosome positions). (B) Examples of nucleosome detection in the simulation at different coverage, effective magnitude and offset (different values are highlighted in red). Sequencing read midpoints (gray) were distributed randomly around the simulated nucleosome positions (blue dots) according to the digestion variability template. The coefficients (blue trace) and nucleosome positions (red dots) determined by the TBB approach are shown for comparison.

https://doi.org/10.7554/eLife.16970.013

The accuracy and reproducibility of MNase-based mapping of nucleosomes may be limited by MNase digestion variability. The degree to which chromatin is digested by MNase depends on the DNA sequence and the preparation of nucleosomal DNA samples (Dingwall et al., 1981; Horz and Altenburger, 1981), and may differ significantly between experiments even when experimental protocols have been carefully followed. To evaluate the tolerance of the TBB nucleosome positions to different levels of MNase digestion, we compared the consensus centers and TBB nucleosome positions in the ‘T1’ and ‘T2’ data sets that represent digestion to different fragment sizes (Figure 2—figure supplement 1). Both the consensus centers and the TBB nucleosome positions were in agreement between these two experiments (Figure 2D–E, Figure 2—figure supplements 5–6), with a median difference of 2 bp across the genome; in comparison, randomly generated consensus centers and nucleosome positions gave rise to a median difference of 49 bp and 36 bp, respectively. Individual nucleosome positions can be reliably detected regardless of the differences in MNase digestion (Figure 2F), demonstrating the robustness of the determined nucleosome positions to digestion variability. Overall, these results demonstrate that the positions of nucleosomes can be reliably and accurately detected at base-pair resolution with paired-end MNase-seq coupled with the template-based Bayesian approach.

Figure 4 with 4 supplements see all

Download asset Open asset

Dinucleotides frequency of nucleosome positions.

(**A–C**) Normalized frequency of AA/AT/TA/TT and CC/CG/GC/GG dinucleotides of DNA sequences aligned at the centers of nucleosomes, for all TBB nucleosome positions in yeast, both before (A) and after (C) correction for MNase digestion bias, and for all TBB nucleosome positions on human chromosome 12, position 38,000,000 bp to 48,000,000 bp, a 10 Mbp region randomly chosen in the human genome (B). (D) The frequency of AA/AT/TA/TT for 62,035 randomly selected TBB nucleosome positions (blue trace), and for genome locations with an average distance of either 2 bp (orange trace) or 5 bp (black trace) from these selected TBB nucleosome positions. The distance was randomly perturbed by between 0–4 bp or 0–10 bp for each nucleosome positions, respectively.

https://doi.org/10.7554/eLife.16970.014

Figure 5 with 2 supplements see all

Download asset Open asset

Alternatively positioned nucleosomes at transcription start sites.

(A,B) Examples of a uniquely positioned nucleosome (A), and alternatively positioned nucleosomes at the TSS (B). Blue and magenta traces show the sequencing read midpoint occupancy and fitted coefficient β from experiments ‘T1’ and ‘T2’, respectively. Ovals indicate the TBB nucleosome positions and are colored based on coefficients β. (C) Heat map showing the TBB nucleosome positions ('Positions'), the occupancy of read midpoints from two experiments (T1 and T2), and the occupancy of read midpoints from the two MNase bias simulations ('T1 DG control' and 'T2 DG control'). All data are aligned by the centers of selected unique or alternative nucleosome positions (essentially the consensus center of +1 nucleosomes) that overlap with the transcription start site (TSS) (4672 open reading frame in total). The order of transcripts is ranked by the maximum space between TBB nucleosome positions within each group, as illustrated by the diagrams on the left. The positive direction of the x-axis indicates 5’ to 3’ for all transcripts. (D) Correlation coefficient of the sequencing read midpoint occupancy (un-smoothed) in experiment T1 and T2 for each gene in panel C. (E) Graph showing the end location of paired-end sequencing reads of the upstream and downstream nucleosomes in panel C (later defined as the proximal and distal nucleosomes, respectively). The ends are aligned at the position of upstream nucleosomes (proximal nucleosomes). (F) Length distribution of paired-end sequencing reads in the unique, proximal and distal nucleosomes at gene promoters. In both (E) and (F), if the midpoint of a sequencing read is within 5 bp of the position of a nucleosome, it is counted as a read of this nucleosome.

https://doi.org/10.7554/eLife.16970.019

Figure 5—source data 1 Source data for Figure 5C, with arrays of positions, midpoints from T1 and T2 used for generating the graph.: https://doi.org/10.7554/eLife.16970.020
Download elife-16970-fig5-data1-v2.xlsx

Several methods have been developed to determine the genome-wide locations of nucleosomes, each of which addresses different experimental and computational issues (Albert et al., 2007a; Brogaard et al., 2012; Polishko et al., 2012; Schep et al., 2015; Tirosh, 2012; Tsankov et al., 2010; Valouev et al., 2011; Weiner et al., 2010; Zhong et al., 2016). We briefly summarized the approaches, performance and limitations of these methods for reference (Supplementary file 3).

Dinucleotide periodicity is a conserved determinant of in vivo nucleosome positioning

Nucleosomes in vivo are thought to be enriched for genome sequences that show a pattern of periodic AA/AT/TA/TT or CC/CG/GC/GG motifs that oscillate every 10 bp with opposite phases (Albert et al., 2007a; Brogaard et al., 2012; Gaffney et al., 2012; Kaplan et al., 2009; Mavrich et al., 2008; Satchwell et al., 1986; Segal et al., 2006). This dinucleotide sequence pattern matches the bending property of preferred nucleosomal DNA sequences, as the double stranded DNA is twisted and bent to maintain contact with the histone cores (Drew and Travers, 1985). However, this periodic pattern was not observed around the consensus centers of nucleosomes determined using MNase based mapping methods (Valouev et al., 2011) (Figure 4—figure supplement 1) and only emerged from nucleosome fragments selected to be exactly 147 bp (Figure 4—figure supplement 2) (Albert et al., 2007a; Gaffney et al., 2012; Kaplan et al., 2009), perhaps due to the imprecision of the determined nucleosome centers.

In contrast, when we aligned DNA sequences at the TBB nucleosome positions in yeast, the dinucleotide AA/AT/TA/TT and CC/CG/GC/GG motifs displayed 10 bp periodicity across almost the full length of the nucleosome core (Figure 4A and Figure 4—figure supplement 3), illustrating the power of the TBB approach to precisely map nucleosome positions from MNase-based methods. To control for the influence of MNase digestion bias, we simulated MNase digestion based on its observed sequence preference (Gaffney et al., 2012), and identified nucleosome positions from the in silico MNase digestion data set. We used these in silico nucleosome positions to normalize the bias that was merely a result of MNase digestion (Figure 4—figure supplement 4). After these corrections, we observed dinucleotide AA/AT/TA/TT and CC/CG/GC/GG motifs displaying 10 bp periodicity (Figure 4C), consistent with the dinucleotide frequency around the chemically determined nucleosome positions in yeast (Brogaard et al., 2012). Randomly shifting the determined TBB nucleosome positions by an average of 2 bp greatly reduced the observed dinucleotide periodicity and shifting the positions by an average of 5 bp completely eliminated the periodic pattern (Figure 4D), demonstrating the importance of precisely mapping nucleosome positions in vivo.

Paired-end nucleosome sequencing has been used to map chromatin structure in higher eukaryotes (Gaffney et al., 2012), but a base-pair resolution nucleosome map has been lacking for organisms other than yeast. We applied the TBB approach to determine the positions of human nucleosomes from a recent paired-end MNase-seq study (Gaffney et al., 2012). The frequency of AA/AT/TA/TT and CC/CG/GC/GG motifs displayed clear 10 bp periodicity relative to human TBB nucleosome positions, which was comparable to the motif periodicity observed in yeast (Figure 4A and B, Figure 4—figure supplement 2). In contrast, when DNA sequences were aligned at the consensus centers of human nucleosomes, the frequency of these dinucleotide motifs did not exhibit a clear periodicity with respect to the consensus centers (Figure 4—figure supplement 1). With the template-based Bayesian approach, we demonstrated that nucleosome positions determined from MNase-based methods indeed present periodic dinucleotide frequency globally, and that this feature may be a conserved signal for nucleosome positioning in vivo.

Heterogeneous nucleosome positions at the TSS

Positioning of nucleosomes over yeast transcription start sites (TSSs) is influenced by promoter sequence signals and trans acting factors (Jiang and Pugh, 2009b; Radman-Livaja and Rando, 2010; Segal and Widom, 2009b). This observation, together with our finding that overlapping positions are commonly observed in yeast and human chromatin, motivated us to examine the relationship between TSSs and overlapping nucleosome positions (resulting from alternative positions). We identified all TBB nucleosome positions that were closely associated with TSSs (Jiang and Pugh, 2009a) and organized them into groups consisting of either unique or overlapping TBB nucleosome positions for each gene (Figure 5A and B). If a group contained several overlapping TBB nucleosome positions, the center of this group was defined as the numeric average of these positions. All groups were aligned at their centers and ranked by the spacing between TBB nucleosome positions within the groups (Figure 5C). The TBB nucleosome positions exhibited a bimodal shape for at least three quarters of the genes with nucleosome-covered TSSs (Figure 5C, ‘positions’), suggesting that alternative positioning is a common feature of nucleosomes near TSSs. The number of sequenced read midpoints showed the same bi-furcating shape and was highly consistent between two experiments (Figure 5C, ‘T1 midpoints’ and ‘T2 midpoints’). The occupancy of nucleosome midpoints at each gene correlated well between two experiments (T1 and T2), indicating alternative nucleosomes can be reliably observed (Figure 5D). The distribution of read midpoints was symmetric along the direction of gene transcription (Figure 5—figure supplement 1), and could not be explained by the sequence bias of MNase digestion (Figure 5C). When we examined the midpoints attributed to each nucleosome position within each group (Figure 5—figure supplements 1–2), we found that most groups showed less than 2-fold variation in the read occupancy between alternative TBB nucleosome positions, suggesting that the different positions are significantly represented in the cell population and therefore are likely of biological relevance. These alternative positions were not a consequence of differential digestion at nucleosome edges (Weiner et al., 2010), as both ends of paired-end sequencing reads showed two peaks that match the edges of alternatively positioned nucleosomes (Figure 5E). Differential digestion at nucleosome edges would produce two peaks of read ends on one side and just one peak on the other read end (Weiner et al., 2010). Of the genes with their TSSs covered by nucleosomes, approximately one quarter contain more than two TBB nucleosome positions at TSSs, suggesting that chromatin structure at the transcription initiation site is complex and heterogeneous.

Alternative chromatin states and transcription initiation

Alternative positions of nucleosomes around the TSS may allow the transcription start site of a gene to be exchanged between accessible and inaccessible states. To determine if this is the case, we analyzed the location of transcription start sites with respect to the alternative TBB nucleosome positions. We defined the nucleosome with its center position closer to the TSS as the 'proximal' nucleosome, and the nucleosome with its center position further away from the TSS as the 'distal' nucleosome (Figure 6A). The unique, proximal and distal nucleosomes were digested by MNase to similar sizes, further suggesting that the alternative nucleosome positions are not a result of MNase digestion bias (Figure 5F).

Figure 6

Download asset Open asset

Alternatively positioned nucleosomes and transcription pre-initiation complex.

(A) Plots (left) showing the locations of TSSs relative to the average centers of all overlapping TBB nucleosome positions shown in Figure 3C ('All', black) to uniquely positioned nucleosomes (magenta) and to the distally (red) and proximally (cyan) positioned nucleosomes. The gray area marks the region covered by the nucleosome core. The cartoon diagrams on the right illustrate the location of the TSS relative to the nucleosome dyad. (B) Area showing the average occupancy of subunits of the pre-initiation complex (TBP, TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH, TFIIK and RNA polymerase II (Pol II); determined by ChIP-exo) (Rhee and Pugh, 2012) aligned at the center of the proximal (cyan) and the distal nucleosomes (red), determined by the TBB approach. (C) Bar graph showing the distribution of TATA box and TATA-like sequences of all genes (Rhee and Pugh, 2012) aligned at the dyad of the proximal nucleosomes. (D) Illustration of the 3-step model for transcription initiation mediated by alternatively positioned nucleosomes.

https://doi.org/10.7554/eLife.16970.023

Figure 6—source data 1 Source data for Figure 6, with genes and coordinates of TSSs, unique, proximal and distal nucleosome positions.: https://doi.org/10.7554/eLife.16970.024
Download elife-16970-fig6-data1-v2.xlsx

Among uniquely positioned nucleosomes, TSSs weree enriched ~10 bp inside the boundary of the nucleosome core (Figure 6A, magenta curve). Among the alternatively positioned nucleosomes, the proximal nucleosomes were located much closer to the TSS where the nucleosome core restricts the accessibility of these factors and therefore competes for active transcription (Figure 6A, cyan curve). For the distal nucleosomes, TSSs were aligned near the edge of the nucleosome core (Figure 6A, red curve), making these sites and the upstream regulatory elements accessible to other DNA binding factors. Alternatively positioned nucleosomes at TSSs gave rise to different accessibilities of the promoter elements within a population of cells, which may reflect the different activities in gene transcription. If one focused only on the average centers of these nucleosomes (essentially the consensus centers of the +1 nucleosomes), one would draw the conclusion that nucleosomes occlude the TSSs of most genes (Figure 6A, black line).

To explore how alternative TBB nucleosome positions might influence transcription, we focused on the initial steps of this process: the assembly of the transcription pre-initiation complex (PIC), which generally requires binding of TBP (TATA-binding protein); and the subsequent recruitment of general transcription factors and RNA polymerase II (Pol II) (Buratowski et al., 1989; Green, 2000; Orphanides et al., 1996; Roeder, 1996). Recently, it was suggested that the +1 nucleosome downstream of the TSS assists in the assembly of the PIC (Rhee and Pugh, 2012). However, the binding of the PIC extends into the boundary of the +1 nucleosomes (Rhee and Pugh, 2012), presenting a physical conflict between PIC binding and nucleosomes. To test if the resolution of alternatively positioned nucleosomes would provide a better understanding of the interplay between nucleosomes and the assembly of the PIC, we analyzed the occupancy of the PIC (determined with the ChIP-exo method (chromatin immunoprecipitation with lambda exonuclease digestion) by Rhee and Pugh [Rhee and Pugh, 2012]) around the identified distal and proximal nucleosomes, respectively. Most subunits of the PIC, including TBP, TFIIA, TFIIB, TFIIE, TFIIF, TFIIH and TFIIK, showed peak enrichment at the edge of the distally positioned nucleosome (Figure 6B, cyan), suggesting that the +1 nucleosome in the distal position marks a boundary for the assembly of the PIC. This boundary may be provided by the positioned nucleosomes or by binding of one or more components of the transcription complex. In contrast, the proximally positioned nucleosome covered most of the regions enriched for PIC binding (Figure 6B, red), suggesting that the +1 nucleosome in the proximal position is not compatible with the binding of transcription machinery. Intriguingly, TATA-box and TATA-like sequences were enriched in front of the proximal nucleosomes (Figure 6C), suggesting that proximal nucleosomes do not block recognition of most TATA elements. Thus, the proximal nucleosome may represent a promoter state that is not associated with active transcription and is populated prior to the assembly of the PIC (Figure 6D, Proximal state). The distal nucleosome may represent a promoter state in the process of transcription initiation (Figure 6D, Distal state). Once transcription begins, the barrier of nucleosomes no longer hinders Pol II as its occupancy extends through the region covered by the +1 nucleosome (Figure 6B).

The location of uniquely positioned nucleosomes relative to TSSs was similar to that of the distal nucleosomes (Figure 6A, magenta and red curves), suggesting that the binding and assembly of PIC may encounter less competition from chromatin at these genes. We found that genes with uniquely positioned nucleosomes at their TSSs were enriched in highly expressed genes (Newman et al., 2006) (top 500 genes, p = 0.001, Fisher’s exact test). More strikingly, we found that 73% of the genes encoding ribosomal proteins had TSSs with uniquely positioned nucleosomes (p<0.00001, Fisher’s exact test), highlighting a connection between the configuration of nucleosomes (unique vs alternative) and gene expression.

Sequence determinants of unique and alternative chromatin states

DNA sequence features, such as periodic occurrence of dinucleotides (AA/AT/TA/TT or CC/CG/GC/GG) and consecutive dA or dT sequences (poly(dA:dT)), can influence nucleosome positions (Struhl and Segal, 2013). We therefore tested the possibility that DNA sequence determinants may underlie the difference between uniquely, proximally, and distally positioned nucleosomes around TSSs. The frequency of dinucleotides occurred periodically ~10 bp from the positions of either proximal or distal nucleosomes and resembled the periodic feature of nucleosomes in yeast genome (R² = 0.75, Figure 7A). This periodicity was less pronounced in the uniquely positioned nucleosomes (R² = 0.40 between dinucleotide frequency in uniquely positioned nucleosomes and all nucleosomes in yeast genome; Figure 7A). Autocorrelation analysis of the dinucleotide frequency revealed a lack of periodicity in DNA sequences around the uniquely positioned nucleosomes, further highlighting the difference between uniquely and alternatively positioned nucleosomes (Figure 7B).

Figure 7

Download asset Open asset

Sequence features of uniquely and alternatively positioned nucleosomes.

(A) Plots showing the frequency of AA/AT/TA/TT and CC/CG/GC/GG dinucleotides of DNA sequences aligned at the TBB nucleosome positions of either unique, proximal or distal nucleosomes at gene promoters (illustrated by the diagrams). (B) Plots showing the normalized dinucleotide frequency (smoothed with a 3 bp window) of DNA sequences aligned at unique nucleosomes, alternative nucleosomes (proximal + distal) at gene promoters or all TBB nucleosome positions, and the autocorrelation analysis (performed in MATLAB) of the dinucleotide frequency within nucleosome core (−73–73 bp). (C,D) Plots showing the frequency of poly(dA:dT)₆ sequences aligned at unique nucleosomes or the average centers of alternatively positioned nucleosomes. The positive direction of the x-axis indicates 5’ to 3’ for all transcripts. Black arrows mark the enriched poly(dA:dT)₆ signals. Overall, 1172 genes contain unique nucleosomes and 3469 genes contain proximal and distal nucleosomes at their promoters. Gray traces present the analysis for a random permutation control of selected promoters and the location of nucleosome positions (A,C,D). The number of these random locations matches the number of nucleosomes in each plot.

https://doi.org/10.7554/eLife.16970.025

Poly(dA:dT) tracks are important for nucleosome depletion and correlate with transcriptional activity (Hughes et al., 2012; Iyer and Struhl, 1995; Raveh-Sadka et al., 2012; Segal and Widom, 2009a; Struhl, 1985). When we examined the occurrences of poly(dA:dT) sequences at least 6 nucleotides in length, we found that the uniquely positioned nucleosomes were flanked by a strong enrichment of poly(dA:dT)₆ sequences (Figure 7C). A similar but weaker enrichment of poly(dA:dT)₆ sequences was also observed flanking both the proximally and the distally positioned nucleosomes (Figure 7D). In comparison, no enrichment of poly(dA:dT)₆ sequences was observed in randomly selected DNA sequences (Figure 7C and D, gray curve). Overall, uniquely positioned nucleosomes and alternatively positioned nucleosomes presented different DNA sequence features, suggesting that these sequence determinants may contribute to the different configurations of chromatin states at gene promoters.

Discussion

Knowing the precise location of nucleosomes in vivo is key to understanding how diverse biological processes are regulated (Jiang and Pugh, 2009b). Although MNase-seq methods have been widely used for mapping nucleosomes in many organisms, the commonly reported consensus centers of nucleosomes do not precisely reflect the location of nucleosomes with respect to DNA sequence. We developed a new computational method to accurately determine the positions of nucleosomes at base pair resolution based on paired-end sequencing data, providing a general and accessible approach to accurately map the basic chromatin structure in eukaryotes.

The periodicity in the frequency of dinucleotide AA/AT/TA/TT and CC/CG/GC/GG motifs has been observed in vivo in several organisms, including unicellular fungi (Albert et al., 2007a; Brogaard et al., 2012), insects (Mavrich et al., 2008), chicken (Satchwell et al., 1986) and human (Gaffney et al., 2012), and was thought to strongly influence the positioning of nucleosomes (Kaplan et al., 2009; Struhl and Segal, 2013). However, this feature was only obvious when the sequences of nucleosomal DNA fragments exactly 147 bp in length were selected for analysis (Albert et al., 2007a; Gaffney et al., 2012; Mavrich et al., 2008; Segal et al., 2006), and was much weaker or even disappeared around the consensus centers of nucleosomes that were determined with an entire data set (Brogaard et al., 2012; Valouev et al., 2011). Thus, differences in the selection and analysis of MNase-seq data may result in different conclusions about nucleosome positioning in vivo (Gaffney et al., 2012; Valouev et al., 2011). With the high-precision nucleosome position maps generated by TBB, we directly demonstrated that the DNA sequences around the mapped nucleosome positions exhibited 10 bp periodicity in the frequency of dinucleotide motifs in both yeast and human cells, arguing that this sequence feature is a universal signal for positioning nucleosomes in vivo at the whole genome level.

Positioning of nucleosomes is dynamic and linked to many biological processes. Identifying overlapping nucleosome positions that cannot co-exist on the same DNA molecule at the same time provides a snapshot of possible chromatin states in dynamic processes. We observed at least two states of chromatin where gene transcription is initiated: a state where a nucleosome covers the transcription start site and competes with transcription machinery, but leaves the core promoter sequences mostly accessible ('proximal state'); and another state where a nucleosome barely covers the transcription start site and seems poised for the assembly of the pre-initiation complex ('distal state'). We propose a 3-step model (Figure 6D) to describe the transcription initiation steps influenced by nucleosomes, including a state where RNA polymerase transcribes through the nucleosome covered sequence ('open state'). The transition from the 'proximal state' to the 'distal state' and to the 'open state' could be actively driven by the assembly of the PIC; in this case, different nucleosome states may reflect rate-limiting steps during transcription initiation. Alternatively, this transition could be controlled by transcription factors, chromatin modulators and even the preference of DNA sequences as a means to regulate and assist gene transcription. Interestingly, we found that TSSs of highly expressed genes are enriched with uniquely positioned nucleosomes, which resemble nucleosomes in their 'distal state'. It is possible that the +1 nucleosomes of these most highly expressed genes are oriented in a way that allows constitutive assembly of the PIC and therefore frequent transcription activity. Further genomic analysis and mechanistic studies will shed light on the generalization and the underlying principles of this model.

DNA sequences may play a role in determining whether nucleosomes have unique or alternative positions. Unique nucleosome positions may result from the combination of the strong nucleosome boundary signal from poly(dA:dT) sequences and the lack of a rotational positioning signal from periodic dinucleotide motifs. Conversely, alternatively positioned nucleosomes may be a consequence of a weak boundary signal and a strong rotational positioning signal that allows nucleosomes to interconvert between multiple equivalent positions. The exact positions of these nucleosomes and how alternative nucleosomes are positioned relative to each other may be further controlled by other factors associated with promoters, such as chromatin modifiers, remodelers and transcription factors. The configurations of nucleosomes (unique vs. alternative) may influence the transcription of a gene; if promoter DNA sequences contain information that helps to specify the nucleosome configurations, eukaryotic genome could use nucleosome configurations as a means to encode transcriptional programs.

It is worth noting that the template-based Bayesian approach does not capture all possible positions of nucleosomes within the cell population – it identifies the nucleosome positions that are represented with strong statistical confidence (Materials and methods), reflecting the different configurations of nucleosomes that are likely to be relevant in biological processes. Revealing these heterogeneous configurations of nucleosomes in cells will provide new insights into how chromatin regulates or is influenced by biological processes. Furthermore, mutually exclusive positions of protein-DNA complexes can imply dynamic information. As an example, the alternatively positioned nucleosomes at TSSs suggest a change in chromatin structure during transcription initiation. Similar analytic approaches can be applied to other systems to reconstruct dynamic processes.

Share this article

Cite this article

Illustration of the Template-Based Bayesian (TBB) approach for determining nucleosome positions.

Genome-wide evaluation of the TBB approach and alternative nucleosome positions.

Nucleosome detection from in silico MNase-seq datasets.

Dinucleotides frequency of nucleosome positions.

Alternatively positioned nucleosomes at transcription start sites.

Figure 5—source data 1

Alternatively positioned nucleosomes and transcription pre-initiation complex.

Figure 6—source data 1

Sequence features of uniquely and alternatively positioned nucleosomes.

Author details

Xu Zhou

Present address

Contribution

Contributed equally with

Competing interests

Alexander W Blocker

Contribution

Contributed equally with

Competing interests

Edoardo M Airoldi

Contribution

For correspondence

Competing interests

Erin K O'Shea

Contribution

For correspondence

Competing interests

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

Categories and tags

Research organisms

Further reading