A 1-dimensional statistical mechanics model for nucleosome positioning on genomic DNA

The first level of folding of DNA in eukaryotes is provided by the so called '10 nm chromatin fibre', where DNA wraps around histone proteins (approx. 10 nm in size) to form nucleosomes, which go on to create a zig zagging bead on a string structure. In this work we present a 1 dimensional statistical mechanics model to study nucleosome positioning within one such 10 nm fibre. We focus on the case of genomic sheep DNA, and we start from effective potentials valid at infinite dilution and determined from high resolution in vitro salt dialysis experiments. We study positioning within a polynucleosome chain, and compare the results for genomic DNA to that obtained in the simplest case of homogeneous DNA, where the problem can be mapped to a Tonks gas. First, we consider the simple, analytically solvable, case where nucleosomes are assumed to be point like. Then, we perform numerical simulations to gauge the effect of their finite size on the nucleosomal distribution probabilities. Finally we compare nucleosome distributions and simulated nuclease digestion patterns for the two cases (homogeneous and sheep DNA), thereby providing testable predictions of the effect of sequence on experimentally observable quantities in experiments on polynucleosome chromatin fibres reconstituted in vitro.


Introduction
Chromatin is the building block of chromosomes within eukaryotes [2][3][4][5][6]. It is made up by histone proteins (normally octamers) and DNA, which wraps around the histones to form a left-handed superhelix [7]. There are 147 base pairs of DNA wrapped around a histone octamer, and this complex is known as a nucleosome.
Electron microscopy and atomic force microscopy revealed that when spread on a surface, chromatin fibres (a DNA chain containing many nucleosome) adopts a characteristic bead-on-a-string structure (Fig. 1). This is the first level of compaction of DNA within eukaryotic nuclei, which needs to be complemented by higher orders of compactions which are to date not fully understood [2,3]. An important observation about nucleosomes is that their positioning within the genome, which has been mapped through high-throughput experiments by the genome projects, is not random, but highly reproducible. In vitro, experimentalists are also able to map the position of nucleosomes along a DNA chain of given sequence (see e.g. [9]). Because in the test tube there are no other constituents than DNA and histone octamers, it follows that, at least under those conditions, the positioning of the nucleosome must be dictated by simple biophysical laws: nucleosome-nucleosome interactions along the chain [10], and the nucleosome: DNA interaction. The latter interaction is partly given by electrostatic attractions between the negatively charged DNA and the positively charged histone octamer [11][12][13][14], but it also includes a sequencespecific component, which is largely due to the sequence-dependent elastic properties of the genetic material [15][16][17][18].
Here we present a simple 1-dimensional model where nucleosomes diffuse along a DNA chain, interact with each other via steric repulsion, and with the genome via a sequence-dependent effective potential, which is informed by high resolution in vitro positioning experiments [9,19]. As a test case, we consider a DNA sequence corresponding to part of the beta-lactoglobulin gene from the sheep genome, which was previously studied in vitro by our collaborators [9]. The experimental data we use as an input considered a highly diluted situation in which, effectively, a single histone octamer was allowed to diffuse on the DNA, to find its optimal position. Nucleosomal diffusion is very slow under normal conditions [6,20,21]; this is why the experiments are challenging and normally require a salt dialysis protocol whereby the amount of monovalent salt in the buffer (which screens electrostatic interactions hence weakens the strong attraction between histone proteins and DNA) is slowly and very gradually decreased.
Our goal here is to start from these mononucleosome potentials, determined through salt dialysis experiments, and predict, either analytically or numerically, the 1D of a chromatin fibre containing a finite density of nucleosomes. Within our model, the positioning is solely due to sequence, hence our results can be seen as predictions for positioning within polynucleosome chromatin fibres of variable density which can be reconstituted in vitro. [Normally, it is more difficult to control nucleosome density in vivo, where a wealth of nucleosome positioning data exist [15,[22][23][24].] We also simulate digestion experiments followed by gel electrophoresis, which are often used to assess quality and 1D organisation of chromatin fibre in vitro. By simulating these experiments both on the beta lactoglobulin gene and on a hypothetical homogeneous DNA where the interaction between histone octamer and DNA is uniform (sequence-independent), we can dissect the roles of steric interactions [16] and sequence [15,17,18] in the positioning, at least in this specific, and highly simplified, framework.
There are several excellent contributions available which consider the nucleosomal positioning of chromatin fibres with 1-dimensional statistical mechanics models, for instance Refs. [22][23][24][25][26]. Many of these works build on the idea that the nucleosomes along DNA behave like a Tonks gas of particles of finite size interacting via excluded volume in 1D -this model owes its name to Lewi Tonks, who, in 1936, computed equations of state for this system [1]. An interesting application of Tonks' theory was provided by Ref. [27]. Such paper derives all the thermodynamical properties of DNAprotein systems in the absence of any other interaction except volume exclusion, and provides predictions for correlation functions, density functions and free energies of a 1D nucleosomal chain. Other works on chromatin building on the Tonks gas idea can be found for instance in Ref. [10] and Refs. therein.
A more recent work, the so-called Takashi model, is also directly relevant to our work. In the Takashi model [28], a system of random walkers with excluded volume on a discrete lattice is considered. This model can be studied in part analytically, and some formulas for the lattice sites occupation probabilities can be derived -however, in the generic case, these quantities need to be computed numerically. Our work can be seen as a continuation of this approach, where the sequence-spefic DNA-histone potential is included, and comes from experimental data. Under some approximations, namely when nucleosomes are point-like, but their mutual avoidance is retained, so that they cannot overtake each other on the 1D chain, and when we consider a piecewise continuous approximation of the sequence-dependent potential, we are able to solve the model analytically, and compute nucleosomal distribution functions exactly thanks to an explicit evaluation of the partition function. In the general case, we solve the model numerically with Monte-Carlo simulations.
The focus on analytics and exact results is one difference with respect to previous work in Refs. [22,[24][25][26]. Another novel aspect of our work is the afore-mentioned simulation of digestion experiments [4]. These simulations provide predictions which can be directly compared with results from micrococcal nuclease digestion of reconstituted chromatin fibres. In these experiments [4], chromatin fibres are subjected to the action of an enzyme (typically micrococcal nuclease) which cuts the chromatin fibre at regions of "naked" DNA (i.e., not associated with a histone octamer). The distribution of fragment length can then be measured by means of gel electrophoresis experiments on the digested fibre. Our model can, in particular, predict the effect of sequence on the output of such digestion experiments, by comparing the results obtained with genomic and homogeneous DNA. As we shall see, these differences are quite subtle, and depend on the duration, or efficiency, of the nuclease digestion. We hope that these results may stimulate further experimental work aiming at detecting sequence-specific effects on nucleosomal positioning in reconstituted chromatin fibres.
Our work is structured as follows. In the next Section we will introduce the analytical methods used to evaluate the partition function of a 1D system of nucleosomes with a generic piecewise linear potential. We will then report, in Section 3, our analytical results on the nucleosome positional distribution functions along the chromatin fibre for the case of a uniform DNA, and for the case of the genomic DNA sequence from the beta lactoglobulin gene. In Section 4, we will go on to present our Monte Carlo simulations of a polynucleosome chain with finite size histones, focussing on the prediction for the digestion pattern. Finally, Section 5 contains our conclusions.

A statistical mechanics model for random walkers in a 1D potential
In this Section we show how to extend the Tonks gas approach used in the Takashi model in Ref. [28] to the situation we are interested in. In our model, nucleosome positions are not restricted to the lattice nodes, instead, they are allowed to random walk on the continuous genomic DNA lattice.
Our aim is to calculate the statistical properties of this system. We will do so by studying random walkers with steric exclusion under the influence of an arbitrary lattice potential V (x).
We begin by recalling the static approach to random walks discussed in [28] for the case in which the walkers move in a generic 1D potential on the lattice. A Boltzmann (exponential) factor in the integrand implements the effect of the potential: (1) To make progress, we need to further manipulate Eq. 1 to simplify the calculations in what follows. Specifically, we observe that the e −β(V (x 1 )+V (x 2 )+ ... +V (x N )) factor in the partition function simply weighs each of the microstates a-la-Boltzmann. For simplicity, β = 1 henceforth. The particular case of V (x) = 0 for all x corresponds to a uniform, sequence-independent potential, where all microstates are equally probable.
Generally, the partition function of the system is Z N (L) = {conf ig} N n=1 e −V (xn) , where configurations must include all possible sets {x 1 , x 2 , ... , x N } over the lattice of length L, with 0 ≤ x 1 ≤ x 2 ≤ ... ≤ x N −1 ≤ x N ≤ L, which can be implemented introducing θ-functions in Eq. 1, ensuring sequential ordering of the particles on the lattice. The sum of ordered weighted configurations in the continuous limit gives rise to equation 1.
By using previously defined quantities, the probability of having the N -th random walker at position x n , which is a central quantity in our theory, can be written as follows, where, Z n−1 (l) = {conf ig} ( n−1 j=1 e −V (x j ) ) and Z N −n (L−l) = {conf ig} ( N j=n+1 e −V (x j ) ). ‡ Let's take a closer look at Eq. 2. This equation reflects the idea that in general a partition function represents the counting of all available microstates, weighted by the Boltzmann probability. Hence, the probability of having the n th particle at position l is the fraction of weighted microstates satisfying this constraint over all possible weighted microstates in the system.
Then, the denominator Z N (L) represents all accessible states, through the total or general partition function for N particles on the lattice in Eq. 1. The numerator of Eq. 2 represents N particles on the lattice with the n th particle fixed at position x n = l, weighted by the e −V (l) factor, and all weighted configurations of n − 1 particles left of particle x n through the Z n−1 (l) expression and N − n particles right of x n , with the Z N −n (L − l) expression.
The main advantage of our approach is that Z N (x), Z n−1 (x) and Z N −n (x) will be analytical continuous functions in the range {0, L}. Hence, this method will allow for straightforward computations of the statistical properties of random walkers under exclusion process, such as particle PDFs or probability of a gap between particle pairs for arbitrary N of particles on the lattice.

Computing partition functions: 'Divide et Impera'
In this Section, we outline an algorithm to compute the partition functions Z N (L), which will be useful to model 1D Brownian motion under the influence of piecewise defined potentials.
One may think of a lattice of size L as made up of a number S of sections. Let us further consider a system of N particles overall, and let us call n i the occupation number of lattice region i (1 ≤ i ≤ S). If we define Z(i, n i ) the partition function for each lattice section i and n i particles on it, the Z(i, n i ) can be computed by direct application of equation 1.
Consequently, The delta function in Eq. 3 ensures that there are exactly N particles on the lattice. Each Z(i, n i ) is a summation of weighted configurations of n i infinitesimally small particles on the i th lattice section. Hence, the product of all partition functions and summation over occupation numbers produce all possible microstates of the system. A diagrammatic description of equation 3 is shown in Fig. 2.

Indistinguishability and particle exclusion
For an arbitrary potential V (x), the following observation is useful: The LHS counts arrangements of particles on the lattice preserving particle label ordering. This is equivalent to counting the number of possible configurations of indistinguishable particles on the lattice (RHS). Hence, a factor of 1 N ! can be included, instead of employing the theta function.
Generalising these ideas to N particles, Eq. 1 can be recast as Consequently, equation 2 now becomes Here we explain how to compute Z and Z for piecewise potentials exploiting the algorithmic approach in Eq. 3 and implementing the result in Eq. 5.
In the following, a special class of potentials will be studied: piecewise linear potentials, so that there are constant gradients in the potential for each lattice section. Figure 3 shows a graphical representation of a possible configuration of the system and its relative representation in terms of partition functions for the various lattice sections. In this case, the partition function for section i will be determined by L i , the length of the lattice section, and m i , the constant gradient of the potential over the region with n particles (V (0) is the value of the potential at the origin of the lattice section), as follows: Finally, the probability for particle l to be at position x within section i can be written as: where ZR(i, n) and ZL(i, n) are the partition functions representing configurations with n particle to the right or left of particle l within section i of the lattice. Such functions are defined through equation 7. Eq. 8 is obtained by substituting Eq. 7 into Eq. 2 where appropriate. Computations via this recast version of Eq. 2 are faster in practice. This is because Eq. 8 now involves powers of single particle partition functions, which can be determined algorithmically through the formula on the RHS of Eq. 7 for this particular class of piecewise potentials.
These algorithms are the main novelty of our analytical approach, and they dramatically reduce the computational time needed to get an explicit expression for nucleosomes position probability distribution functions (PDFs). For instance, implementing these algorithms through 'Wolfram Mathematica', it takes less than 5 seconds for our code to output PDFs for 30 particles on the genomic DNA lattice. This is the method we use to compute the results presented in the next Section.

A case study: point-like nucleosomes on a homogeneus and on a genomic DNA
In this Section we provide a series of analytical results for nucleosome positioning within: (i) a genomic DNA segment (the beta lactoglobuline gene in the sheep); and (ii) a homogeneous DNA as a reference case. As we shall see, we are here able to obtain explicit results for an arbitrary (sequence-dependent) histone:DNA interaction potential which is piecewise linear -equivalently, a potential with piecewise constant first derivative, or gradient. This approximation is useful as it naturally breaks up the DNA molecule into S sections, within each of which the potential is linear. As can be seen in Fig. 4, the case of the genomic DNA segment (which comes from the observed positions of nucleosome dyads, i.e. centres) can be approximated fairly well with a piecewise linear function.

Positional probability distribution function for a chain of ten nucleosomes
In this Section we plot our analytical solution for the positional probability distribution function (PDFs) of each of the nucleosome in a chain where 10 histone octamers deposit on the first 4.6 kilo-base pairs of the sheep beta lactoglobuline gene (explicit expressions were derived using a 'Wolfram Mathematica' code based on Section 2.3). The situation Sheep genomic DNA: a piecewise linear fit Figure 4. This plot shows the DNA potential as inferred from (mono)nucleosome positioning experiments on a genomic segment from the beta lactoglobuline gene from sheep DNA. In yellow a piecewise linear fit to the genomic DNA potential.
we consider corresponds to a coverage of one nucleosome per 460 base pairs, which is about half the physiological one, where one nucleosome corresponds to about 200 base pairs [2,4]. This low density is chosen so as to avoid crowding effects which are likely to invalidate our approximation of point-like nucleosomes which we use in our analytics. . Colours correspond to: particle 1 and 2 (black), particle 3 and 4 (red), particle 5 and 6 (green), particle 7 and 8 (blue), particle 9 (brown), particle 10 (yellow). PDFs for particle 2,4,6,8,10 V(x)=0 Figure 6. Plots of the nucleosomal positional PDFs for 10 particles on a homogeneous DNA. Colours correspond to: particle 1 and 2 (black), particle 3 and 4 (red), particle 5 and 6 (green), particle 7 and 8 (blue), particle 9 and 10 (brown). Fig 5 show the position of the odd and even nucleosomes/particles. Some observations can be made about the most likely positions of the nucleosomes predicted analytically. First, the histone:DNA experimentally obtained potential is higher for the first 1000 bases than for the rest of the molecule. Hence, except for particle 1, no other particle presents a significant probability of localising within such a region. Second, again in line with intuition, we observe that the particles' PDFs have their highest peaks in lattice regions corresponding to troughs or deep minima in the potential. Third, it is interesting to note that, for neighbouring particles, the maxima and minima of the PDFs tend to coincide, but they vary in heights. For instance, the PDFs for particle 1 and particle 2 both have peaks at x ∼ 1500 and x ∼ 1900 (base pairs), but at x ∼ 1500 the peak for particle 1 is higher than for particle 2, while at x = 1900 the situation in reversed. This is caused by the steric, or exclusion, interaction between the nucleosomes, as their order is fixed within the chromatin fibre (they cannot overtake each other). To highlight the effect of sequence, we compare in Fig. 3.1 the PDFs for nucleosomes in the case of the genomic DNA segment with the case of a homogeneous DNA where there is no sequence-dependent variation in the histone:DNA interaction. It can be seen that, at least at this coverage, sequence makes a large difference in determining the nucleosome positional PDFs within a chromatin fiber.

Gap probabilities for 10 particles on the DNA lattice
Another interesting quantity we can obtain from our theory are the 'gap probabilities', which is the probability of a 'gap' of g base pairs between successive nucleosomes. This quantity is, as we will discuss more in detail later on, relevant for digestion experiments with nuclease. In the context of this analytical calculations, such gap PDFs are useful to evaluate the limitations in our approach, as gaps in reality need to be larger than the nucleosome size, i.e. 146 base pairs, which is disregarded in our analytics (which assumes point-like nucleosomes, the interested reader will find more results for gap probabilities in a Tonks gas at [29] and [30]).
How can we compute analytically the probability of a gap of size 'g' between particle 'n' and particle 'n + 1', with particle 'n' at position 'x' and particle 'n + 1' at position 'x + g' ? The route we follow is to perform a normalised sum of all microstates which satisfy the condition of having a gap of g base pairs between the n-th and the (n + 1)-th particles. The relevant formula for this gap PDF is therefore where Z n−1 (x) represents weighted configurations of particles located to the left of particle 'n' and Z N −n−1 (L − x − g) represents weighted configurations of particles located to the right of particle 'n + 1'. The e −V (x) and e −V (x+g) factors take into account the configurational weights of particle 'n' and particle 'n + 1' on the lattice. Figure 7 describes the role of the functions in Eq. 9 §.

…" …"
"""""""x n+1 " ""g" e +V(x)" *Z' n+1 (x)" e +V(x+g) "*Z'' N+n+1 (L+g+x)" instead describes the probability for particle 'n' and particle 'n + 1' to display a gap of size 'g' anywhere on the lattice (L is the total number of base pairs). Through numerical integration, gap PDFs have been produced and are shown in Fig. 8. These have been evaluated and plotted extending the 'Wolfram Mathematica' code introduced in the previous section. § (Z ) N (x) and (Z ) N (x) can be computed either through methods previously shown or much more simply through algorithms presented in section 2.3.  From figure 8, one can readily notice that there is a finite probability for neighbouring particles pairs to have a gap of less than 147 DNA base pairs. Therefore, it is clear that our approximation will be inaccurate for dense nucleosomes systems; however it can still provide a good approximation for dilute systems (see also below, where we compare with numerical simulations incorporating realistic nucleosome sizes). If particles had finite sizes, then P (0) = 0, however average gap sizes < g > are expected to be line with more realistic scenarios.

The effect of nucleosome size: Monte-Carlo simulations of nucleosomes on homogeneous and genomic DNA
In the previous Section, we studied the statistics of point-like nucleosomes on a polynucleosome chromatin fibre, taking into account steric nucleosome-nucleosome interactions by a simple exclusion interactions (nucleosomes could not overtake each other along the DNA chain). It is obviously of interest, especially when the density of nucleosome along the fibre is large, to relax this approximation. In this Section we will therefore present Monte-Carlo simulations of the positioning of nucleosomes along a DNA, again considering the cases of a homogeneous DNA and of the genomic DNA segment, as done in Section 3 analytically. Before we present the results of these simulations, we briefly discuss, as an aside, how one could generalise our previous exact treatment to correctly consider the finite nucleosome size. As this avenue requires ultimately a numerical treatment, we focus later on direct Monte-Carlo simulations. We note that limitations to an equilibrium approach to Monte Carlo simulations of nucleosome position patterns (especially when used to study chromatin in vivo) are discussed in [25]. We also note that Monte-Carlo simulations similar to those reported here can be found in [24][25][26].

A possible exact treatment for excluded volume between nucleosomes on the DNA lattice
Consider having a single nucleosome on a DNA of length L. Then, the weight of a general microstate with the particle occupying the lattice region between x and x + c (c is the size of the particle) is: To show this, first evaluate W (x) for a discrete lattice and then let ∆x → 0 (∆x is the lattice spacing) to represent a continuous lattice: Then, the partition function becomes the sum of all weighted microstates.
The generalisation for Z N is as follows: While these integrals provide an explicit route to the evaluation of the partition function, and ultimately of the positional PDFs for nucleosomes, they cannot easily be evaluated for a large number of particles, even through numerical methods. Therefore, in what follows we present results obtained from Monte-Carlo simulations on a 1D lattice where the finite size is directly considered, together with the effect of the sequence-specific DNA:histone interaction.

Monte-Carlo algorithm, and its validation
Our Monte-Carlo approach consists of dynamical simulations where N particles of finite size (146 base pairs) diffuse on a lattice of L base pairs (L = 4600 when comparing to the analytics, L = 10943 otherwise -the latter value corresponds to the whole beta lactoglobulin gene). We consider variable nucleosome density, corresponding to 10 ≤ N ≤ 30. Each nucleosome interacts with the DNA through volume exclusion (the histones are either pointlike, or with size equal to 147 base pairs) and one of the following: (i) either a sequence-dependent potential obtained from the experiment in Ref. [9] for the genomic DNA sequence extracted from the beta lactoglobulin gene, (ii) or a sequence-independent potential corresponding to a homogeneous DNA. As in Section 3, the sequence-dependent potential, V (x), is derived from the single-nucleosome positional PDFs p(x), mapped experimentally in Ref. [9], as V (x) = −k B T log (p(x)). For the boundary conditions, when wanting to compare with our analytics (N = 10 and L = 4600 base pairs, Section 4.2 and 4.3), we consider an open DNA chain with reflecting boundary conditions at the ends; in particular, nucleosomes are not allowed to move if they are at one of the end of the chains and the trial move will cause them to hop off the chain. Instead, in Section 4.4 we use periodic boundary conditions, corresponding to a DNA loop (as used in chromatin reconstitution experiments in vitro [9]). Initially, all N nucleosomes are dispersed on the DNA lattice uniformly, at a fixed mutual distance (results did not depend on the initial condition provided the simulation was long enough). For each time step, we then selected randomly one of the nucleosomes, and moved it to the right or left (with equal probability). Provided that the move does not lead to a steric clash with neighbouring nuclesomes, we then accepted or rejected the move according to the Metropolis criterion, which depends on the potential which is chosen. Similar algorithms are used in [24][25][26].
As a validation, we first consider the case treated in Section 3, where the nucleosomes are point-like and only interact via simple exclusion, through their inability to overtake each other. In Fig. 9, we compare the analytically derived positional PDF for the first particle with that measured in Monte Carlo simulation. The two PDFs are in fairly good agreement, although they slightly differ quantitatively in some of the peak heights. This discrepancy is most likely due to (small) differences between the real DNA potential and the fitted one, which is piecewise linear. Indeed, there is a larger difference between the DNA potential and the fitted one between bases 175 0 and 2000, which is also the region where the simulated and analytical PDFs differ the most.

Results for the positional PDFs with finite histone size
In this Section, we study the same polynucleosome chain of N = 10 nucleosomes, on a DNA with L = 4600 base pairs, where, however, the histone size is now set to 146 base pairs. This size corresponds to the wrapping of 1.75 turns typical of 10-nm chromatin fibres adopting beads-on-a-string structures.
First, we consider the case of the beta lactoglobulin gene potential. Fig. 10   shows the comparison between the analytical particles PDFs and the finite histone size simulations. Pleasingly, there is a good semiquantitative agreement between simulations and theory: this confirms the expectation that our analytical theory works well at low nucleosome density. The density corresponding to N = 10 is two-to three-fold smaller than the one relevant in vivo (which is 1 nucleosome/200 base pairs); however, this packing is well in the range explored in vitro for reconstituted chromatin.
Similar results are obtained with the homogeneous DNA, where nucleosomes interact only via excluded volume (Fig. 11 shows the comparison between simulations and theory for the positional PDF of particle 1).

Simulated digestion patterns
Finally, in this Section, we consider polynucleosome loops, with different number of nucleosomes, N ; we also study the full lactoglobulin gene. In this way we recreate conditions which are closer to experiments on chromatin reconstituted by salt dialysis. Because the DNA is a loop, the positional PDFs of nucleosomes in the homogeneous potential are uniform (i.e., flat) due to translational invariance. The positional PDFs Figure 10. Comparison between analytical PDF for point particles on the genomic DNA potential in green and numerical simulations for particle positionings with excluded volume in purple. The effect of exclusion does not qualitatively change the particles PDF in terms of number of peaks and relative heights but marginally shrinks them as a consequence of the enhanced steric effect.
corresponding to the sheet potential are instead shown in Fig. 12 for the various cases considered. Similar considerations qualitatively apply as for the analytical results; especially at low density the nucleosomes are confined close to the potential troughs.
In experiments, it is common to assess chromatin structure and relative nucleosome spacing through nuclease digestion assays [4,9,31,32]. These experiments consist of two main steps. First, a chromatin fibre is subjected to the action of an enzyme (typically micrococcal nuclease) which cuts 'unprotected' DNA: i.e. DNA which is not wrapped up in nucleosomes. This step, known as "digestion", leads to a population of DNA fragments of different size: most of these are wrapped around histone octamers, as nuclease quickly degrades naked DNA. Following digestion, the salt concentration is tuned so that histone octamers subsequently unbind. The second main step is then to perform a gel electrophoresis experiment on the remaining fragments of DNA: as the mobility depends on charge, hence length, these experiments give a measure of the size distribution of fragments associated with histone octamers following digestion. The distribution gives a measure of the 1D organisation of the nucleosomes along the fibre: for instance mononucleosomes, dinucleosomes or more complex structures contribute differently to the size distribution as the fragment length associated with those is normally different [4]. It is also of interest to ask whether and to what extent the Comparison between analytical PDF for point particles on the homogeneous DNA lattice (green) and numerical simulations accounting for excluded volume (purple).
sequence affects the probability distribution which is measured by these experiments, and that is the question we address here.
Although the efficiency of micrococcal nuclease depends on local DNA sequence, as a first approximation it is common to model such digestion experiments by assuming that each DNA base pair not associated with a nucleosome is cut with a probability p, which depends on the digestion time and efficiency [31,32]. We started from this simplified view, and simulated digestion experiments for our polynucleosome chains generated in silico. In the simulated digestion, we further assumed that only fragments containing at least one nuclesomes remain in the gel (as nuclease quickly degrades unprotected DNA which it has got hold of). Fig. 13 shows the distributions of fragments resulting from a simulated digestion experiments for a variety of parameters. These plots address the question whether it is possible to detect sequence-dependent nuclesome positioning effects via digestion experiments. Our results show that the sequence signature is extremely sensitive to efficiency, or duration, of digestion, and to nucleosome density. At the same time, quite surprisingly, there is only a very subtle difference between the beta lactoglobulin loops and the homogenous DNA. The largest difference between fragments remaining after digestion with a sequence heterogeneous or homogeneous DNA can be found, according to our simulations, for intermediate densities, and for low or intermediate digestion

efficiency.
These results are non-trivial, but can be rationalised via the following arguments. At small density, the nucleosomes are so far that nuclease cuts usually contain a single nucleosome, so that sequence effects on gap PDFs are not important. At very high density, the spacing is mainly dictated by steric interactions, hence sequence is again irrelevant. At intermediate densities, instead, the positions of potential minima leads to nontrivial correlations in gap PDFs, and this leads to different digestion patterns. This suggests that sequence effects should be more visible at intermediate density. Furthermore, extreme digestion in our model only leaves nucleosomal DNA, with 146 base pair fragments: the resulting distribution is close to a Dirac delta function, and again sequence effects are washed away. Therefore, low or intermediate digestion and intermediate density are more likely to leave detectable sequence signatures. This is in line with our numerical results, although the quantitative magnitude of the effect (which is difficult to estimate a priori) is found numerically to be extremely small (see Fig. 13).

Conclusions
In this work, we have studied a 1D statistical mechanics model of a 1D polynucleosome chain. Our main focus was on the effect of sequence on the 1D organisation of nucleosomes along the chromatin fibre, and to address this we have compared the statistics of a DNA molecule with uniform DNA:histone interaction to another case, of a genomic DNA region, where the DNA:histone potential was informed by existing experimental high-resolution nucleosome positioning data. The results we obtain are therefore the interplay between the sequence-dependent potential and excluded volume interactions between nucleosomes within the chain.
We have first shown that, if we approximate the sequence-dependent potential via a piecewise linear function, we can obtain analytically explicit formulas for the probability distribution functions (PDFs) of each of the nucleosomes within the chromatin fibre. However, our analytical approximation did not consider the effect of the finite size of nucleosomes (which cannot occupy less than 146 base pairs in reality), as, for simplicity, it dealt with pointlike nucleosomes.
We have then presented Monte-Carlo simulations of the model, where finite size effects can be fully taken into account. After validating our algorithm by reproducing the particle PDFs with pointlike nucleosomes, we have studied how the results change when accounting for the finite size of nucleosomes. The general trend is that, as expected, as the nucleosomal density along the fibre increases, the effect of finite size becomes more important; at the same time, for large densities sequence-dependent features become less visible. In practice, it would not be trivial to directly compare predicted PDFs for a polynucleosome chain to experimental data; we therefore simulated a digestion experiment on our polynucleosome chain, as this is a popular assay used to characterise the structure and nucleosome distribution of chromatin fibres in the test tube. The in silico digestion patterns suggest that there is a possibility to pick up directly sequence effects with such assays, however this requires a careful tuning of the duration of digestion, and of the density of nucleosomes. We hope that these results will stimulate further experimental high-resolution work on nucleosome positioning within DNA of different sequence. It would also be of interest to couple our 1D treatment to simulations of the structure of 3D chromosomes, which are normally done on homogeneous fibres [32][33][34][35][36][37][38][39].

Acknowledgments
We acknowledge J. Allan for useful discussions and for providing us with the nucleosome positioning data on the beta lactoglobulin gene. A. M. acknowledges support form the UK Engineering and Physical Sciences Research Council (EP/I004262/1). S. T. acknowledges support form the UK Engineering and Physical Sciences Research Council (EP/L504920/1).