Looping and clustering model for the organization of partitioning proteins on the bacterial genome

The bacterial genome is organized in a structure called the nucleoid by a variety of associated proteins. These proteins can form complexes on DNA that play a central role in various biological processes, including chromosome segregation. A prominent example is the large ParB-DNA complex, which forms an essential component of the segregation machinery in many bacteria. ChIP-Seq experiments show that ParB proteins localize around centromere-like parS sites on the DNA, to which ParB binds specifically, and spread from there over large sections of the chromosome. Recent theoretical and experimental studies suggest that DNA-bound ParB proteins can interact with each other to condense into a coherent 3D complex on the DNA. However, the structural organization of this protein-DNA complex remains unclear, and a predictive quantitative theory for the distribution of ParB proteins on DNA is lacking. Here, we propose the Looping and Clustering (LC) model, which employs a statistical physics approach to describe protein-DNA complexes. The LC model accounts for the extrusion of DNA loops from a cluster of interacting DNA-bound proteins. Conceptually, the structure of the protein-DNA complex is determined by a competition between attractive protein interactions and the configurational and loop entropy of this protein-DNA cluster. Indeed, we show that the protein interaction strength determines the "tightness" of the loopy protein-DNA complex. With this approach we consider the genomic organization of such a protein-DNA cluster around a single high-affinity binding site. Thus, our model provides a theoretical framework to quantitatively compute the binding profiles of ParB-like proteins around a cognate (parS) binding site.


I. INTRODUCTION
Understanding the biophysical principles of chromosome structure in both eukaryotic and prokaryotic cells remains an outstanding challenge [1][2][3][4][5][6][7]. Many bacteria have a single chromosome with a length three orders of magnitude longer than the cell itself, posing a daunting organizational problem. Owing to recent technological advances in live-cell imaging and chromosome conformation capture based approaches, it is becoming increasingly clear that the DNA is not coiled like a simple amorphous polymer inside the cell [8][9][10], but rather exhibits a high degree of organization over a broad range of lengthscales [11]. It remains unclear, however, how this spatial and dynamic organization of the chromosome is established and maintained inside living bacteria [12]. A host of so-called Nucleoid-Associated Proteins (NAPs) have been shown to play a central role in the spatial organization of the bacterial chromosome [12][13][14]. Bacteria can express up to tens of thousands of copies of such NAPs, including proteins such as H-NS, HU, or FIS; such NAPs bind to the DNA in large numbers, and by interacting with each other and with DNA in both sequence-dependent and sequence-independent manners they can collectively structure the DNA polymer and control chromosome organization. In many bacterial species, the broadly conserved ParABS system is responsible for chromosome and plasmid segregation [12,15]. A large ParB-DNA complex forms the so-called partitioning complex ParBS. This partitioning module assembles around centromere-like parS sites, frequently located near the origin of replication. The ParBS complexes can subsequently interact with ParA ATPases, leading to the segregation of replicated origins [16][17][18][19][20][21][22][23].
How is this ParBS partitioning module physically organized on the DNA? ParB is known to bind specifically to parS, triggering the formation of a protein-DNA cluster containing hundreds of ParB proteins, which is visible as a tight focus in microscopy images of fluorescently labeled ParB [15,19,24,25]. The propensity of ParB to form foci around parS has been exploited in recent studies, which used exogenous expression of fluorescently labeled ParB along with parS insertion to label DNA loci for live-cell imaging [26,27]. In the F-plasmid of Escherichia coli cells, each ParB focus contains roughly 300 proteins, together representing 90% of all ParB present in the cell [25]. ParB protein numbers of a similar order of magnitude have been estimated for the chromosomal ParBS complex of Caulobacter crescentus [16]. Various models have been introduced to explain the distribution of ParB along DNA around parS sites. An early study of the distribution of ParB proposed that ParB proteins spread from the parS sequence by nearest-neighbor interactions, forming a continuous filament-like structure along the DNA [28]. This model was termed the Spreading model. However, this is effectively a 1D model with short-range interactions. On general statistical physical grounds, such a 1D model cannot be expected to account for the formation of a large coherent protein-DNA complex, given physiological protein interaction strengths [25,29]. Furthermore, even if continuous filament-like structures could form, it has been argued that the number of ParB proteins available in the cell is not sufficient for simple 1D polymerization of ParB along DNA to produce the enrichment observed experimentally at large genomic distances from parS [25].
To resolve the puzzle of how the vast majority of ParB proteins can remain strongly localized by a single parS site, we recently introduced a novel theoretical framework to study the behavior of interacting proteins that can bind to a DNA polymer [29]. Physically, this model describes the collective behavior of interacting proteins on a self-avoiding polymer chain. This model suggested that ParB assembles into a three-dimensional complex on the DNA, as illustrated in Figure 1a,b. In this model, each protein can interact with nearest neighbors along the DNA through a 1D spreading interaction and, in addition, each protein can interact through a 3D bridging interaction with another ParB protein bound to a site that may be distant on the DNA albeit close in 3D space. This Spreading & Bridging model is supported by single-molecule experiments, which provide direct evidence for the presence of bridging interactions between two ParB proteins on DNA [30,31]. Importantly, computational studies have shown that the Spreading & Bridging model supports phase transitions [29]. Indeed, one of our central results was that a combination of 1D spreading bonds and a 3D bridging bond between ParB proteins constitutes the minimal model for condensation of ParB proteins on DNA into a coherent complex, consistent with the observation that ParB-GFP fusion proteins form a tight fluorescent focus on the bacterial DNA [15,19,24,25]. Moreover, the condensation of ParB into a coherent protein-DNA complex predicted by the Spreading & Bridging model also naturally ensures the localization of the majority of ParB proteins by a single parS site, because it is easier to localize a large number of proteins when they robustly form a single large cluster. By contrast, models with just 1D spreading interactions or just a 3D bridging interaction between proteins were found to result in fragmented protein clusters dispersed over the whole DNA.
The 3D structure of ParB-DNA complexes can have important implications for the binding profiles of ParB on DNA. Recent high-precision ChIP-Seq experiments on the F-plasmid in E. coli [25] provide quantitative binding profiles, which are strongly peaked around parS with a broad decay of up to 13 kilobasepairs (kb), consistent with earlier observations [24,28]. To quantitatively account for these measurements of the distribution of ParB proteins on DNA two approaches can be employed: On the one hand, the Spreading & Bridging model results in binding profiles that can be easily calculated analytically in the limit of very strong protein-protein interactions. In this limit, the cluster of ParB on the DNA becomes compact with a corresponding triangular distribution of ParB along DNA. On the other hand, the binding profiles can also be estimated with the so-called Stochastic Binding model, where a sphere of high ParB concentration is assumed to exist within which a DNA polymer freely fluctuates [25]. In this Stochastic Binding model, proteins can bind and unbind stochastically from the DNA when it resides within the region of high ParB concentration centered around parS, as shown in Figure 1c. The description of the average protein binding profile is thus similar to the return statistics of the polymer into the ParB sphere [32], implying a long range (power-law) distribution of ParB proteins along DNA. This model predicts a profile that is in better agreement with the high-resolution ChIP-Seq data, highlighting the importance of loops to influence the long range decay of protein occupation probability. However, this biophysical model does not explain how the ParB proteins condense around parS. When combined, these two approaches could provide crucial insights into how ParB clusters are formed and how the polymeric nature of the DNA can impact the organization of ParB on DNA.
In this paper, we propose a comprehensive theory to describe the distribution of ParB proteins on the chromosome in terms of molecular interaction parameters. We expand on the ideas of the Spreading & Bridging and the Stochastic Binding models to provide a quantitative analytic approach to describe the genomic organization of ParB proteins bound around parS sites on the DNA. To this end, we develop a simple model for protein-DNA clusters that explicitly accounts for the competition between protein-protein interactions, which tend to favour a compact cluster, and the entropy associated with the formation of loops, which favours a looser cluster configuration. This Looping and Clustering model represents a reduced, approximate version of the full Spreading & Bridging model that provides a clearer understanding and greatly facilitates calculations of the distribution profile of ParB or other proteins that form protein-DNA clusters.

II. THE MODEL
To theoretically describe the protein binding profiles of ParB on DNA, we first consider a DNA polymer of length L that can move in space on a 3D cubic lattice and is in contact with a "cytoplasm" containing a finite number of proteins. These proteins are able to bind to and unbind from the lattice sites on the DNA, along which the proteins can freely diffuse. Importantly, in this model the DNA itself is also dynamic and fluctuates between different three-dimensional configurations, which are affected by the presence of interacting DNA-bound proteins. When proteins are bound to the DNA, they are assumed to be able to interact attractively with each other by contact interactions in two distinct ways: (i) 1D spreading interactions with coupling strength J_S, defined as an interaction between proteins on nearest-neighbor sites along the polymer, and (ii) a 3D bridging interaction with strength J_B between two proteins bound to non-nearest-neighbor sites on the DNA, but which are positioned at nearest-neighbor sites in 3D space (see Figure 1a,b). Thus, these bridging interactions couple to the 3D configuration of the DNA, while the 1D spreading interactions do not. In prior work, we introduced this model for interacting proteins on the DNA, which was termed the Spreading & Bridging model [29]. The Hamiltonian for this model is given by:

$$ H_{SB} = -J_S \sum_{i=1}^{L-1} \phi_i \phi_{i+1} \;-\; J_B \sum_{i<j,\; |i-j|>1} \phi_i \phi_j \, \delta_{|r_i - r_j|,1}, \tag{1} $$

where the variable φ_i ∈ {0, 1} represents the occupancy by a protein of the i-th DNA binding site, and the Kronecker delta δ_{|r_i − r_j|,1} is equal to one when binding sites i and j with respective spatial locations r_i and r_j are positioned on nearest-neighbor binding sites in space and it is zero otherwise. Note that this particular form of the Hamiltonian, in principle, allows a valency of up to 4 bridging bonds per protein on a 3D cubic lattice. Single-molecule experiments provide evidence for bridging bonds [30], but the actual bridging valency of a ParB protein may be limited to one [33,34].
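As an illustration of the two interaction types defined above, the following minimal Python sketch evaluates the Spreading & Bridging energy (in units of k_B T) of a single configuration. The function name `sb_energy` and its arguments are hypothetical and introduced only for this example; the sketch assumes each bonded pair is counted once and does not limit the bridging valency.

```python
from itertools import combinations

def sb_energy(positions, occupancy, J_S, J_B):
    """Energy (in k_B T) of one lattice configuration.

    positions : list of (x, y, z) integer coordinates of consecutive DNA sites
    occupancy : list of 0/1 values, 1 if a protein is bound to that site
    Assumption: every spreading or bridging pair is counted exactly once.
    """
    energy = 0.0
    # 1D spreading bonds: occupied nearest neighbours along the polymer.
    for i in range(len(occupancy) - 1):
        energy -= J_S * occupancy[i] * occupancy[i + 1]
    # 3D bridging bonds: occupied sites that are not neighbours along the
    # polymer but sit on nearest-neighbour lattice sites in space.
    for i, j in combinations(range(len(occupancy)), 2):
        if j - i > 1 and occupancy[i] and occupancy[j]:
            distance = sum(abs(a - b) for a, b in zip(positions[i], positions[j]))
            if distance == 1:
                energy -= J_B
    return energy

# A fully occupied 4-site hairpin: three spreading bonds and one bridge (sites 0 and 3).
hairpin = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(sb_energy(hairpin, [1, 1, 1, 1], J_S=1.0, J_B=2.0))
```

A straight rod with the same occupancy has no bridging bonds, so folding the polymer lowers the energy by J_B per bridge formed.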
To capture a limited bridging valency, the Hamiltonian of the Spreading & Bridging model can be easily adjusted to reduce the number of 3D bridging bonds allowed per protein. Even in the realistic limit where each protein can form two spreading bonds and a single bridging bond, the system has been shown to exhibit a condensation transition where the majority of the proteins form a single large cluster that can be localized by a single parS site on the DNA [29]. While it is possible to perform Monte Carlo simulations of the Spreading & Bridging model for a lattice polymer, it remains challenging to perform analytical calculations within this framework. In this paper, we are primarily concerned with describing the average binding profile of proteins along the DNA (see right panels in Figure 1). With this aim in mind, we can simplify our model by realizing that the configurations of ParB proteins along the DNA are more sensitive to J_S than to J_B. While both spreading and bridging bonds are necessary for the condensation of all proteins into a single cluster, loop extrusion from the cluster is controlled by J_S, and such loop extrusion will strongly impact the binding profile of proteins on the DNA. Thus, we will assume that J_B is sufficiently large that approximately all available bridging bonds between the proteins inside the 3D protein-DNA cluster are satisfied, leaving J_S as the main adjustable parameter in the model.

FIG. 1 (caption fragment): ... [25]. This model can be seen as taking the limit of the spreading bond strength to zero (J_S → 0), and thus the formation of loops is not hampered by protein-protein bonds. In this limit, the binding profile can be described as the return of the polymer to an origin of finite size, which is given by P(s) ∝ (s + C)^{−1.5}, where C is a constant.
A contiguous 3D cluster of proteins on DNA with loops can effectively be represented by a disconnected 1D cluster along the DNA, where connections in 3D between the 1D subclusters are implied, and domains of protein-free DNA within the disconnected 1D cluster represent loops that emanate from the 3D cluster (see Figure 1b,c). We can describe this system by a reduced model for the effective 1D cluster in which we account for the entropy of the loops that originate from the protein-DNA cluster. In this model, the spreading bond energy, set by the parameter J S , competes with the formation of loops and will therefore play a crucial role in determining the binding profile of ParB on DNA around a parS site.
To capture these effects, we propose the reduced Looping and Clustering (LC) model, which offers a simplified description of 3D protein-DNA clusters with spreading and bridging bonds. In this model a loop is formed whenever there is a gap between 1D clusters. We can make the connection between the gaps in the 1D cluster and the number of loops extending from the 3D cluster explicit by writing down the partition function for this model. The effective 1D cluster corresponding to a 3D cluster with m proteins and n loops has a multiplicity:

$$ \Omega(m, n) = \binom{m-1}{n}, \tag{2} $$

which counts the number of ways in which one can partition m proteins into n + 1 subclusters in 1D. In addition, we note that creating n loops will require breaking n spreading bonds, and the probability at equilibrium for this to occur will include a Boltzmann factor ∼ exp(−n J_S), where the interaction energy is expressed in units of k_B T. The loops that are formed are assumed to be independent, and thus modify the configurational entropy as [32]:

$$ \Delta S = -k_B \, d\nu \sum_{i=1}^{n} \ln \ell_i , \qquad \ell_i \ge \ell_0, \tag{3} $$

where d is the spatial dimension, ν is the Flory exponent, ℓ_i is the length of loop i, and ℓ_0 is the lower cutoff of loop sizes, which approximately represents the persistence length of DNA. This entropy is obtained by considering both the loops formed within the protein cluster and the protein-free segment of DNA outside the cluster. Indeed, the number of configurations associated with loop i for a Gaussian polymer is given by z^{ℓ_i} ℓ_i^{−dν} [32,35], where z is the lattice coordination number. Thus, there is also an extensive contribution to the entropy given by k_B ℓ_i log(z). However, when a loop of length ℓ_i forms, the same length of polymer is removed from the DNA outside of the cluster, which also results in a reduction of the entropy by k_B ℓ_i log(z). Thus, there is a precise cancellation between the extensive contribution to the entropy associated with the loop inside the cluster and the extensive contribution due to effectively shortening the DNA outside the cluster.
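The counting argument for the multiplicity can be checked directly. The sketch below, with hypothetical helper names, enumerates the ordered ways of splitting m proteins into n + 1 non-empty subclusters and compares the count with the binomial coefficient C(m − 1, n), which corresponds to choosing which n of the m − 1 nearest-neighbor bonds are broken.

```python
from math import comb

def count_compositions(total, parts):
    """Count ordered ways to write `total` as a sum of `parts` positive integers."""
    if parts == 1:
        return 1 if total >= 1 else 0
    # The first subcluster takes `first` proteins; the remainder is split recursively.
    return sum(count_compositions(total - first, parts - 1)
               for first in range(1, total - parts + 2))

def multiplicity(m, n):
    """Number of 1D arrangements of m proteins with n loops (n broken bonds)."""
    return comb(m - 1, n)

for m in range(2, 9):
    for n in range(m):
        assert count_compositions(m, n + 1) == multiplicity(m, n)
print(multiplicity(5, 2))  # splittings of 5 proteins into 3 subclusters
```

The brute-force enumeration is only practical for small m; the closed form is what enters the partition function.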
It is now straightforward to write down the partition function of the Looping and Clustering model:

$$ Z_{LC} = \sum_{n=0}^{m-1} \binom{m-1}{n} \, e^{-n J_S} \prod_{i=1}^{n} \int_{\ell_0}^{\infty} d\ell_i \, \ell_i^{-d\nu}, \tag{4} $$

where all lengths are measured in units of protein binding sites, and the bond interactions are in units of k_B T. In the partition function, it is convenient to set the upper boundary of integration to infinity. Strictly speaking, the upper boundary should be L − (m + ℓ), where ℓ = Σ_i ℓ_i represents the total accumulated loop length. In practice, however, for chromosomes, but arguably also for plasmids, L ≫ m and the probability to have a large loop is very small. For instance, if we consider the F-plasmid of E. coli with a length of 60 kbp, it would correspond to L = 3750 in units of the ParB footprint of 16 bp [25,36]. For this system, Monte Carlo (MC) simulations of the LC model with m = 100 reveal that the average cumulated loop size ranges from ≈ 140 for small couplings (J_S = 2) down to ∼ 10 for large couplings (J_S = 5), which in both cases is much less than the DNA length. Thus, for biologically relevant cases it is reasonable to assume that the length of the DNA polymer is much larger than the footprint of the whole protein complex on the DNA.
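The length-scale argument above amounts to a short piece of arithmetic, sketched here in Python. The helper names are hypothetical; the numbers (60 kbp plasmid, 16 bp footprint, m = 100, cumulated loop sizes of about 140 and 10) are the ones quoted in the text.

```python
PLASMID_LENGTH_BP = 60_000   # F-plasmid length quoted in the text
PARB_FOOTPRINT_BP = 16       # ParB footprint quoted in the text

def lattice_length(length_bp, footprint_bp):
    """DNA length expressed in units of protein binding sites."""
    return length_bp // footprint_bp

def cluster_fraction(m, cumulated_loop_length, L):
    """Fraction of the DNA covered by the cluster footprint m + loop length."""
    return (m + cumulated_loop_length) / L

L = lattice_length(PLASMID_LENGTH_BP, PARB_FOOTPRINT_BP)
print("L =", L)
for J_S, loop_length in ((2, 140), (5, 10)):
    fraction = cluster_fraction(100, loop_length, L)
    print(f"J_S = {J_S}: cluster footprint covers {fraction:.1%} of the DNA")
```

In both cases the footprint stays well below a tenth of the plasmid, which is the sense in which extending the loop-length integrals to infinity is a mild approximation.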
From the partition function, we can write down an effective 1D Hamiltonian for the LC model, which explicitly accounts for the balance between spreading bonds and loop entropy (in units of k_B T):

$$ H_{LC} = -J_S \sum_{i} \phi_i \phi_{i+1} \;+\; d\nu \sum_{i=1}^{n} \ln \ell_i . \tag{5} $$

This Hamiltonian is useful for performing Monte Carlo simulations of the model, which serve as a benchmark for the approximations made in the analytical approach described in the next sections.
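For simulation purposes, an effective 1D energy of this kind can be evaluated directly from an occupancy pattern. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the energy is −J_S per spreading bond plus dν ln ℓ per loop of length ℓ (in units of k_B T), uses d = 3 and ν = 1/2 as for a Gaussian polymer, and treats gaps shorter than the cutoff ℓ_0 as forbidden. The name `lc_energy` is hypothetical.

```python
from math import log

def lc_energy(occupancy, J_S, d=3, nu=0.5, l0=1):
    """Effective 1D energy (in k_B T) of an occupancy pattern along the DNA.

    Each pair of adjacent occupied sites contributes -J_S (a spreading bond).
    Each gap of `gap` empty sites between occupied sites is read as a loop and
    contributes d*nu*log(gap) (assumed loop-entropy cost).
    Gaps shorter than the cutoff l0 are disallowed (infinite energy).
    """
    occupied_sites = [i for i, occ in enumerate(occupancy) if occ]
    energy = 0.0
    for left, right in zip(occupied_sites, occupied_sites[1:]):
        gap = right - left - 1
        if gap == 0:
            energy -= J_S
        elif gap < l0:
            return float("inf")
        else:
            energy += d * nu * log(gap)
    return energy

# Two subclusters of two proteins separated by a loop of two empty sites.
print(lc_energy([1, 1, 0, 0, 1, 1], J_S=2.0))
```

In a Metropolis scheme, a proposed move that shifts a protein would be accepted with probability min(1, exp(−ΔE)) using the energy difference computed from this function.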
In summary, the LC model constitutes a simple statistical mechanics approach to describe how proteins assemble into a protein-DNA cluster with multiple loops. To calculate the average distribution of proteins along the DNA, we will assume that the affinity of ParB binding to parS is sufficiently strong such that one of the proteins in the cluster is always bound to a parS site. In the next sections we will describe how to compute the ParB binding profile around this parS site given a fixed number of loops with specified loop lengths. Then we will use the statistical mechanics framework provided above to perform a weighted average over all possible loop numbers and sizes to arrive at a simple predictive theory for the ParB binding profile.

A. 1-loop binding profile
It is instructive to start our analysis of ParB binding profiles by first calculating the probability of ParB occupancy as a function of distance from the parS site for the case of a protein-DNA cluster with only one emanating DNA loop (n = 1) with fixed loop length ℓ. We will assume a fixed number m of ParB proteins in this 1-loop protein-DNA cluster, and that one of these proteins is bound to the parS site at any time, as illustrated in Figure 2. Thus, to calculate the 1-loop ParB binding probability, P_1(s, ℓ), at a distance s from parS, we need to consider all possible configurations of proteins in the protein-DNA cluster subject to these constraints.

FIG. 2: Schematic of the system with m proteins and a single loop of size ℓ. The whole cluster is split in two parts: m_1 is the number of proteins in the cluster that overlaps with parS and m − m_1 is the number of proteins in the other cluster. The origin of the genomic coordinates is parS, the right edge of the system (RE) is located at the coordinate s'. We can divide the configurations into two equally likely cases: (i) the leftmost cluster overlaps with parS or (ii) the rightmost cluster overlaps with parS.
First, we note that P_1(s, ℓ) = 0 for s > m + ℓ, because the 1D cluster can maximally extend to a distance m + ℓ, which occurs when the 1D cluster adopts a configuration that lies entirely on one side of the parS site. For a binding site at a distance s < m + ℓ, the ParB binding probability is reduced, either by configurations where this site is located on the DNA loop within the 1D cluster, or by states where the 1D cluster adopts a configuration around the parS site that does not extend to the binding site at s, placing this site outside the 1D cluster. To capture these effects, it is helpful to express P_1(s, ℓ) in terms of conditional probabilities:

$$ P_1(s,\ell) = P_1(s,\ell \mid \overline{\mathrm{loop@}s}) \, P(\overline{\mathrm{loop@}s}) + P_1(s,\ell \mid \mathrm{loop@}s) \, P(\mathrm{loop@}s) = P_1(s,\ell \mid \overline{\mathrm{loop@}s}) \, P(\overline{\mathrm{loop@}s}), \tag{6} $$

where "loop@s" represents a condition corresponding to site s being part of a loop extruding from the cluster, i.e. an unoccupied site on the DNA within the protein cluster, as depicted in Figure 2. The overbar here represents the complementary condition, and the expression above simplifies because $P_1(s,\ell \mid \mathrm{loop@}s) = 0$ by construction. We can proceed to calculate the conditional probability, $P_1(s,\ell \mid \overline{\mathrm{loop@}s})$, by decomposing it as a sum of probabilities of mutually exclusive configurations, which are conditioned by the location s' of the right edge of the 1D ParB cluster, denoted as "RE@s'" (see Figure 2). Then, we will take a continuous limit for the binding profile assuming m ≫ 1, and express the binding profile $P_1(s,\ell \mid \overline{\mathrm{loop@}s})$ in terms of probabilities, P(RE@s'), for the condition describing the position of the right edge of the cluster. Thus, we first write the conditional probability $P_1(s,\ell \mid \overline{\mathrm{loop@}s})$ for s ≥ 0 (the case s < 0 is obtained by symmetry) as:

$$ \begin{aligned} P_1(s,\ell \mid \overline{\mathrm{loop@}s}) &= \sum_{s'} P_1(s,\ell \mid \overline{\mathrm{loop@}s};\, \mathrm{RE@}s') \, P(\mathrm{RE@}s') \\ &\approx \int_{0}^{m+\ell} ds' \, \theta(s'-s) \, P(\mathrm{RE@}s'). \end{aligned} \tag{7} $$

Clearly, $P_1(s,\ell \mid \overline{\mathrm{loop@}s};\, \mathrm{RE@}s') = 1$ when s < s' and zero otherwise, and thus we have replaced this term by the unit step function θ(s' − s) in the second line above.
To calculate P(RE@s'), it is convenient to introduce two subclusters, 1 and 2, with m_1 and m − m_1 proteins respectively (0 < m_1 < m), such that cluster 1 with m_1 proteins is overlapping with parS, as shown in Figure 2. Given two such subclusters, two equally likely situations can occur: (i) the leftmost cluster overlaps with parS, i.e. m − m_1 + ℓ ≤ s' < m + ℓ, or (ii) the rightmost cluster overlaps with parS, i.e. 0 ≤ s' < m_1. This directly allows us to construct the conditional probability to find the right edge of the whole system, such that one of the m_1 proteins in the cluster overlaps with parS:

$$ P(\mathrm{RE@}s' \mid m_1) = \frac{1}{2 m_1} \Big[ \theta\big(s' - (m - m_1 + \ell)\big) \, \theta\big(m + \ell - s'\big) + \theta\big(m_1 - s'\big) \Big], \tag{8} $$

where the prefactor 1/2 comes from the equal probabilities to find the system in one of the two cases (i) and (ii). The conditions (i) and (ii) are encoded with a product of two unit step functions for (i) and a single step function for (ii). Each single realization can be obtained by shifting the position of the site in cluster 1 overlapping with parS and is equally likely, giving rise to an overall prefactor 1/m_1. From this, we can obtain the full probability P(RE@s') by integrating over m_1:

$$ P(\mathrm{RE@}s') = \int_{1}^{m-1} dm_1 \, p(m_1) \, P(\mathrm{RE@}s' \mid m_1), \tag{9} $$

where p(m_1) = 2m_1/(m(m − 2)), since the number of configurations for each m_1 is ∝ m_1 and m_1 ∈ [1, m − 1]. After evaluating this expression, we obtain the normalized probability distribution for the right edge of the 1D cluster to be positioned at s', which we use to compute the conditional probability in Eqs. (6) and (7):

[Eq. (10), the explicit expression for P(RE@s'), is missing from the source text.]

To obtain the full 1-loop protein distribution (Eq. (6)), we first need to compute the probability for a site to not be part of a loop,

$$ P(\overline{\mathrm{loop@}s}) = 1 - P(\mathrm{loop@}s) = 1 - \rho(s, m, \ell). \tag{11} $$

If the loop density, ρ, were uniform, we would simply have P_uni(loop@s) = ρ_uni(s, m, ℓ) = ℓ/(m + ℓ), since the 1D cluster has a total length of m + ℓ with a single loop of length ℓ. This uniform condition would only apply if we randomly choose sites to be part of the loop and ignore the requirement that all these loop sites need to be neighboring. In a real cluster, however, we expect the loop density ρ(loop@s) to be higher in the bulk of the 1D cluster than close to the parS site or the edges, because fewer loops can be formed near the parS site or near the boundaries of the 1D cluster, at which a protein must be bound by construction. In particular, we expect the loop density,

[Eq. (12), the expression for the loop density ρ(s, m, ℓ), is missing from the source text.]

In the normalization of this expression we distinguish the cases where the loop is either smaller or larger than the number of proteins in the cluster. With Eqs. (10) and (12), we have all the elements to calculate the 1-loop protein binding profile P_1(s, ℓ) from Eq. (6).

FIG. 3 (caption fragment): ... (12), and dashed curves represent data obtained from exact enumeration as a benchmark for analytical approximations. We note that for ℓ = 0, we recover the triangular profile of the S&B model in the strong coupling limit J_S → ∞ [29].
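The construction of the right-edge distribution described above can be sketched numerically. The Python code below is an illustration under stated assumptions rather than the analytical result: it encodes cases (i) and (ii) as indicator functions with prefactor 1/(2 m_1), weights them by p(m_1) = 2m_1/(m(m − 2)), and integrates over m_1 ∈ [1, m − 1] and over the edge position with a simple midpoint rule. All function names and the step count are hypothetical choices.

```python
def p_right_edge_given_m1(s_edge, m1, m, loop):
    """P(RE@s' | m1): cases (i) and (ii), each weighted by 1/(2*m1)."""
    case_i = (m - m1 + loop) <= s_edge < (m + loop)   # leftmost cluster on parS
    case_ii = 0 <= s_edge < m1                        # rightmost cluster on parS
    return (int(case_i) + int(case_ii)) / (2.0 * m1)

def p_right_edge(s_edge, m, loop, steps=400):
    """P(RE@s'): midpoint-rule integral over m1 in [1, m-1] with weight p(m1)."""
    width = (m - 2) / steps
    total = 0.0
    for k in range(steps):
        m1 = 1 + (k + 0.5) * width
        weight = 2 * m1 / (m * (m - 2))
        total += weight * p_right_edge_given_m1(s_edge, m1, m, loop) * width
    return total

def occupancy_given_no_loop(s, m, loop, steps=400):
    """Probability that the right edge lies beyond s (integral of P(RE@s') for s' > s)."""
    width = (m + loop - s) / steps
    return sum(p_right_edge(s + (k + 0.5) * width, m, loop, steps) * width
               for k in range(steps))

m, loop = 50, 20
print(occupancy_given_no_loop(0, m, loop))    # close to 1: P(RE@s') is normalized
print(occupancy_given_no_loop(30, m, loop))   # reduced further from parS
```

The value at s = 0 serves as a normalization check, and the result vanishes for s ≥ m + ℓ, as required by the maximal extent of the 1D cluster.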
We investigated the binding profiles P_1(s, ℓ) predicted by this model for a selected set of parameters, as shown in Figure 3. We only show s > 0 because of the symmetry of the binding profile. It is instructive to contrast these profiles with the triangular profile (black curve) for a cluster with no loops. As expected, the addition of loops widens the profile, allowing it to extend out to a distance m + ℓ. The widening of the binding profile is accompanied by a faster decay of the profile in the vicinity of parS, which crosses over to a flatter profile at distances s > ℓ due to additional contributions from configurations where the loop lies between the parS site and site s.
Interestingly, for some cases the profile even becomes non-monotonic with a slight increase near the far edges of the domain. These features of the profile reflect the reduced loop density near parS and near the far edges of the cluster. Note that the integral under this curve remains constant for varying ℓ to conserve the number of particles in the cluster. To verify the validity of the analytical approximations leading to P_1(s, ℓ), we used exact enumeration as a benchmark. Overall, the numerics and the analytics are in good agreement for the 1-loop case, as shown in Figure 3. In the next section, we employ the approximate analytical expressions to efficiently calculate the full binding profile averaged over all configurations.

IV. PROTEIN BINDING PROFILES AND OTHER PROPERTIES OF THE LOOPING AND CLUSTERING MODEL
Above we defined the Looping and Clustering model and calculated the binding profile of proteins around a parS site for a cluster with either 0 loops or 1 loop with fixed length. Real protein-DNA clusters, however, are expected to fluctuate with new loops forming and disappearing continuously. To capture such fluctuations, we will use the expressions for the binding profile of a static cluster with fixed loop length together with a statistical mechanics description of the LC model to obtain average binding profiles for dynamic clusters, including an ensemble average over both the number of loops and the loop lengths.
To obtain a full binding profile averaged over all realizations, it is useful to investigate the statistics of loops that extend from the protein-DNA cluster and how these statistics are determined by the underlying microscopic parameters of the model. We start by considering the number of loops that extend from the cluster. Using the partition function in Eq. (4), it is possible to calculate the basic features of the LC model. For instance, the moments of the distribution of the number of loops are given by:

$$ \langle n^k \rangle = \frac{1}{Z_{LC}} \sum_{n=0}^{m-1} n^k \binom{m-1}{n} \, e^{-n J_S} \prod_{i=1}^{n} \int_{\ell_0}^{\infty} d\ell_i \, \ell_i^{-d\nu}. \tag{13} $$

From this, we easily find that the average loop number is:

$$ \langle n \rangle = (m-1) \, \frac{x}{1+x}, \tag{14} $$

where x = e^{−J_S} ℓ_0^{1−dν}/(dν − 1). The average loop number ⟨n⟩ is depicted in Figure 4a, demonstrating the exponential dependence on the spreading energy J_S. In Figure 4b, we plot ⟨n⟩ as a function of the total number of proteins m in the protein-DNA cluster. Over a broad range of parameters, we observe the expected linear dependence of the average loop number ⟨n⟩ on m.
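The loop-number statistics can be checked with a few lines of Python. The sketch below assumes that, once the loop-length integrals are carried out, each loop contributes a factor x = e^{−J_S} ℓ_0^{1−dν}/(dν − 1), so that the statistical weight of n loops is C(m − 1, n) x^n; under that assumption the partition function sums to (1 + x)^{m−1} and the mean loop number to (m − 1)x/(1 + x). The function names and the parameter values (d = 3, ν = 1/2, ℓ_0 = 1) are illustrative assumptions.

```python
from math import comb, exp, isclose

def loop_weight(J_S, l0, d=3, nu=0.5):
    """Weight x of a single loop: Boltzmann factor times the loop-length integral."""
    dnu = d * nu          # must exceed 1 for the integral to converge
    return exp(-J_S) * l0 ** (1 - dnu) / (dnu - 1)

def partition_function(m, x):
    """Direct sum over the number of loops n = 0 .. m-1."""
    return sum(comb(m - 1, n) * x ** n for n in range(m))

def mean_loops_direct(m, x):
    """<n> from the explicit sum over loop numbers."""
    weights = [comb(m - 1, n) * x ** n for n in range(m)]
    return sum(n * w for n, w in enumerate(weights)) / sum(weights)

def mean_loops_closed(m, x):
    """Closed form (m - 1) x / (1 + x)."""
    return (m - 1) * x / (1 + x)

m = 100
for J_S in (1.0, 2.0, 5.0):
    x = loop_weight(J_S, l0=1.0)
    assert isclose(partition_function(m, x), (1 + x) ** (m - 1), rel_tol=1e-9)
    assert isclose(mean_loops_direct(m, x), mean_loops_closed(m, x), rel_tol=1e-9)
    print(f"J_S = {J_S}: <n> = {mean_loops_closed(m, x):.2f}")
```

For large J_S the weight x is small and ⟨n⟩ ≈ (m − 1)x, which makes the exponential dependence on J_S and the linear dependence on m explicit.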
The linear dependence on m reflects that, in the Looping and Clustering model, loops can form anywhere in the cluster. However, one would expect that loops can only form at the surface of a 3D cluster. For a compact, spherical cluster, this would result in a dependence ⟨n⟩ ∼ m^{2/3}. However, Monte Carlo simulations of the full S&B model have revealed that the protein-DNA clusters are not compact [29], but rather have a surface that scales almost linearly in m, close to the behavior of the simplified LC model presented here.
ChIP-Seq experiments measure the enrichment of bound ParB as a function of genomic position on the DNA, providing a measure of the average protein binding profile of ParB on DNA [24,25]. In the LC model, the ParB density profile along DNA can be calculated from:

[Eq. (15), the expression for the full binding profile P(s), is missing from the source text.]

where Z_LC is given in Eq. (4). [...] ρ(s, m, ℓ_i, ℓ) by using a generalization of the 1-loop expression in Eq. (12), [...]

In the analysis above, we aimed to capture the effects of multiple loops in a simple way by assuming statistical independence of the loops, and by using the analytical 1-loop expressions to approximate the impact of loop formation on the loop density and the ParB binding profile of the protein-DNA complex. To test the validity of these approximations, we performed MC simulations of the complete LC model. We find that the numerically obtained loop probability is in reasonable agreement with our approximate expression for the multi-loop density, as shown in Figure 4c. Thus, despite the simplicity of our approach, the analytical model provided here captures the essential features of looping in protein-DNA clusters.

The full protein binding profile P(s) around a parS site is calculated by averaging the static binding profile for different total loop numbers and loop lengths using the Boltzmann factor (see the partition function Eq. (4)) from the Looping and Clustering model as the appropriate weighting factor. The resulting expression in Eq. (15) for the protein binding profile of a protein-DNA cluster is the central result of this paper. We use this expression to compute binding profiles for the full Looping and Clustering model, which are shown in Figure 5 as a function of the distance s to parS for m = 100, 200, and 400. By construction, the site s = 0 corresponding to parS is always occupied, and thus P(s = 0) = 1 for all values of the spreading energy J_S. This feature of the LC model captures the assumed strong affinity of ParB for a parS binding site.
For J_S = 5, the binding profile converges to a triangular profile, implying a very tight cluster of proteins on the DNA with almost no loops. The triangular profile in this case results from all the distinct configurations in which this tight cluster can bind to DNA such that one of the proteins in the cluster is bound to parS, and therefore the probability drops linearly to 0 at s ≈ m. The same triangular binding profile was observed for the S&B model in the strong coupling limit J_S → ∞ [29]. Interestingly, as J_S becomes weaker, we observe a faster decrease of the binding profile near parS together with a broadening of the tail of the distribution for distances far from parS. This behavior results from the increase of the number of loops that extrude from the ParB-DNA cluster with decreasing spreading bond strength J_S. The insertion of loops in the cluster allows binding of ParB to occur at larger distances from parS. Thus, the genomic range of the ParB binding profiles is set by s_max ≈ m + ⟨ℓ⟩, where the average cumulated loop length ⟨ℓ⟩ is controlled by J_S (see Figure 4, main graph) and m. These results illustrate how the full average binding profile is controlled by the spreading bond strength J_S: the weaker J_S, the looser the protein-DNA cluster becomes, which results in a much wider binding profile of proteins around parS.
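The triangular profile of the tight-cluster limit follows from a simple counting argument, which the following Python sketch makes explicit. It assumes a loop-free block of m contiguous proteins in which each of the m proteins is equally likely to be the one bound to parS at s = 0; the function name is hypothetical.

```python
def tight_cluster_profile(m):
    """Occupation probability P(s) for a loop-free cluster of m proteins.

    The cluster is a contiguous block of m sites. If protein number k
    (k = 0 .. m-1, counted from the left) sits on parS, the block covers
    sites -k .. m-1-k. All m choices of k are taken as equally likely.
    """
    profile = {}
    for s in range(-m, m + 1):
        covering = sum(1 for k in range(m) if -k <= s <= m - 1 - k)
        profile[s] = covering / m
    return profile

m = 10
profile = tight_cluster_profile(m)
for s in (0, 1, 5, 9, 10):
    print(s, profile[s], max(0.0, 1 - abs(s) / m))  # enumeration vs. 1 - |s|/m
```

The enumeration reproduces P(s) = 1 − |s|/m for |s| < m and zero beyond, i.e. a profile that equals one at parS and drops linearly to zero at s = m.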
In the limit J_S → 0, the LC model quantitatively reduces to the statistics of non-interacting loops predicted by the Stochastic Binding model [25], as shown in Figure 6. In this case, the binding profiles exhibit asymptotic behaviour P(s) ∝ s^{−1.5} for large s. Interestingly, when the number of cluster proteins m increases, we observe the appearance of a second regime where P(s) ∼ s^{−α} with α ≈ 0.5. We attribute this weaker scaling of P(s) with s at intermediate genomic distances to the reduced loop density near parS (see Figure 4c). A similar behaviour was also observed for the Stochastic Binding model: at small genomic distance from parS, the DNA is assumed to be always inside the region of high ParB concentration, leading to a slower decay of the binding profile of ParB near parS.
To investigate how the functional shape of the binding profile is determined by the total number of proteins in the cluster, we plot the binding probability versus the scaled variable s/m for m = 100, 200, and 400, as shown in Figure 7. For fixed J S , the data approximately collapse onto a single curve as a function of the scaled distance s/m. This implies that the functional shape of the ParB binding profile is largely determined by the spreading bond strength J S , while the number of proteins in the cluster determines the width of the profile.

V. DISCUSSION
Here we introduced the Looping and Clustering model to describe the average genomic configuration of proteins in a large protein-DNA complex. In our model, the formation of a coherent cluster of ParB proteins is ensured by a combination of spreading and bridging bonds between DNA-bound proteins, which together can drive a condensation transition in which all ParB proteins form a large protein-DNA complex localized around a parS site [29]. We do not assume, however, that this protein-DNA cluster is compact. Indeed, loops of protein-free DNA may extend from the cluster, which strongly influences the average spatial configuration of proteins along the DNA. In the LC model, the formation of loops in the protein-DNA cluster is controlled by the strength of spreading bonds, i.e. the bonds between proteins bound to nearest-neighbor sites on the DNA. Specifically, for every protein-free loop of DNA that extends from the cluster, a single spreading bond between two proteins within the cluster must be broken. Thus, if the spreading interaction energy, J S , is sufficiently small, thermal fluctuations will enable the transient formation and breaking of spreading bonds, thereby allowing multiple loops of DNA to emanate from the protein cluster (see Figure 1). With J S = 1 k B T and m = 400, the LC model predicts a profile in good quantitative agreement with binding profiles measured with ChIP-Seq on the F-plasmid of E. coli [25], as shown in Fig. 5c. Our results also have implications for experiments that employ fluorescent labelling of DNA loci by exogenous ParBs [26,27]. Indeed, our model can be used to investigate how the protein interaction strengths determine the 3D structure and mobility of the ParB-DNA cluster, as well as the tendency of multiple ParB foci to adhere to each other.
The Looping and Clustering model, which we introduce to calculate the binding profile of ParB-like proteins on the DNA, is a simple theoretical framework similar to the Poland-Scheraga model for DNA melting [37,38]. An important difference between the LC model and the homogeneous Poland-Scheraga model is that translational symmetry is broken by the presence of a parS site, at which a protein is bound with high affinity, such that loops are effectively excluded in the vicinity of parS. We show that the binding profiles predicted by this model are sensitive to both the expression level of proteins and the spreading interaction strength, which directly controls the formation of loops in the protein-DNA cluster. Our model thus provides a means to use binding profiles, measured for instance in ChIP-Seq experiments, to infer molecular interaction strengths of the proteins that form large protein-DNA clusters.
Conceptually, the spreading bond interaction determines how "loose" the protein-DNA cluster is, which directly impacts the ParB binding profiles. When J S is large, loop formation is unlikely, resulting in a compact protein-DNA cluster with a corresponding triangular protein binding profile centered around parS. At intermediate J S , the protein-DNA cluster becomes looser with the formation of loops, resulting in binding profiles that are more strongly peaked around parS but have far-reaching tails, in accord with high-resolution ChIP-Seq experiments [25]. In the limit J S → 0, loops can form at any position at no energetic cost. In this limit, the binding profile is consistent with the Stochastic Binding model, with a profile of the form [25] P(s) ∝ s^(−1.5) (Figure 6). Thus, the Looping and Clustering model offers a description for a broad parameter regime, connecting two limits investigated in preceding studies [25,29]. This model provides a quantitative tool that could be employed to analyze and interpret ChIP-Seq data of ParB-like proteins on chromosomes and plasmids.
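The energetic side of this argument can be made concrete with a short calculation. Since each loop requires one broken spreading bond, the energetic part of its statistical weight is a Boltzmann factor exp(−J S / k B T). The sketch below assumes that J S is expressed in units of k B T and deliberately leaves out the loop entropy and the counting of cluster configurations that the full model includes; the function name `loop_bond_penalty` is hypothetical.

```python
import math

def loop_bond_penalty(j_s):
    """Boltzmann factor exp(-J_S / k_B T) for breaking one spreading bond.

    Assumption: j_s is given in units of k_B T. Loop entropy and the
    multiplicity of cluster configurations are not included here.
    """
    return math.exp(-j_s)

for j_s in (5.0, 1.0, 0.0):
    weight = loop_bond_penalty(j_s)
    print(f"J_S = {j_s:g} k_B T -> energetic weight per loop = {weight:.4f}")
```

On this energetic factor alone, each loop is suppressed by a factor below 0.01 at J S = 5 and by roughly 0.37 at J S = 1, while at J S = 0 loops carry no energetic penalty, in line with the progression from a tight to a loose cluster described above.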