Polymer compaction and bridging-induced clustering of protein-inspired patchy particles

There are many proteins or protein complexes which have multiple DNA binding domains. This allows them to bind to multiple points on a DNA molecule (or chromatin fibre) at the same time. There are also many proteins which have been found to be able to compact DNA in vitro, and many others have been observed in foci or puncta when fluorescently labelled and imaged in vivo. In this work we study, using coarse-grained Langevin dynamics simulations, the compaction of polymers by simple model proteins and a phenomenon known as the"bridging-induced attraction". The latter is a mechanism observed in previous simulations [Brackley et al., Proc. Natl. Acad. Sci. USA 110 (2013)], where proteins modelled as spheres form clusters via their multivalent interactions with a polymer, even in the absence of any explicit protein-protein attractive interactions. Here we extend this concept to consider more detailed model proteins, represented as simple"patchy particles"interacting with a semi-flexible bead-and-spring polymer. We find that both the compacting ability and the effect of the bridging-induced attraction depend on the valence of the model proteins. These effects also depend on the shape of the protein, which determines its ability to form bridges.


Introduction
Within both bacteria and eukaryotes, there are many DNA binding proteins which interact with the genome to perform different functions. Many of these proteins form complexes which have more than one DNA binding domain. Such a complex can then bind the chromosome at multiple places at the same time, forming molecular bridges. This is thought to be at the basis of several mechanisms for gene regulation. For example, in higher eukaryotes gene expression from a promoter is often activated when proteins (i.e., transcription factors, TFs) bind at a site on the chromosome called an enhancer; promoters and enhancers can be separated by millions of base-pairs of DNA [1]. The current model for how enhancers operate is that they must come into close physical contact with the promoter, forming a chromosome loop, and that this interaction is mediated by a complex of DNA binding proteins (e.g., TFs and RNA Polymerase II). Another regulatory mechanism is that chromosome regions which need to be inactivated are often compacted-again bridgeforming DNA binding complexes have been implicated (e.g., the heterochromatin protein HP1 [2], or the polycomb repressive complexes, PRC1 and PRC2 [3]).
Previously [4,5], using simple polymer simulations, we uncovered a remarkable property of bridge forming proteins interacting with DNA or chromatin (the fibrous material formed from DNA and proteins which makes up eukaryotic chromosomes [6]). When protein complexes are represented in a simulation as simple spheres with an isotropic attractive interaction with the polymer, they will spontaneously form clusters on that polymer, even in the absence of any explicit proteinprotein attraction [ Fig. 1(a)]. This occurs because once a protein binds to two polymer regions to form a bridge, there is a local increase in polymer density-a second protein is more likely to bind the polymer in the same region. This sets up a positive feedback (bridging increases local polymer density which increases bridging). We termed this effect the bridging-induced attraction (BIA). An alternative way of looking at the effect is that forming a polymer-protein-polymer bridge leads to an enthalpy gain, but an entropic loss (a bridge forms a loop, which reduces the entropy of the polymer). Placing a second bridge in the same region gives a second enthalpy gain without incurring a further entropic penalty.
More specifically, the BIA only operates when the proteins can form bridges (i.e., they can bind at least two polymer regions at a time), and the interaction energy must be strong enough such that bridges form for a sufficiently long time for cluster growth to commence. We previously [4] found that the mechanism is robust to changes in protein size and polymer stiffness (though this can alter the shape of the emerging clusters). We also previously found that the mechanism still works for the minimal case of proteins which can only bind two polymer regions at the same time, though the effect is weaker. Here, we investigate the effect in more detail with regard to the valency and shape of the model proteins. We also examine the ability of these model proteins to compact the polymer, which was also previously studied for the case of isotropic spheres [7,8] In other previous work, we found that by "patterning" the polymer with specific (strong attractive interaction) and non-specific (weak attractive interaction) protein binding sites, the action of the BIA drives the polymer into specific structures [4,9]. By introducing multiple species of spherical proteins with different binding sites, simulations can reproduce the structure observed in mammalian genomes [9]. This includes the "compartmentalisation", or spatial segregation of active and inactive regions of the genome, and the formation of domains of enriched self-interactions (often 0P central patch Figure 1: The bridging-induced attraction (BIA). (a) Snapshots of a simulation of isotropic spheres interacting uniformly with a polymer. The initial condition is shown on the left; the BIA leads to clustering (right; snapshot after 10 5 τ -at longer times spheres merge into a single cluster). For clarity, in the right hand image only spheres bound to the polymer are shown (a sphere is said to be bound to a polymer bead if their centre-to-centre distance is less than 1.7 σ). called "topologically associated domains", or TADs, in the chromatin biology literature) [10]. For example, with just two species of protein, and with two sets of binding sites inferred from experimental data, simulations can correctly predict over 80% of the positions of TAD boundaries in human chromosomes (as revealed in chromosome-conformation-capture experiments [11]). We also showed that for non-specific binding of isotropic sphere model proteins with an excess of polymer, the clusters will grow, or coarsen, until there is a single large cluster containing protein and polymer beads, surrounded by a region with unbound polymer and low protein density [12]. In other words the pro-teins proceed to phase separate in a way reminiscent of a binary fluid [13,14]. The coarsening can be arrested either by introducing specific strong binding sites, or by non-equilibrium processes [12].
Here we study the BIA in the context of more detailed model proteins. Specifically, we perform Langevin dynamics simulations of simple "patchy" model proteins with a finite number of polymer binding sites. In the next section we give details of the model and simulations. We then present results for model proteins with valence (number of "patches") 2, 3 or 4, both for the case where there is an abundance of polymer and low protein concentration (studying cluster formation via the BIA), and where there is high protein concentration such that the "binding sites" on the polymer become saturated (and we examine the resulting polymer compaction). Finally we present a similar analysis for the case of valence-2 proteins with patches at different angles-i.e., we examine the effect of protein "shape". Our main results are that the BIA operates more strongly for higher valence proteins, and that the main contribution of protein shape is that it affects the ability of the proteins to form bridges. This will be of interest in understanding the mechanisms through which DNA-binding proteins form foci in cells, and how they impact on genome organisation and regulation.

Model
To study the BIA in more detail, here we use Langevin dynamics simulations of a simple polymer model, with proteins represented by "patchy particles" [ Fig. ??(b)]. The latter are rigid bodies made up from spheres (a "central" bead surrounded by "patch" beads), similar to the patchy particles models common in the soft matter physics literature [15][16][17][18]. We use a common bead-and-spring polymer model [19], which could represent DNA or chromatin-we use a persistence length close to that of the latter. L beads of diameter σ are connected in a chain via FENE bonds, described by the potential where r i,i+1 = |r i+1 − r i | is the separation between two beads in the chain, k B T is the Boltzmann constant multiplied by the temperature, and K FENE = 30 k B T and R 0 = 1.6 σ are an energy scale and the maximum bond length respectively. Polymer bending rigidity is provided by a Kratky-Porod potential given by where θ i is the angle formed between three consecutive beads along the chain, defined by cos(θ i ) =t i ·t i+1 , wheret K BEND is the bending rigidity, related to the intrinsic polymer persistence length l p = K BEND σ/(k B T ); we set l p = 3 σ. Finally, steric interactions between polymer beads is provided by a Weeks-Chandler-Anderson (WCA) potential given by where r ij = |r i − r j | is the separation between any two beads i and j.
We consider nine different patchy protein models [ Fig. 1(b)]. Previous simulations of patchy particles have often involved a complicated potential where interactions are determined by the angle between points on the surface of a spherical bead (e.g., [20]); here, we instead use a simple scheme where the central bead interacts sterically with polymer beads, while patch beads interact with the polymer via an attractive interaction but no steric interaction. The minimum of the potential is where the patch bead-polymer bead separation is zero, such that when "binding" a patch sits inside a polymer bead [ Fig. 1(c)]. By using a short range interaction, in practice when it is bound a patch either sits inside one polymer bead, or sits between two consecutive beads along the chain (i and i + 1).
As detailed in Appendix A we check that in each simulation no single patch interacts with more than one non-consecutive polymer bead (i.e., a single patch cannot form a bridge).
More specifically, protein central beads have diameter σ and interact with polymer beads and all other protein beads and patches via the WCA potential given in Eq. (2). Patch beads (also diameter σ) interact with central and other patch beads via the WCA, but interact with polymer beads via a potential for r ij < r c and V PATCH (r ij ) = 0 otherwise. Here r ij is the separation between patch i and polymer bead j, a is a length set at a = 1.75 σ, and the range was set at r c = 0.9 σ. This has a similar functional form to the commonly used Morse potential. The energy 0 determines the interaction strength; we note that the minimum energy of this potential (when r ij = 0) is not 0 due to the terms in r c . For clarity in plots we use the actual binding energy P ; since it is possible for a protein patch to sit between two consecutive polymer beads, this energy can be a sum of contributions. As detailed in Appendix A, to ensure that we plot the correct value of P , we measure it from the simulations for each 0 value. These potentials are plotted in Fig. 1(d).
In summary, there are attractive protein-polymer interactions with strength P , but importantly there are no attractive protein-protein interactions.
We name the different models "1P", "3P", etc. for proteins which have 1 patch, 3 patches and so on. We also consider five varieties of protein with two patches named "2Pa" to "2Pe", which have patches separated by angles π, 0.825π, 0.65π, 0.475π, and 0.3π respectively. As a comparison we also ran simulations with isotropically interacting spheres (e.g., as in Refs. [4,9], here denoted the 0P model), where the protein-polymer interaction was determined by a shifted and truncated Lennard-Jones potential given by for r ij < r c and V LJ,cut (r ij ) = 0 otherwise; r ij is the separation between sphere protein i and polymer bead j. We set r c = 1.8 σ. The isotropic spheres can can interact with as many polymer beads as allowed sterically. Another difference between this and the patchy proteins case is that in the latter, on the whole, a polymer bead can interact with at most one protein patch at at time. This is because when bound, patch beads sit inside the polymer beads; since they interact with other patches through a WCA potential this will exclude another patch from interacting with the same polymer bead. In the isotropic 0P case a polymer bead can interact with multiple proteins.
A common approach in Langevin dynamics is to employ periodic boundary conditions; however, unless a sufficiently large enough system is used, this allows configurations where the polymer interacts with itself (through the bridging proteins) across the periodic boundaries. To avoid such configurations, and to avoid having to perform large simulations with a large number of proteins (to achieve concentrations sufficient for the BIA to act), here we instead confined all beads within a cubic region of width l x . This is achieved through a "wall" potential V WALL (r ij ), where here r ij is the separation between bead i and the closest point to it on wall j. Six such walls give the faces of the cube, and the potential takes the value if the bead is on the "outside" of the wall or 0 otherwise. We set K WALL = 10 k B T . As detailed for each set of simulations, typically we choose l x to be slightly larger than the mean radius of gyration for the polymer as predicted by the worm-like-chain (WLC) model [21]. This means that the polymer is under (weak) confinement, and so has a lower entropy than it would in the absence of confinement; nevertheless, the polymer concentration is still much lower than one would typically find for chromatin or DNA in vivo.
We performed Langevin dynamics simulations using the LAMMPS molecular dynamics software [22]. Briefly, the position of polymer bead i evolves according to the equation where m and ξ are the mass and friction respectively, V i is the sum of interaction potentials for bead i, and η i (t) is a noise term with components where η iα (t) = 0 and η iα (t)η iβ (t ) = δ α,β δ(t − t ). The centre of mass of each rigid body patchy protein moves according to the same equation, with its orientation governed by a similar equation for rotation [22]. For simplicity we set the mass and friction the same for each polymer bead and each protein component bead.
The simulation length, mass, and energy units, σ, m, and k B T give rise to a time unit τ = mσ 2 /k B T . The friction coefficient is related to the diffusion constant via the Einstein relation D = k B T /ξ, and we set ξ = 2 which gives over-damped motion with D = 0.5 σ 2 τ −1 leading to a Brownian time of τ B = σ 2 /D = 2 τ . To map these units to a real physical system one could, e.g., assume that for chromatin σ ≈ 30 nm, T =300 K, and the nucleoplasm viscosity ν = 10 −2 Pa s (where from the Stokes equation for a sphere of diameter d, ξ = 6πνd); this then gives τ = 0.3 ms.
In LAMMPS the Langevin equations are integrated using a velocity-Verlet algorithm; we set the time step dt= 0.01τ . To initialise simulations, proteins were positioned randomly, while polymer beads were placed along a random walk. Initialisation simulations were carried out using "soft" interaction potentials (to remove bead overlaps without numerical instabilities), before these were changed to the interaction potentials detailed above. Each simulation was then further equilibrated by running for 5 × 10 4 τ in the absence of attractive interactions. Simulations were then run with protein-polymer attraction for at least 10 5 τ , by which time all measured quantities had stopped systematically changing. Values shown in figures are obtained from an average over configurations taken at regular intervals from a further simulation run of 10 5 τ ; we also average over 5 independent simulations. In plots the error bars show the standard error in the mean.

Results
The BIA for patchy proteins with different valence In this section we present simulations of patchy proteins with different valence, and study the formation of clusters due to the BIA. We consider a system with L = 2000 polymer beads and N = 100 proteins of one type in a cube of size l x = 90 σ. The number of proteins is small compared to the number of polymer beads (protein binding sites) in order that discrete clusters will be detectable if they arise (compare the case below where the ratio N/L is larger and the polymer is saturated by proteins). Some typical snapshots are shown in Fig. 2(a).
We first consider the fraction of proteins f b which are bound to the polymer on average at any one time [ Fig. 2(b)]. For the 1P model we can approximate the behaviour using a simple kinetic binding model, i.e., where unbinding occurs at some rate k off ∼ e − P /k B T , and is independent of the number of proteins already bound. This leads to a steady state where A is a constant (see Appendix B). A fit is shown as a black dashed line in Fig. 2(b). For the other models, we find that at low energy the number of proteins bound grows with the protein valence. Since each patch can interact with the polymer separately, the total binding energy per protein is the sum of these contributions-thus we expect the mean residence time k −1 off to increase with the number of patches. These curves do not fit well to a function of the form given in Eq. (4), but instead f b increases more steeply, suggesting cooperative binding (due to the BIA) which increases with valence.
To quantify clustering, we define a protein as belonging to a cluster if its centre bead is less than some threshold distance φ away from that of another protein (i.e., the smallest clusters contain two proteins). Rather than choosing an arbitrary value for φ, we instead examined the number of proteins found to be in clusters, N cl as a function of φ for a high value of P . Since the different valence proteins bind the polymer to a different extent, in Fig. 2(c) we plot N cl scaled by the mean number of bound proteins N b . The multivalent proteins (models 2Pa, 3P, and 4P) show an initially steep increase with φ, but this begins to plateau, suggesting clustering, as expected due to the BIA. The 1P proteins show a strikingly different behaviour: after an initial slow increase in N cl with P the dependence becomes linear. We expect that this clustering is simply due to proteins binding randomly along the polymer coil-this can be confirmed using polymer configurations from a simulation without proteins, and choosing M polymer beads at random to attach proteins to, before performing the same clustering analysis (dashed black line in Fig. 2(c); M is the number of proteins bound in the 1P simulation). See Appendix C for further details. Based on Fig. 2(c) we suggest that φ = 3.5 σ is a reasonable threshold to determine clustering. Figure 2(d) then shows the fraction of proteins involved in clusters f cl as a function of P for each protein model. Consistent with the above discussion, the 1P proteins show a weak increase with P , while the other protein models show a steep increase and plateau. The curves become steeper (with clustering observed at lower energy) as the number of patches increases, indicating that the BIA effect becomes stronger with increasing valence.

Polymer compaction by patchy proteins with different valence
We next characterise the ability of the different protein models to compact a polymer. We perform simulations with a smaller L = 1000 bead polymer and N = 500 proteins in a simulation box of side l x = 70 σ; in this case there are sufficient proteins to compact the whole polymer. Typical snapshots are shown in Fig. 3(a). In Fig. 3(b) we plot the radius of gyration of the polymer, defined as in order to asses polymer compaction. Here r i is the position of polymer bead i, and r c = (1/L) i r i . We find that, as expected, no compaction is observed for the 1P model which cannot form bridges. The other models all show compaction when the proteinpolymer interaction strength P is large. For the 4P case a steep decrease in R g is observed at around P ≈ 2 k B T ; this behaviour is similar to the case of isotropic spheres [7], and reminiscent of the well studied polymer coil-globule transition for a poor solvent [23,24]. As the number of patches decreases, the decrease in R g shifts to larger P , indicating that proteins with fewer patches are less able to compact the polymer. This looks similar to the case with fewer proteins as shown in Fig. 2(b). Again, at small P , f b increases with protein valence, and (except for the 1P case) there is a plateau at large P . This time, however, at large P there is a clear trend for the higher valence proteins to show lower f b values-this is likely because the "binding sites" on the polymer are becoming saturated.
Since for the different models, different numbers of proteins are bound for a given value of P , it is useful to consider the level of polymer compaction as a function of the total protein-polymer interaction energy, f b P P , where P is the number of patches per protein (valence). Scaling the binding energy in this way collapses the R g curves on top of each other for the 2P-4P models [ Fig. 3(d)]. One can rationalise this by considering that to compact the polymer to the same extent one would need twice as many 2Pa proteins as 4P proteins for the same P (a cluster of two 2Pa proteins looks much like a single 4P protein, etc.).

Effect of shape on the BIA for valence-2 patchy bridges
Above we considered model proteins where the patches were placed equidistantly around the central bead. Of course, in real proteins such an arrangement of DNA binding domains may not the be case. To systematically study the effect of different protein shapes, in this section we consider valence 2 proteins and vary the angle between the patches. Specifically, we consider the five model proteins as shown in the bottom row of Fig. 1(b). Real examples of proteins which might have a similar arrangement of binding domains include the eukaryotic protein HP1 (which is thought to form dimers either in an elongated or a folded state, depending on post-translational biochemical modifications [25]), and the bacterial protein H-NS (which is thought to adopt elongated or folded shapes depending on salt concentration or the presence of other ligands). Both of these examples could therefore, under different conditions, be similar to either the 2Pa or 2Pe models.
To study the formation of clusters via the BIA, we again consider a system with an L = 2000 bead poly-mer and N = 100 proteins with l x = 90 σ; simulation snapshots are shown in Fig. 4(a). In Fig. 4(b) we plot the fraction of proteins found in clusters for each of the different models (using a threshold separation φ = 3.5 σ, as before). The 2Pb protein model behaves very similarly to 2Pa-as the binding energy increases, there is a sharp increase in clustering at around P = 4 k B T . For the 2Pe model, where the DNA binding patches are both on the same side of the molecule, we see quite different behaviour-the proteins start to cluster at smaller P , and by P ∼ 4 k B T the level of clustering starts to plateau (at a much lower level than in reached, e.g., by the 2Pa proteins). By measuring the number of proteins bound, we can again estimate the level of clustering which would be observed just due to random binding to a polymer coil in the absence of any cooperative effects [black dashed line in Fig. 4(b)]. We find that the clustering observed in the 2Pe case is approximately at this level, indicating that the BIA is not in effect (at high energy for 2Pe, f cl is actually lower than expected due to random binding, suggesting that these model proteins have a larger excluded volume when bound than taken into account by our random binding estimate -see Appendix C). The 2Pd model also only shows the level of clustering expected from random binding to the polymer [black dotted line in Fig. 4(b)]. The 2Pc curve shows a behaviour somewhere between the two.
Figure 4(c) shows how the level of protein binding f b depends on P . For all of the models we observe an "Sshaped" curve. One might naively expect that, since we do not observe clustering for the 2Pe proteins, there would be no cooperative effects, and so, e.g., f b would pass through 0.5 at larger P . In fact, the opposite is the case. The 2Pe curve crosses f b = 0.5 at lower P , while the 2Pa and 2Pb curves virtually overlap, crossing f b = 0.5 at larger P . The curves for the other models sit somewhere in between. These observations can be rationalised by considering that the valence-2 proteins can bind the polymer in several different 'modes' [Fig. 4(e)]: either just one of the patches binds the polymer, or they both do. In the latter case, patches can bind two beads which are nearby along the polymer contour, or two beads which are far apart; we describe these as "coating" and "bridging" bonds respectively. Specifically, if a protein binds polymer beads i and j, we say it is coating if |i − j| < 3, otherwise it is bridging. When only one patch binds, we call this "dangling". Figure 4(f) shows, for each protein model, the fraction of proteins which bind in each mode as a function of P . For the 2Pa and 2Pb models we find that most proteins bind in the bridging mode, with a small fraction "dangling". As the angle between the patches decreases (progressively through the 2Pc, 2Pd and 2Pe models), a larger proportion of proteins bind in the coating mode-in the 2Pe case nearly all proteins bind in this way. This explains the observed trends for clustering: the BIA is only in effect when proteins can form bridges. In fact, if we plot f cl as a function of the fraction of proteins binding in the bridging mode [ Fig. 4(d)], the curves for the 2Pa, 2Pb and 2Pc curves collapse on top of each other. The 2Pe proteins actually behave like valence-1 proteins, but since they have two binding patches, their polymer interaction effectively has double the energy (the black dashed line in Fig. 4(c) shows the curve given by Eq. (4) with the replacement P → 2 P using the constant A found from the fit to the 1P proteins).
We also examined how protein binding affected bending of the polymer by measuring the mean angle θ i between consecutive polymer segments (defined as in Eq. (1)). This revealed that the 2Pe proteins tend to slightly stiffen the polymer (reducing θ i ), whereas the 2Pd proteins tend to induce bends (increasing θ i ).
As expected, binding of proteins which readily form bridges (2Pa-c), and therefore create loops, also leads to an increased bending. These effects are all very small with θ i changing by less than 0.5% with respect to a polymer coil in the absence of proteins; the effect on the persistence length as measured from the decay in bond-bond correlations ( t i ·t i+s i = e −s/Lp ) is smaller than the error in the measurement. Previous work [26][27][28] has shown that proteins which induce bending or a stiffening of a polymer can experience mechanically induced cooperative binding, but these effects appear to be too weak to lead to clustering in the case of the model proteins studied here.

Effect of shape on polymer compaction for valence-2 patchy bridges
Finally, in Fig. 5 we present results from simulations with N = 500 valence-2 proteins interacting with an L = 1000 bead polymer (l x = 70 σ), and again examine the ability of the proteins to compact the polymer. The radius of gyration decreases with P only for the 2Pac protein models. The 2Pa and 2Pb curves overlap, indicating that these proteins can compact the polymer to a similar extent (R g tends to be slightly higher for 2Pc). The fraction of proteins bound as a function of P [Fig. 5(c)] shows similar trends as the case with fewer proteins-the 2Pe proteins bind more readily. Fig. 5(e) shows the proportion of proteins which are binding in each mode as a function of P ; again the behaviour is very similar to that in the previous section, with the proteins more likely to coat rather than bridge as the angle between patches decreases. The proportion which bind in the "dangling" mode is slightly larger at higher P values than in the case with fewer proteins (this is likely due to "binding sites" on the polymer becoming saturated). Figure 5(d) again shows R g , but this time as a function of the fraction of proteins bound in the bridging mode: there is a rough collapse of the curves for the different proteins.
Together these results suggest that again, the effect of protein shape mainly comes down to the ability of each model to form bridges.

Discussion and Conclusions
In this work we have further investigated the bridginginduced attraction (BIA), a mechanism which was uncovered in simulations of sphere-polymer mixtures [4]. Previously, we showed that spheres with an isotropic attractive interaction with a polymer tend to spontaneously form clusters, even in the absence of attractive sphere-sphere interactions. These clusters grow and coarsen until there is a single large cluster [9]. In this way the BIA leads to a bridging-induced phase separation. Spheres are the simplest model for DNA or chromatin binding proteins (or protein complexes) which can bind in multiple places simultaneously in order to form molecular bridges. In the present work we study in more detail the effect of protein valence on the BIA by using simple "patchy particles" to model multivalent proteins.
We found that all model proteins with valence ≥ 2 form clusters, but that the effect is stronger for higher valence. The fraction of proteins involved in (≥ 2 protein) clusters, f cl shows an "S-shaped" dependence on the protein-polymer interaction energy. The fraction of proteins bound on average f b also shows an S-shaped curve, as expected. In comparison to valence-1 proteins (or a simple kinetic binding model), the f b curves for multivalent proteins are steeper and, for example, cross f b = 0.5 at lower energy as the valence increases-this indicates cooperative binding. This is consistent with the action of the BIA, where after an initial bridge forms, subsequent bridges forming in the same place act to stabilise the first.
In our previous work on isotropic spheres [12] we found that bridging-induced clusters coarsen, growing in size until only one large cluster remains at equilibrium (perhaps with smaller clusters which form transiently). These dynamics can be very slow compared to the duration of a typical simulation; two possible coarsening mechanisms are cluster coalescence (which requires large, slowly diffusing clusters to come into contact, and can be hindered by intervening unbound polymer), or shrinking of small clusters at the expense of large ones (where sufficient time must be allowed for proteins to unbind and rebind the polymer). It would be interesting to understand whether cluster coarsening also occurs with the lower valence protein models. Particularly, is the equilibrium state fully phase separated, or a microphase separation? In practice it was not possible to detect cluster coarsening in the relatively small and short simulations presented here. While it is beyond the scope of the present work, we propose that in the future, longer simulations (perhaps employing enhanced sampling techniques) of larger systems in which many small clusters initially form could be used to study cluster coarsening. In other systems (e.g. a phase separating binary fluid mixture) coarsening is associated with interfacial tension; in the present system, cluster surface tension is provided by "unbound" patches and polymer beads at the edges of clusters [29]. One might speculate that with the lowest valence pro-teins, it will be easier to satisfy all protein bounds and so the surface tension will be lower, which could lead to an arrest of coarsening (though this picture clearly neglects the entropic contribution of the polymer configuration).
For the case of valence-2 proteins we found that the BIA also depends on the protein shape. Specifically, for our simple patchy proteins we varied the position of the two patches; starting from a model where they were on opposite sides of the protein "central bead" (model 2Pa) we progressively reduced the angle between the patches across five models, with the final model having two adjacent patches on one side of the central bead (model 2Pe). The level of clustering was higher for a larger angle between the patches. This effect can be traced back to the ability of the proteins to form bridges. For example, the 2Pe model proteins much more readily bind the polymer in a "coating" arrangement-if no bridge is formed, the BIA will not be in effect. This is confirmed by a plot of the fraction in clusters f cl as a function of the fraction of proteins forming bridges, where (aside from clustering which occurs simply due to localisation of proteins to the polymer, which dominates for the 2Pd-e models) the curves collapse on top of each other [ Fig. 4(d)].
As noted above, there are several examples of proteins which are thought to have structures similar to our valence-2 model proteins in terms of the layout of their DNA binding domains. This includes the histone-like nucleoid structuring protein (H-NS), which has also previously been studied using simple coarsegrained simulations [30,31]. These proteins readily form dimers (though larger oligomers are also observed) which adopt either an elongated (2Pa-like) or folded (2Pe-like) shape depending on salt concentration or the presence of other factors. An interesting system which could be studied further in the future is a mixture of different model proteins, or proteins which can switch between two states/shapes [31]. One could consider proteins which transition thermodynamically between two states, i.e., there is an energy barrier between the states, which can be surmounted by thermal fluctuations (and polymer binding might act to stabilising the protein against switching state). Or, one could consider active (non-equilibrium) switching, as in Ref. [12], where the proteins switch between states independently of thermal fluctuations in a way that breaks detailed balance. The latter might represent a protein which undergoes refolding after, e.g., posttranslational chemical modification or interaction with ATP.
When there is an excess of proteins (compared to the number of polymer beads, or "binding sites"), rather than clusters we observe a polymer collapse. This is reminiscent of the coil-globule transition which is observed for a polymer in a poor solvent [23,24], and has previously be studied for the case of isotropic spheres [7,8]. Here we studied the ability of the different model proteins to compact a polymer by measuring the radius of gyration; we find that only multivalent proteins can collapse/compact the polymer, and as valence increases the "compacting ability" of the proteins also increases [the decrease in R g occurs at lower values of P , Fig. 3(b)]. The R g curves for the different models collapse on top of each other if we plot as a function of f b P P [ Fig. 3(d)], indicating that in this regime the compaction depends on the total proteinpolymer attraction energy. In regards to the polymer compacting ability of the valence-2 proteins, we found that this again depended on the propensity of the proteins to form bridges. Protein models which tend to bind in a coating mode (models 2Pd and 2Pe) did not induce polymer collapse. Consistent with this, plotting the radius of gyration as a function of the number of proteins binding in the bridging mode gave curves for the different models which roughly overlap.
These results are highly relevant for understanding the physics of the formation of genome-associated protein foci in cells. While many proteins only have a single DNA binding domain, these often function within large complexes. For example, transcription typically involves RNA polymerase, transcription factors, and co-activating factors such as the mediator complex. In recent years, an idea which has gained much traction in the molecular biology literature is that protein foci form as micro-phase separated liquid-like droplets mediated by protein-protein interactions [32][33][34]. That view prompts a number of questions, including, "What prevents such droplets from coarsening? ", "How are droplets 'pinned' at the relevant genomic location? ", and "Do active biochemical processes play a role?. The BIA offers an alternative (or, more likely, complementary) mechanism for cluster formation. It will be interesting in the future to study the interplay between protein-chromatin and protein-protein interactions, as well as the effect of non-equilibrium processes. The physics of such systems likely plays an important role in the formation of many different protein structures within the nucleus, impacting on genome regulation in both healthy and disease states.

Appendix A: Details of patchy interactions
In this appendix we further detail two aspects of the patch-polymer interactions.
First, it is intended that for all of the patchy protein models, each patch can only make one protein-polymer bond at a time-i.e., that a patch itself cannot form a bridge. To ensure this, in all simulations we counted the number of polymer beads that each patch interacts with (an interaction is defined as the patch-polymer centre-to-centre separation is less than the V PATCH (r) cut off r c = 0.9 σ). We found that it was possible for a patch to sit between two consecutive polymer beads (i and i + 1), and we did not count this as bridging. The relative occurrence of patches interacting with a single polymer bead or with two consecutive beads depends on the interaction energy, and was observed in over 97% of patch-polymer interaction events. Occasionally, a patch was found to be interacting with 3 polymer beads, but this was usually (∼ 2.5% of all interaction events) with three consecutive beads (i,i + 1,i + 2); this type of interaction induces a bend in the polymer. Instances of a patch binding more than one nonconsecutive polymer bead (true bridging by a patch) accounted for less than 0.5% of patch-polymer interaction events across all simulations.
Second, we measured from the simulations the actual patch-polymer interaction energy P for a given interaction strength setting 0 . These two energies are not equal for several reasons: (i) due to the functional form of V PATCH (r ij ) as given in Eq. (3), the energy minimum is shifted; (ii) though the minimum of this function is at V PATCH (r ij = 0), due to, e.g., steric interactions between the protein "central bead" and the polymer, it may not be possible for the patch-polymer bead separation to actually reach zero; (iii) as noted above, a patch can sit between two consecutive polymer beads, so the total interaction energy will be the sum of two contributions. To accurately get a mapping function P = f ( 0 ), we performed simulations with different 0 for a single model 1P protein and an L = 500 bead polymer, and measured the mean residence time for the protein on the polymer τ res = k −1 off as a function of 0 . Assuming that the unbinding rate is given by k off = A 0 e − P /k B T , and obtaining A 0 from a simulation with 0 =0, we can obtain P from the measured k off . We obtain the mapping function by fitting (Fig. 6); the simplest function which gives a good fit to the data is P , this prediction of clustering due to random noncooperative binding is slightly higher than observed in the 2Pe simulations-we expect this is because 2Pe binding at polymer bead i sterically excludes binding at more surrounding beads than accounted for in the "virtual protein" procedure detailed above.
The black dotted line in Fig. 4(b) is obtained using values of M from the number of proteins bound in the 2Pd simulations.